[ensembl-dev] biotype in gtf file

Amonida Zadissa amonida at sanger.ac.uk
Thu Nov 3 11:36:11 GMT 2011


Hi Arno,

The second column gives the biotype for the current feature (in your
example, an exon). Exons belonging to the same transcript will have
the same biotype as the transcript itself, hence the second column
gives you the biotype for the current transcript.

Please note that protein coding genes may contain non-coding
transcripts, just like the example you gave, while protein coding
transcripts obviously cannot belong to non-coding genes.
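
To make the distinction concrete, here is a minimal Python sketch (not
part of any Ensembl tooling) that reads the transcript biotype from
column 2 and the gene biotype from the gene_biotype attribute in
column 9, then tallies the pairs. The function names parse_gtf_line and
count_biotype_pairs are made up for illustration, and the code assumes
the tab-separated release 64 layout shown in the example line quoted
below.

```python
import re
from collections import Counter

# Matches key "value"; pairs in the GTF attribute column (column 9).
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Return (transcript_biotype, gene_biotype) for one GTF line.

    Assumption: column 2 holds the transcript biotype and column 9
    carries a gene_biotype attribute, as in the release 64 file
    discussed in this thread.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
    # gene_biotype is None if the attribute is absent from the line.
    return fields[1], attributes.get("gene_biotype")


def count_biotype_pairs(lines):
    """Count (transcript_biotype, gene_biotype) pairs over GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        counts[parse_gtf_line(line)] += 1
    return counts


# The exon line from the original question, rebuilt with tab separators.
example = "\t".join([
    "5", "retained_intron", "exon", "159848862", "159848898", ".", "+", ".",
    'gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; '
    'exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; '
    'transcript_name "PTTG1-003";',
])

print(parse_gtf_line(example))
```

For that line the function returns retained_intron as the transcript
biotype and protein_coding as the gene biotype, which is the
combination described above: a non-coding transcript of a protein
coding gene.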

Hope this helps.

Cheers,
Amonida

On Thu, Nov 03, 2011 at 10:59:55AM +0100, Arno Velds wrote:
> Right I'll reply to myself. A helpful colleague reminded me that it says  
> gene_biotype in column 9, not transcript biotype. I should probably just  
> use the value in column 2 to decide if a transcript has a product?
>
>
> On 11/03/2011 10:29 AM, Arno Velds wrote:
>> Hi,
>>
>> I have a question about the ensembl GTF file (for Homo sapiens release 64).
>>
>> If I look at the ensembl website for Gene: PTTG1 (ENSG00000164611), the
>> transcript ENST00000524244 and ENST00000523659 have the biotype
>> 'retained intron' and no protein product. The gtf file however shows in
>> the 9th column the biotype protein coding. This is not what I expect.
>>
>> 5    retained_intron    exon    159848862    159848898    .    +    .
>>    gene_id "ENSG00000164611"; transcript_id "ENST00000524244";
>> exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding";
>> transcript_name "PTTG1-003";
>>
>>
>> Is the biotype incorrectly set or do I have to interpret this file
>> differently? This is not an isolated case; for example, retained intron
>> lists these biotype counts in the group column for all gtf lines:
>>
>> retained_intron    polymorphic_pseudogene    125
>> retained_intron    lincRNA    480
>> retained_intron    processed_transcript    1643
>> retained_intron    protein_coding    82818
>>
>> This number is also quite high:
>> processed_transcript    protein_coding    115114
>>
>> I attached the entire table of counts to this mail.
>>
>>
>> Thanks for your assistance,
>>
>> Arno
>>
>>
>> <http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659>
>>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



