[ensembl-dev] biotype in gtf file
Amonida Zadissa
amonida at sanger.ac.uk
Thu Nov 3 11:36:11 GMT 2011
Hi Arno,
The second column gives the biotype for the current feature (in your
example, an exon). Exons belonging to the same transcript will have
the same biotype as the transcript itself, hence the second column
gives you the biotype for the current transcript.
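As an illustration of reading the two columns separately, here is a minimal Python sketch. It assumes the release 64 layout shown in your example: nine tab-separated columns, column 2 holding the transcript biotype, and column 9 holding `key "value";` attribute pairs that include gene_biotype. The helper name `parse_gtf_line` is hypothetical, not part of any Ensembl tool.

```python
import re

# Attribute pairs in column 9 look like: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

def parse_gtf_line(line):
    """Hypothetical helper: split one GTF record into its 9 columns.

    Assumption: as in the release 64 file discussed here, column 2
    carries the transcript biotype and column 9 carries gene_biotype.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    attrs = dict(ATTR_RE.findall(fields[8]))
    return {
        "seqname": fields[0],
        "transcript_biotype": fields[1],  # column 2
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "attributes": attrs,              # column 9, includes gene_biotype
    }

# The exon line from the original question, rejoined with tabs.
line = "\t".join([
    "5", "retained_intron", "exon", "159848862", "159848898", ".", "+", ".",
    'gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; '
    'exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; '
    'transcript_name "PTTG1-003";',
])
rec = parse_gtf_line(line)
print(rec["attributes"]["transcript_id"], rec["transcript_biotype"],
      rec["attributes"]["gene_biotype"])
# prints: ENST00000524244 retained_intron protein_coding
```

The transcript-level biotype (retained_intron) and the gene-level biotype (protein_coding) come from different columns, so both values can legitimately appear on the same line.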
Please note that protein coding genes may contain non-coding
transcripts, just as in the example you gave, while protein coding
transcripts obviously can't belong to non-coding genes.
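To see how transcript biotypes are distributed within gene biotypes, you could tally (column 2, gene_biotype) pairs. The sketch below makes the same layout assumptions as a release 64 GTF (column 2 = transcript biotype, column 9 = attributes with transcript_id and gene_biotype) and counts each pair two ways: once per GTF line, and once per distinct transcript_id. A per-line count weights every transcript by its number of exon and other feature lines, which is worth keeping in mind when reading totals such as the ones in your table. The function name `biotype_pairs` and the second exon line in the usage example are hypothetical, for illustration only.

```python
import re
from collections import Counter

# Attribute pairs in column 9 look like: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

def biotype_pairs(lines):
    """Hypothetical helper: tally (column 2 biotype, gene_biotype) pairs
    once per GTF line and once per distinct transcript_id."""
    per_line = Counter()
    per_transcript = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        attrs = dict(ATTR_RE.findall(fields[8]))
        pair = (fields[1], attrs["gene_biotype"])
        per_line[pair] += 1
        per_transcript[attrs["transcript_id"]] = pair
    return per_line, Counter(per_transcript.values())

exon1 = "\t".join([
    "5", "retained_intron", "exon", "159848862", "159848898", ".", "+", ".",
    'gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; '
    'exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; '
    'transcript_name "PTTG1-003";',
])
# Hypothetical second exon line of the same transcript, for illustration only.
exon2 = exon1.replace('exon_number "1"', 'exon_number "2"')

by_line, by_tx = biotype_pairs([exon1, exon2])
print(by_line[("retained_intron", "protein_coding")])  # prints: 2
print(by_tx[("retained_intron", "protein_coding")])    # prints: 1
```

Run over a whole GTF file (for example `biotype_pairs(open(path))`), the per-transcript counter shows how many non-coding transcripts sit inside protein_coding genes.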
Hope this helps.
Cheers,
Amonida
On Thu, Nov 03, 2011 at 10:59:55AM +0100, Arno Velds wrote:
> Right, I'll reply to myself. A helpful colleague reminded me that it says
> gene_biotype in column 9, not transcript biotype. I should probably just
> use the value in column 2 to decide if a transcript has a product?
>
>
> On 11/03/2011 10:29 AM, Arno Velds wrote:
>> Hi,
>>
>> I have a question about the Ensembl GTF file (for Homo sapiens release 64).
>>
>> If I look at the Ensembl website for Gene: PTTG1 (ENSG00000164611), the
>> transcripts ENST00000524244 and ENST00000523659 have the biotype
>> 'retained intron' and no protein product. The GTF file, however, shows
>> the biotype protein_coding in the 9th column. This is not what I expect.
>>
>> 5 retained_intron exon 159848862 159848898 . + .
>> gene_id "ENSG00000164611"; transcript_id "ENST00000524244";
>> exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding";
>> transcript_name "PTTG1-003";
>>
>>
>> Is the biotype incorrectly set, or do I have to interpret this file
>> differently? This is not an isolated case. For example, across all GTF
>> lines with retained_intron in column 2, the group column (column 9)
>> lists these gene_biotype counts:
>>
>> retained_intron polymorphic_pseudogene 125
>> retained_intron lincRNA 480
>> retained_intron processed_transcript 1643
>> retained_intron protein_coding 82818
>>
>> This number is also quite high:
>> processed_transcript protein_coding 115114
>>
>> I attached the entire table of counts to this mail.
>>
>>
>> Thanks for your assistance,
>>
>> Arno
>>
>>
>> <http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659>
>>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/