[ensembl-dev] biotype in gtf file
Arno Velds
a.velds at nki.nl
Thu Nov 3 09:59:55 GMT 2011
Right, I'll reply to myself. A helpful colleague reminded me that column 9
says gene_biotype, not transcript biotype. I should probably just use the
value in column 2 to decide whether a transcript has a product?
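A minimal Python sketch of that idea follows. It assumes, as in the release 64 file discussed here, that column 2 holds the transcript biotype and that the column 9 attributes use the `key "value";` form. The helper names `transcript_sources` and `has_product` are hypothetical, and treating only `protein_coding` transcripts as having a product is a simplifying assumption, not an Ensembl rule.

```python
import re
from typing import Dict, Iterable

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_sources(lines: Iterable[str]) -> Dict[str, str]:
    """Map transcript_id -> column 2 value for every transcript in a GTF.

    Assumption: column 2 carries the transcript biotype (true for the
    Ensembl release 64 GTF in this thread, not guaranteed in general).
    """
    result: Dict[str, str] = {}
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is not None:
            # All lines of one transcript should share the same column 2 value.
            result.setdefault(tid, fields[1])
    return result


def has_product(source: str) -> bool:
    # Hypothetical rule: only protein_coding transcripts have a protein product.
    return source == "protein_coding"
```

For the PTTG1 exon line quoted in the original message, this maps ENST00000524244 to `retained_intron`, so `has_product` reports no product, matching the website.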
On 11/03/2011 10:29 AM, Arno Velds wrote:
> Hi,
>
> I have a question about the Ensembl GTF file (for Homo sapiens release 64).
>
> If I look at the Ensembl website for Gene: PTTG1 (ENSG00000164611), the
> transcripts ENST00000524244 and ENST00000523659 have the biotype
> 'retained intron' and no protein product. The GTF file, however, shows
> protein_coding as the biotype in the 9th column. This is not what I expected.
>
> 5 retained_intron exon 159848862 159848898 . + .
> gene_id "ENSG00000164611"; transcript_id "ENST00000524244";
> exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding";
> transcript_name "PTTG1-003";
>
>
> Is the biotype set incorrectly, or do I have to interpret this file
> differently? This is not an isolated case; for example, across all GTF
> lines with retained_intron in column 2, the group column lists these
> gene_biotype counts:
>
> column 2          gene_biotype              count
> retained_intron   polymorphic_pseudogene      125
> retained_intron   lincRNA                     480
> retained_intron   processed_transcript       1643
> retained_intron   protein_coding            82818
>
> This number is also quite high:
> processed_transcript protein_coding 115114
>
> I attached the entire table of counts to this mail.
>
>
> Thanks for your assistance,
>
> Arno
>
>
> <http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659>
>
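Counts like those in the quoted table can be produced by tallying (column 2, gene_biotype) pairs over every line of the GTF. The sketch below is a minimal Python version under the same assumptions as before (tab-separated fields, `gene_biotype "..."` in column 9); `biotype_pair_counts` is a hypothetical helper, not the script used for the attached table.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Pulls the gene_biotype value out of the attribute column (column 9).
GENE_BIOTYPE_RE = re.compile(r'gene_biotype "([^"]*)"')


def biotype_pair_counts(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count (column 2, gene_biotype) pairs over all GTF lines."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_BIOTYPE_RE.search(fields[8])
        if match:
            counts[(fields[1], match.group(1))] += 1
    return counts
```

Passing an open GTF file object as `lines` and printing `counts.most_common()` gives one row per pair. Because every exon, CDS and codon line is counted, the numbers scale with feature lines rather than with transcripts.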