[ensembl-dev] biotype in gtf file

Arno Velds a.velds at nki.nl
Thu Nov 3 09:59:55 GMT 2011


Right, I'll reply to myself. A helpful colleague reminded me that column 9
says gene_biotype, not transcript biotype. I should probably just use the
value in column 2 to decide whether a transcript has a product?
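
A minimal Python sketch of that idea, assuming a tab-separated Ensembl GTF in
which column 2 (source) carries the transcript biotype, as in the release 64
record quoted below. The names `parse_gtf_line`, `transcript_has_product` and
`CODING_SOURCES` are hypothetical, and which biotypes count as "having a
product" is a judgement call (looking for CDS feature lines per transcript
would be an alternative).

```python
import re

# Matches key "value"; pairs in GTF column 9 (the attribute column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

# Hypothetical choice: only these column-2 values are treated as transcripts
# with a protein product. Adjust to taste (e.g. add polymorphic_pseudogene).
CODING_SOURCES = {"protein_coding"}


def parse_gtf_line(line):
    """Split one GTF record into its 8 fixed columns and an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attrs = dict(ATTR_RE.findall(fields[8]))
    return fields[:8], attrs


def transcript_has_product(line):
    """Decide from column 2 (source), ignoring gene_biotype, whether a transcript codes."""
    cols, _attrs = parse_gtf_line(line)
    return cols[1] in CODING_SOURCES
```

For the PTTG1 exon quoted below, `parse_gtf_line` reports gene_biotype
"protein_coding", while `transcript_has_product` returns False because
column 2 is "retained_intron".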


On 11/03/2011 10:29 AM, Arno Velds wrote:
> Hi,
>
> I have a question about the ensembl GTF file (for Homo sapiens release 64).
>
> If I look at the Ensembl website for Gene: PTTG1 (ENSG00000164611), the
> transcripts ENST00000524244 and ENST00000523659 have the biotype
> 'retained intron' and no protein product. The GTF file, however, shows
> the biotype protein_coding in the 9th column. This is not what I expect.
>
> 5    retained_intron    exon    159848862    159848898    .    +    .    gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; transcript_name "PTTG1-003";
>
>
> Is the biotype set incorrectly, or do I have to interpret this file
> differently? This is not an isolated case. For example, counting over all
> GTF lines, those with retained_intron in column 2 have these gene_biotype
> values in the attribute column:
>
> retained_intron    polymorphic_pseudogene    125
> retained_intron    lincRNA    480
> retained_intron    processed_transcript    1643
> retained_intron    protein_coding    82818
>
> This number is also quite high:
> processed_transcript    protein_coding    115114
>
> I attached the entire table of counts to this mail.
>
>
> Thanks for your assistance,
>
> Arno
>
>
> <http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659>
>
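
The counts above (column 2 value, gene_biotype, number of GTF lines) could be
reproduced with a sketch like the following. It assumes the same
tab-separated layout and counts every feature line, not distinct transcripts;
the function name and the file name in the commented example are hypothetical.

```python
import re
from collections import Counter

# Matches key "value"; pairs in GTF column 9 (the attribute column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def count_source_vs_gene_biotype(lines):
    """Tally (column 2, gene_biotype) pairs over all GTF feature lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed records
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts[(fields[1], attrs.get("gene_biotype", "NA"))] += 1
    return counts


# Example (file name is an assumption; use whichever GTF you downloaded):
# import gzip
# with gzip.open("Homo_sapiens.GRCh37.64.gtf.gz", "rt") as fh:
#     for (source, biotype), n in sorted(count_source_vs_gene_biotype(fh).items()):
#         print(f"{source}\t{biotype}\t{n}")
```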



