[ensembl-dev] biotype in gtf file

Arno Velds a.velds at nki.nl
Thu Nov 3 09:29:47 GMT 2011


Hi,

I have a question about the ensembl GTF file (for Homo sapiens release 64).

If I look at the Ensembl website for the gene PTTG1 (ENSG00000164611), the 
transcripts ENST00000524244 and ENST00000523659 have the biotype 
'retained intron' and no protein product. The GTF file, however, shows 
the biotype protein_coding in the 9th column. This is not what I expect.

5    retained_intron    exon    159848862    159848898    .    +    .    gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; transcript_name "PTTG1-003";


Is the biotype set incorrectly, or do I have to interpret this file 
differently? This is not an isolated case. For example, lines marked 
retained_intron show these biotype counts in the group column, taken 
over all GTF lines:

retained_intron    polymorphic_pseudogene    125
retained_intron    lincRNA    480
retained_intron    processed_transcript    1643
retained_intron    protein_coding    82818

This number is also quite high:
processed_transcript    protein_coding    115114

I attached the entire table of counts to this mail.
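
For reference, counts like these can be tallied with a short script. The 
following is only a minimal Python sketch, assuming a tab-separated GTF 
whose 9th column holds key "value"; pairs as in the line quoted above. The 
function name count_biotype_pairs is made up for illustration and is not 
part of any Ensembl tool.

```python
import re
from collections import Counter

# Matches key "value"; pairs in the GTF attribute (9th) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def count_biotype_pairs(lines):
    """Count (2nd column, gene_biotype attribute) pairs over GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record needs at least 9 tab-separated columns.
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # "NA" is a placeholder chosen here for lines without gene_biotype.
        counts[(fields[1], attrs.get("gene_biotype", "NA"))] += 1
    return counts
```

Passing an open file handle for the GTF file to count_biotype_pairs and 
printing the sorted items gives one row per combination, in the same 
three-column layout as the counts above.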


Thanks for your assistance,

Arno


<http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659> 

Name: biotypes.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20111103/c24734e5/attachment.txt>

