[ensembl-dev] GTF dumps - processed_transcripts marked as protein_coding

Keiran Raine kr2 at sanger.ac.uk
Thu Jul 14 12:59:27 BST 2011


Hi all,

Has anyone else encountered this issue?

We use the GTF dumps for various things, and I'm having trouble
understanding why some transcripts shown as 'Processed transcript'
in the web interface are marked as 'protein_coding' in the dumps.

Take the gene MRPL40 (Homo sapiens).  MRPL40-003 is clearly shown
with a Biotype of 'Processed transcript', yet its entries in the GTF
file are 'protein_coding' (tabs replaced with spaces for readability):

22 protein_coding exon 19420462 19420871 . + . gene_id "ENSG00000185608"; transcript_id "ENST00000471259"; exon_number "1"; gene_name "MRPL40"; transcript_name "MRPL40-003";
22 protein_coding exon 19422259 19422417 . + . gene_id "ENSG00000185608"; transcript_id "ENST00000471259"; exon_number "2"; gene_name "MRPL40"; transcript_name "MRPL40-003";
22 protein_coding exon 19423161 19423533 . + . gene_id "ENSG00000185608"; transcript_id "ENST00000471259"; exon_number "3"; gene_name "MRPL40"; transcript_name "MRPL40-003";
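
For anyone wanting to inspect these records programmatically, here is a minimal Python sketch of how one such line can be split into its nine columns and its attribute pairs. The function name parse_gtf_line and the dictionary keys are my own choices for illustration, and the regular expression assumes every attribute has the form key "value"; as in the records above.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one GTF record into its fixed columns plus an attribute dict.

    Hypothetical helper for illustration; assumes tab-separated input
    (the records quoted in this mail have tabs replaced with spaces).
    """
    fields = line.rstrip("\n").split("\t", 8)
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,  # column 2, the one showing 'protein_coding'
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTRIBUTE_RE.findall(attrs)),
    }


# First MRPL40-003 exon from above, with the tabs restored.
example = "\t".join([
    "22", "protein_coding", "exon", "19420462", "19420871", ".", "+", ".",
    'gene_id "ENSG00000185608"; transcript_id "ENST00000471259"; '
    'exon_number "1"; gene_name "MRPL40"; transcript_name "MRPL40-003";',
])
record = parse_gtf_line(example)
print(record["source"], record["attributes"]["transcript_name"])
```

The second column is the value in question here; everything else is only needed to tie a record back to its gene and transcript.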

I'm aware that not all biotypes are reflected directly; for instance,
'Nonsense mediated decay' is still considered 'protein_coding' because
a protein is still generated.  However, there are many examples of
transcripts in the GTF file that are marked as 'processed_transcript',
so the dumps do use that value.

It appears that the value being recorded is the biotype of the gene
rather than the biotype of the transcript.  Is this correct?  If so,
is this the intended behaviour, or could it be a bug in the exports?
I can confirm that this is consistent between v58 and v63 of Ensembl.
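
One rough way to test the gene-versus-transcript idea against a whole dump is sketched below in Python. It assumes that, if column 2 carried the gene biotype, all transcripts of a gene would share a single column-2 value, so any gene with mixed values would argue against that reading. The function names are hypothetical, and the sketch only looks at column 2 and the gene_id / transcript_id attributes.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')


def sources_per_gene(lines):
    """Map gene_id -> {transcript_id -> set of column-2 values}."""
    genes = defaultdict(lambda: defaultdict(set))
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t", 8)
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTRIBUTE_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id and transcript_id:
            genes[gene_id][transcript_id].add(fields[1])
    return genes


def genes_with_mixed_sources(genes):
    """Return sorted gene_ids whose transcripts differ in column 2."""
    mixed = []
    for gene_id, transcripts in genes.items():
        # Union of the column-2 values seen across all transcripts.
        values = set().union(*transcripts.values())
        if len(values) > 1:
            mixed.append(gene_id)
    return sorted(mixed)
```

To use it, open the GTF file and pass the file handle to sources_per_gene, then pass the result to genes_with_mixed_sources. An empty list would be consistent with column 2 holding a per-gene value, while a non-empty list would show that the value can differ between transcripts of the same gene; neither outcome says anything by itself about MRPL40-003 specifically.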

Regards,

Keiran Raine
Senior Computer Biologist
The Cancer Genome Project
Ext: 7703
kr2 at sanger.ac.uk




