[ensembl-dev] biotype in gtf file
Arno Velds
a.velds at nki.nl
Thu Nov 3 09:29:47 GMT 2011
Hi,
I have a question about the Ensembl GTF file (for Homo sapiens release 64).
If I look at the Ensembl website for Gene: PTTG1 (ENSG00000164611), the
transcripts ENST00000524244 and ENST00000523659 have the biotype
'retained intron' and no protein product. The GTF file, however, shows
the biotype protein_coding in the 9th column. This is not what I expected.
5 retained_intron exon 159848862 159848898 . + .
gene_id "ENSG00000164611"; transcript_id "ENST00000524244";
exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding";
transcript_name "PTTG1-003";
Is the biotype set incorrectly, or do I have to interpret this file
differently? This is not an isolated case. For example, over all GTF lines
labelled retained_intron, the biotypes in the group column have these counts:
retained_intron    polymorphic_pseudogene      125
retained_intron    lincRNA                     480
retained_intron    processed_transcript       1643
retained_intron    protein_coding            82818
This number is also quite high:
processed_transcript protein_coding 115114
I attached the entire table of counts to this mail.
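For anyone who wants to reproduce counts like these, here is a minimal sketch of how such a table could be tallied. It assumes the GTF is tab-separated and compares the value in the second column with the gene_biotype attribute from the 9th column. The function names parse_attributes and count_biotype_pairs are only illustrative and are not part of any Ensembl tool.

```python
import re
from collections import Counter

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def count_biotype_pairs(lines):
    """Count (column 2, gene_biotype attribute) pairs over GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        counts[(cols[1], attrs.get("gene_biotype", "NA"))] += 1
    return counts


# The PTTG1 exon line quoted above, rebuilt with tab separators.
example = "\t".join([
    "5", "retained_intron", "exon", "159848862", "159848898", ".", "+", ".",
    'gene_id "ENSG00000164611"; transcript_id "ENST00000524244"; '
    'exon_number "1"; gene_name "PTTG1"; gene_biotype "protein_coding"; '
    'transcript_name "PTTG1-003";',
])

for (source, gene_biotype), n in count_biotype_pairs([example]).items():
    print(source, gene_biotype, n)
```

To run it over a whole file, pass an open file handle to count_biotype_pairs instead of the one-line list.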
Thanks for your assistance,
Arno
<http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000164611;r=5:159848829-159855748;t=ENST00000523659>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: biotypes.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20111103/c24734e5/attachment.txt>