[ensembl-dev] Question regarding canonical transcripts

Duarte Molha duartemolha at gmail.com
Fri Jul 22 10:47:42 BST 2016


Hi Guys

I have a script that, given a gene symbol, connects to Ensembl and
retrieves the canonical transcript, then does the same using the
external database adaptor to get the canonical RefSeq transcript.

However, this does not seem to give me the correct result.
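The first step of such a script (finding Ensembl's canonical transcript for a symbol) is sketched below in Python. The original script used the Perl API adaptors. This sketch swaps in the Ensembl REST API instead, and several details are assumptions about that service: the GRCh37 server URL, the `/lookup/symbol` endpoint with `expand=1`, and the per-transcript `is_canonical` field. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: GRCh37 REST server; the original script used the Perl API instead.
GRCH37_REST = "https://grch37.rest.ensembl.org"


def fetch_gene(symbol, species="homo_sapiens"):
    """Look up a gene by symbol, including its transcripts (requires network)."""
    url = f"{GRCH37_REST}/lookup/symbol/{species}/{symbol}?expand=1"
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def canonical_transcript_id(gene):
    """Return the ID of the first transcript flagged as canonical, or None."""
    for tx in gene.get("Transcript", []):
        # Assumption: the lookup response marks the canonical transcript with is_canonical = 1.
        if tx.get("is_canonical"):
            return tx["id"]
    return None
```

You could call `canonical_transcript_id(fetch_gene("SKI"))` and compare the result against the canonical transcript shown in the GRCh37 browser.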

Take, for example, the gene SKI (I am using the GRCh37 assembly, btw).

If you open this gene on the Ensembl browser:

http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343


For SKI, Ensembl annotates ENST00000378536 as the canonical transcript:
<http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000157933;r=1:2159997-2161343;t=ENST00000378536>

However, when I use my script, the external database adaptor returns the RefSeq
accession XP_005244832.1 <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> as
the canonical RefSeq transcript, even though the correct canonical transcript
is NM_003036.3:

http://www.ncbi.nlm.nih.gov/gene/6497
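RefSeq accession prefixes encode the record type. XP_ marks a predicted (model) protein, not a curated mRNA such as NM_. The prefix filter below is a hypothetical helper and not part of any Ensembl API. If you apply it to whatever accessions the adaptor returns, it would at least keep predicted and protein records out of the result. It does not explain why the adaptor chose XP_005244832.1.

```python
# Standard RefSeq accession prefixes (hypothetical helper, not an Ensembl API).
REFSEQ_PREFIXES = {
    "NM_": "curated mRNA",
    "NR_": "curated non-coding RNA",
    "XM_": "predicted mRNA",
    "XR_": "predicted non-coding RNA",
    "NP_": "curated protein",
    "XP_": "predicted protein",
}


def refseq_kind(accession):
    """Classify a RefSeq accession by its three-character prefix."""
    return REFSEQ_PREFIXES.get(accession[:3], "unknown")


def curated_mrnas(accessions):
    """Keep only curated RefSeq mRNA accessions (NM_), preserving input order."""
    return [acc for acc in accessions if refseq_kind(acc) == "curated mRNA"]
```

For the two accessions in this message, `curated_mrnas(["XP_005244832.1", "NM_003036.3"])` keeps only `"NM_003036.3"`.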

Unless I am misunderstanding this, if two transcripts have coding regions
of the same length, the longer transcript should be the canonical one.

The longer RefSeq transcript is NM_003036.3 (it has a longer 5' UTR).
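The selection rule stated above can be written as a sort key: the longest coding region wins, and the longest overall transcript breaks ties. This encodes only the rule as stated in this message, which is not necessarily how Ensembl or NCBI define a canonical transcript. The `Transcript` class, its field names, and the example lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    accession: str
    cds_length: int         # coding sequence length in bp
    transcript_length: int  # full spliced length (CDS plus UTRs) in bp


def pick_by_length(transcripts):
    """Longest CDS wins; among equal CDS lengths, the longest transcript wins."""
    return max(transcripts, key=lambda t: (t.cds_length, t.transcript_length))
```

Suppose two hypothetical transcripts have the same CDS length, for example `Transcript("TX_A", 2187, 5000)` and `Transcript("TX_B", 2187, 5300)`. In that case `pick_by_length` returns TX_B, the one with the longer UTRs.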

Can you help me understand this?

Many thanks

Duarte

