[ensembl-dev] Question regarding canonical transcripts
Duarte Molha
duartemolha at gmail.com
Fri Jul 22 10:47:42 BST 2016
Hi Guys
I have a script that, given a gene symbol, connects to Ensembl and
retrieves the canonical transcript, and then does the same using the
external database adaptor to get the canonical RefSeq transcript.
However, this does not seem to give me the correct result.
Take, for example, the gene SKI (I am using the GRCh37 assembly, by the way).
If you open this gene on the Ensembl browser:
http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
For SKI, Ensembl annotates ENST00000378536 as the canonical transcript:
<http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000157933;r=1:2159997-2161343;t=ENST00000378536>
However, using my script, the external database adaptor returns the RefSeq
XP_005244832.1 <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> as the
RefSeq canonical transcript, even though the correct canonical transcript
is NM_003036.3:
http://www.ncbi.nlm.nih.gov/gene/6497
Unless I am understanding this incorrectly, if the coding region is the
same length in two transcripts, the longer transcript should be the canonical one.
The longer RefSeq is NM_003036.3 (it has a longer 5' UTR).
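
To make the rule I have in mind concrete, here is a minimal Python sketch of it: prefer the longest coding sequence, and break ties by total transcript length. This is only my understanding of the rule, not Ensembl's actual implementation. The Transcript class, the pick_canonical function, the transcript names and all lengths are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; the fields are illustrative, not an Ensembl API type."""
    name: str
    cds_length: int    # length of the coding sequence, in bases
    total_length: int  # full transcript length including UTRs, in bases


def pick_canonical(transcripts):
    """Return the transcript with the longest CDS, ties broken by total length."""
    if not transcripts:
        raise ValueError("no transcripts to choose from")
    # Tuples compare element by element, so total_length only matters
    # when two transcripts have the same cds_length.
    return max(transcripts, key=lambda t: (t.cds_length, t.total_length))


# Made-up example: same CDS length, but one transcript has a longer UTR.
candidates = [
    Transcript("tx_short_utr", cds_length=900, total_length=1200),
    Transcript("tx_long_utr", cds_length=900, total_length=1500),
]
print(pick_canonical(candidates).name)
```

Under this rule, the transcript with the longer UTR is selected whenever the coding lengths are equal, which is why I expected NM_003036.3 to be chosen.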
Can you help me understand this?
Many thanks
Duarte