[ensembl-dev] Question regarding canonical transcripts

Kieron Taylor ktaylor at ebi.ac.uk
Mon Jul 25 11:58:56 BST 2016


Hi Duarte,

Can you send us a snippet of the code that accesses the external database adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of your results to get the RefSeq ID you expect. We have all of the RefSeq IDs you mention associated with the transcript at some level, but some of them come from other external sources, "RefSeq peptide predicted" for example.
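
To illustrate what I mean by reading enough of the results, here is a rough Python sketch (not Ensembl API code, which is Perl). It assumes the xrefs have already been fetched and reduced to (dbname, display_id) pairs; the Xref type, the refseq_mrna_ids helper and the order of the list are made up for illustration, and I am assuming the external database names are spelled "RefSeq_mRNA" and "RefSeq_peptide_predicted". The two IDs are the ones from your mail.

```python
from typing import List, NamedTuple


class Xref(NamedTuple):
    """Hypothetical stand-in for an external reference record."""
    dbname: str       # name of the external database the ID comes from
    display_id: str   # the accession shown to the user


def refseq_mrna_ids(xrefs: List[Xref], dbname: str = "RefSeq_mRNA") -> List[str]:
    """Return every display ID whose source database matches dbname."""
    return [x.display_id for x in xrefs if x.dbname == dbname]


# Illustrative list only: the order in which xrefs come back is an assumption.
xrefs = [
    Xref("RefSeq_peptide_predicted", "XP_005244832.1"),
    Xref("RefSeq_mRNA", "NM_003036.3"),
]

# Taking the first element picks up the predicted peptide.
print(xrefs[0].display_id)

# Filtering on the source database finds the mRNA accession.
print(refseq_mrna_ids(xrefs))
```

The point is only that the first record returned is not necessarily the one you want, so the script should check each record's source database before choosing one.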

Kieron



Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute






> On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
> 
> Hi Guys
> 
> I have a script that, given a gene symbol, connects to Ensembl and retrieves the canonical transcript, and then uses the external database adaptor to get the canonical RefSeq transcript.
> 
> However, this does not seem to give me the correct result.
> 
> Take, for example, the gene SKI (I am using the GRCh37 assembly, btw).
> 
> If you open this gene on the Ensembl browser:
> 
> http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
> 
> 
> On SKI, Ensembl annotates as the canonical transcript: ENST00000378536
> 
> However, using my script, the external database adaptor returns the RefSeq ID XP_005244832.1 as the RefSeq canonical transcript, even though the correct canonical transcript is NM_003036.3.
> 
> http://www.ncbi.nlm.nih.gov/gene/6497
> 
> Unless I am misunderstanding this, if the coding region is the same length in two transcripts, the longer transcript should be the canonical one.
> 
> The longer RefSeq transcript is NM_003036.3 (it has a longer 5' UTR).
> 
> Can you help me understand this?
> 
> Many thanks
> 
> Duarte
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




