[ensembl-dev] Question regarding canonical transcripts

Duarte Molha duartemolha at gmail.com
Mon Jul 25 17:07:12 BST 2016


I will try and produce here the relevant parts of the script.

But I am still at a loss as to why XP_005244832.1
<http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged as
canonical.

If I understand you correctly, I simply might not have cycled through all
of the RefSeq transcripts... but is there going to be more than one
RefSeq transcript tagged as canonical for each gene?
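
To make the "cycle through all of the results" point concrete, here is a
minimal Python sketch. It is not the script discussed in this thread and it
does not call the Ensembl API. The xref list, its order, and the source names
"RefSeq_mRNA" and "RefSeq_peptide_predicted" are assumptions for illustration;
only the two accessions come from this thread. It shows that taking the first
external reference can return a predicted peptide, while grouping every
reference by its source database lets you pick the curated mRNA explicitly.

```python
from collections import defaultdict

# Hypothetical xref records standing in for what an external-reference lookup
# on one transcript might return, as (source database, display ID) pairs.
# The order and the source names are assumptions, not real API output.
xrefs = [
    ("RefSeq_peptide_predicted", "XP_005244832.1"),
    ("RefSeq_mRNA", "NM_003036.3"),
]


def group_by_source(records):
    """Collect every display ID under the source database it came from."""
    grouped = defaultdict(list)
    for dbname, display_id in records:
        grouped[dbname].append(display_id)
    return dict(grouped)


def ids_from_source(records, source="RefSeq_mRNA"):
    """Return all IDs from one source, or an empty list if there are none."""
    return group_by_source(records).get(source, [])


# Taking only the first record ignores which source it came from.
first_hit = xrefs[0][1]

# Reading all records and filtering by source is explicit about what you get.
mrna_ids = ids_from_source(xrefs)

print("first record only:", first_hit)
print("RefSeq_mRNA only: ", mrna_ids)
```

With real data the same grouping step would also show whether a transcript
carries more than one ID from the same source, which is the case the question
above is asking about.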

Not sure I follow!

Thanks

Duarte

On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:

> Hi Duarte,
>
> Can you send us a snippet of code that accesses the external database
> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
> you mention associated at some level to the transcript, but some are from
> "RefSeq peptide predicted" for example.
>
> Kieron
>
>
>
> Kieron Taylor PhD.
> Ensembl Developer
>
> EMBL, European Bioinformatics Institute
>
>
>
>
>
>
> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
> >
> > Hi Guys
> >
> > I have a script that, given a gene symbol, connects to Ensembl and
> > retrieves the canonical transcript, and then uses the external database
> > adaptor to get the canonical RefSeq transcript.
> >
> > However, this does not seem to give me the correct result.
> >
> > Take for example the gene SKI (I am using the GRCh37 assembly, btw).
> >
> > If you open this gene on the Ensembl browser:
> >
> >
> http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
> >
> >
> > For SKI, Ensembl annotates ENST00000378536 as the canonical transcript.
> >
> > However, using my script, the external database adaptor returns the
> > RefSeq XP_005244832.1 as the RefSeq canonical transcript, even though the
> > correct canonical transcript is NM_003036.3
> >
> > http://www.ncbi.nlm.nih.gov/gene/6497
> >
> > Unless I am understanding this incorrectly, if the coding region is the
> > same length in two transcripts, the longer one should be the canonical.
> >
> > The longer RefSeq is NM_003036.3 (it has a longer 5' UTR).
> >
> > Can you help me understand this?
> >
> > Many thanks
> >
> > Duarte
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/