[ensembl-dev] Question regarding canonical transcripts

mag mr6 at ebi.ac.uk
Tue Jul 26 16:06:54 BST 2016


Hi Duarte,

A canonical transcript is usually the transcript with the longest
translation for a given gene:
http://www.ensembl.org/Help/Glossary?id=346

In your example, XP_005244832.1 has a translation of 730 aa while
NP_003027.1 only has 728.
Hence, the transcript carrying the 730 aa translation is chosen as the
canonical transcript.
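
As a rough illustration of that rule, here is a minimal Python sketch. It
only encodes the simplified "longest translation wins" heuristic using the
two lengths quoted above; the function name is made up for this example and
it is not the code Ensembl actually runs to assign canonical transcripts.

```python
# Illustrative only: IDs and lengths are the ones quoted in this thread.
# The rule below is the simplified "longest translation" heuristic,
# not Ensembl's actual canonical-transcript pipeline.
translation_lengths = {
    "XP_005244832.1": 730,  # length in amino acids
    "NP_003027.1": 728,
}


def pick_longest_translation(lengths):
    """Return the ID whose translation is the longest (hypothetical helper)."""
    return max(lengths, key=lengths.get)


canonical = pick_longest_translation(translation_lengths)
print(canonical)  # XP_005244832.1
```

With a 730 aa versus 728 aa comparison, the two-residue difference is enough
to decide the outcome, regardless of UTR length.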

As Kieron mentioned, if you specifically want the curated RefSeq annotation,
it might be better to fetch all external annotations and then keep only the
ones you are interested in.
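
The "fetch everything, then filter" approach could look something like the
Python sketch below. The records are hand-written stand-ins, not API output,
and the source names (RefSeq_mRNA, RefSeq_peptide, RefSeq_peptide_predicted)
are my assumption of how the external databases are labelled; please check
them against what your own query returns. XRef and curated_refseq are
hypothetical names used only for this example.

```python
from typing import NamedTuple


class XRef(NamedTuple):
    """Hypothetical stand-in for one external reference record."""
    dbname: str
    primary_id: str


# Hand-written example records; the source labels are assumptions.
xrefs = [
    XRef("RefSeq_peptide_predicted", "XP_005244832.1"),
    XRef("RefSeq_peptide", "NP_003027.1"),
    XRef("RefSeq_mRNA", "NM_003036.3"),
]

# Sources treated as curated in this sketch (predicted sources excluded).
CURATED_SOURCES = {"RefSeq_mRNA", "RefSeq_peptide"}


def curated_refseq(records):
    """Keep only IDs that come from the curated RefSeq sources."""
    return [r.primary_id for r in records if r.dbname in CURATED_SOURCES]


print(curated_refseq(xrefs))  # ['NP_003027.1', 'NM_003036.3']
```

Filtering on the source name rather than on the ID prefix keeps the decision
tied to where the annotation came from, which is the distinction Kieron
pointed out.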


Regards,
Magali

On 25/07/2016 17:07, Duarte Molha wrote:
> I will try and produce here the relevant parts of the script.
>
> But I am still at a loss as to why XP_005244832.1
> <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged
> as canonical.
>
> So what you are saying is that I simply might not have cycled through
> all of the RefSeq transcripts... but is there going to be more than
> one RefSeq transcript tagged as canonical for each gene?
>
> Not sure I follow!
>
> Thanks
>
> Duarte
>
>
>
>
> -- 
> Duarte Molha
> https://about.me/duarte
>
>
> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>     Hi Duarte,
>
>     Can you send us a snippet of code that accesses the external
>     database adaptor (DBEntryAdaptor?). It sounds like you may not be
>     reading enough of your results to get the RefSeq ID you expect. We
>     have all of the RefSeq IDs you mention associated at some level to
>     the transcript, but some are from "RefSeq peptide predicted" for
>     example.
>
>     Kieron
>
>
>
>     Kieron Taylor PhD.
>     Ensembl Developer
>
>     EMBL, European Bioinformatics Institute
>
>
>
>
>
>
>     > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
>     >
>     > Hi Guys
>     >
>     > I have a script that, given a gene symbol, connects to Ensembl
>     and retrieves the canonical transcript, and then does the same
>     using the external database adaptor to get the canonical RefSeq
>     transcript.
>     >
>     > However this does not seem to give me the correct result
>     >
>     > Take for example the gene SKI ( I am using GRCh37 assembly btw)
>     >
>     > If you open this gene on the Ensembl browser:
>     >
>     >
>     http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
>     >
>     >
>     > On SKI, Ensembl annotates as the canonical transcript:
>     ENST00000378536
>     >
>     > However, using my script, the external database adaptor returns
>     the RefSeq XP_005244832.1 as the RefSeq canonical transcript, even
>     though the correct canonical transcript is NM_003036.3
>     >
>     > http://www.ncbi.nlm.nih.gov/gene/6497
>     >
>     > Unless I am misunderstanding this, if the coding region is the
>     same length in two transcripts, the longer transcript should be the
>     canonical one
>     >
>     > The longer RefSeq is NM_003036.3 (it has a longer 5' UTR)
>     >
>     > Can you help me understand this?
>     >
>     > Many thanks
>     >
>     > Duarte
>     > _______________________________________________
>     > Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>     > Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     > Ensembl Blog: http://www.ensembl.info/
>
