[ensembl-dev] Question regarding canonical transcripts

Cyriac Kandoth kandothc at mskcc.org
Mon Aug 1 19:03:14 BST 2016


Hi, hope this is still relevant to this thread - what is the rationale for
choosing 5kb? Is there no evidence for promoter regions beyond that? Is it
the same limit at the 3' end?

~C

On Jul 29, 2016 4:24 AM, "Will McLaren" <wm2 at ebi.ac.uk> wrote:

> Hi Lin,
>
> This is actually not a case of Ensembl not providing a canonical
> transcript. It actually shows your input variant overlapping only one
> transcript of a gene, and that transcript is not the canonical one.
>
> If you look at the transcript diagram [1] you can see ENST00000497517
> extends many kb 5' of the other transcripts' start sites (beyond the 5kb
> range within which VEP will call an overlap), so only that transcript is
> annotated.
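To make the overlap rule above concrete, here is a minimal Python sketch of the window check. It assumes the 5 kb reach applies on both sides of a transcript (the 3' side is an assumption here, not something stated above). The `Transcript` record, the helper names and all coordinates are hypothetical; this is not VEP's own code.

```python
from dataclasses import dataclass

# Flanking distance mentioned above (5 kb).
# Assumption: the same distance is used on the 5' and 3' sides.
FLANK_BP = 5000


@dataclass
class Transcript:
    """Hypothetical minimal transcript record; 1-based, inclusive coordinates."""
    stable_id: str
    start: int
    end: int


def within_reach(pos, tx, flank=FLANK_BP):
    """True if pos lies inside the transcript or within `flank` bp of either end."""
    return tx.start - flank <= pos <= tx.end + flank


def reachable_transcripts(pos, transcripts, flank=FLANK_BP):
    """Stable IDs of the transcripts a variant at `pos` would be reported against."""
    return [tx.stable_id for tx in transcripts if within_reach(pos, tx, flank)]


# Made-up coordinates: one transcript starts far 5' of the other.
long_tx = Transcript("TX_LONG", start=1_000, end=60_000)
canonical_tx = Transcript("TX_CANONICAL", start=40_000, end=60_000)

# The variant sits inside TX_LONG but 38 kb away from TX_CANONICAL's start.
print(reachable_transcripts(2_000, [long_tx, canonical_tx]))  # ['TX_LONG']
```

With these made-up coordinates only the long transcript is reported, which is the situation described above: the gene has a canonical transcript, but the variant is out of its reach.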
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
> [1] : http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000115705
>
> On 29 July 2016 at 07:32, 林琼芬 <qiongfen0 at gmail.com> wrote:
> yes, just like the one below
> 1   25372580   rs12731221   G   A
> 1   28733759   rs78873359   CA  C
> 2   1397282    rs9326165    G   A
> 2   1405785    rs74412499   G   A
> 2   88285154   rs149707353  C   T
> 3   85008865   .            C   A
> 3   180575632  rs58197854   AT  A
> 3   180575641  rs114361217  A   T
> 5   42842763   rs9686343    C   A
> 6   5109555    rs149371287  G   A
> 6   143929729  rs6899521    T   C
> 7   72024054   rs193119573  G   A
> 7   72024079   rs376943542  G   A
> 7   89571465   rs10226999   C   G
> 10  11639703   rs77896587   G   A
>
> The VEP result looks like this; it does not include the canonical
> transcript.
> Thanks a lot!
> [Inline image 1]
>
> Best regards!
> Lin
>
> 2016-07-27 20:50 GMT+08:00 Will McLaren <wm2 at ebi.ac.uk>:
> Hi Lin,
>
> Can you provide an example of some input for which VEP does not provide a
> canonical transcript?
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
> On 27 July 2016 at 08:02, 林琼芬 <qiongfen0 at gmail.com> wrote:
> Hi Magali,
> As you say, a canonical transcript is usually the transcript with the
> longest translation for a given gene, so presumably every gene has a
> canonical transcript. However, when I use VEP release 77, some variants
> have no canonical transcript in the annotation results. Do you know why
> this happens?
> I hope to hear from you.
>
> Best regards!
> Lin
>
> 2016-07-26 23:06 GMT+08:00 mag <mr6 at ebi.ac.uk>:
> Hi Duarte,
>
> A canonical transcript is usually the transcript with the longest
> translation for a given gene
> http://www.ensembl.org/Help/Glossary?id=346
>
> In your example, XP_005244832.1 has a translation of 730 aa while
> NP_003027.1 only has 728.
> Hence, it is chosen as the canonical transcript.
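As a minimal sketch of the rule just described, the snippet below picks the ID with the longest translation from the two lengths quoted above. The helper name `pick_longest_translation` is hypothetical, and the rule is only the "usual" one, so this is an illustration and not Ensembl's actual selection code.

```python
# Translation lengths (in amino acids) as quoted in the message above.
translation_lengths = {
    "XP_005244832.1": 730,
    "NP_003027.1": 728,
}


def pick_longest_translation(lengths):
    """Return the ID with the longest translation (hypothetical helper).

    If two IDs tie, max() keeps the first one it encounters; how a real
    pipeline breaks ties is not covered here.
    """
    return max(lengths, key=lengths.get)


print(pick_longest_translation(translation_lengths))  # XP_005244832.1
```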
>
> As Kieron mentioned, if you want specifically curated RefSeq annotation,
> it might be better to fetch all external annotations then filter out the
> ones you are interested in.
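One possible way to do that filtering step, sketched in Python: once all external RefSeq accessions for a transcript have been fetched, keep only the curated ones by accession prefix (NM_, NR_ and NP_ are curated records; XM_, XR_ and XP_ are predicted models). The helper name `curated_refseq` is hypothetical, and the input list simply reuses the accessions mentioned in this thread rather than a real query result.

```python
# RefSeq accession prefixes for curated records; XM_/XR_/XP_ are predicted models.
CURATED_PREFIXES = ("NM_", "NR_", "NP_")


def curated_refseq(accessions):
    """Keep only curated RefSeq accessions, preserving input order (hypothetical helper)."""
    return [acc for acc in accessions if acc.startswith(CURATED_PREFIXES)]


# Accessions mentioned in this thread, standing in for a fetched xref list.
xrefs = ["XP_005244832.1", "NP_003027.1", "NM_003036.3"]

print(curated_refseq(xrefs))  # ['NP_003027.1', 'NM_003036.3']
```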
>
>
> Regards,
> Magali
>
>
> On 25/07/2016 17:07, Duarte Molha wrote:
> I will try to reproduce the relevant parts of the script here.
>
> But I am still at a loss as to why XP_005244832.1
> <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged as
> canonical.
>
> Are you saying that I simply might not have cycled through all of the
> RefSeq transcripts? And is there going to be more than one RefSeq
> transcript tagged as canonical for each gene?
>
> I'm not sure I follow!
>
> Thanks
>
> Duarte
>
> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
> Hi Duarte,
>
> Can you send us a snippet of code that accesses the external database
> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
> you mention associated at some level to the transcript, but some are from
> "RefSeq peptide predicted" for example.
>
> Kieron
>
> Kieron Taylor PhD.
> Ensembl Developer
>
> EMBL, European Bioinformatics Institute
>
> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
> >
> > Hi Guys
> >
> > I have a script that, based on a gene symbol, connects to Ensembl and
> > retrieves the canonical transcript, and then does the same using the
> > external database adaptor to get the canonical RefSeq transcript.
> >
> > However, this does not seem to give me the correct result.
> >
> > Take for example the gene SKI (I am using the GRCh37 assembly, btw).
> >
> > If you open this gene on the Ensembl browser:
> >
> >
> > http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
> >
> >
> > On SKI, Ensembl annotates as the canonical transcript: ENST00000378536
> >
> > However, using my script, the external database adaptor returns the
> > RefSeq XP_005244832.1 as the RefSeq canonical transcript, even though the
> > correct canonical transcript is NM_003036.3
> >
> > http://www.ncbi.nlm.nih.gov/gene/6497
> >
> > Unless I am understanding this incorrectly, if the coding region is the
> > same length in two transcripts, the longest should be the canonical.
> >
> > The longer RefSeq is NM_003036.3 (it has a longer 5' UTR).
> >
> > Can you help me understand this?
> >
> > Many thanks
> >
> > Duarte
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info:
> > http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
>
> --
> Arron Lin
> BGI Research Institute
> Email: qiongfen0 at gmail.com
> Beishan Industrial Zone | Yantian District | Shenzhen 518083