[ensembl-dev] Question regarding canonical transcripts

Fri Jul 29 09:24:12 BST 2016

Hi Lin,

This is not a case of Ensembl failing to provide a canonical
transcript. Your input variant overlaps only one transcript of the
gene, and that transcript is not the canonical one.

If you look at the transcript diagram [1] you can see ENST00000497517
extends many kb 5' of the other transcripts' start sites (beyond the 5kb
range within which VEP will call an overlap), so only that transcript is
annotated.
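
As a rough illustration of the overlap rule described above (a minimal
sketch, not VEP's actual code): a transcript is reported only if the
variant falls within its extent plus a 5kb flank on either side. The
Transcript class, the transcript names and all coordinates below are
made up for the example.

```python
from dataclasses import dataclass

FLANK = 5000  # the 5kb range mentioned above


@dataclass
class Transcript:
    name: str        # hypothetical label, not a real Ensembl ID
    start: int       # hypothetical genomic start
    end: int         # hypothetical genomic end
    canonical: bool  # whether this is the gene's canonical transcript


def overlapping(transcripts, pos, flank=FLANK):
    """Return the transcripts whose extent, padded by `flank`, contains pos."""
    return [t for t in transcripts if t.start - flank <= pos <= t.end + flank]


# Made-up gene: one transcript starts far 5' of the canonical one.
transcripts = [
    Transcript("long_noncanonical", 100_000, 200_000, False),
    Transcript("canonical", 150_000, 200_000, True),
]

# A variant 30kb 5' of the canonical transcript's start site.
hits = overlapping(transcripts, 120_000)
names = [t.name for t in hits]
has_canonical = any(t.canonical for t in hits)
```

Here only the long transcript is returned, so no row of the output
carries the canonical flag even though the gene does have a canonical
transcript.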

Regards

Will McLaren
Ensembl Variation

[1] : http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000115705

On 29 July 2016 at 07:32, 林琼芬 <qiongfen0 at gmail.com> wrote:

> Yes, for example the input below:
> 1 25372580 rs12731221 G A
> 1 28733759 rs78873359 CA C
> 2 1397282 rs9326165 G A
> 2 1405785 rs74412499 G A
> 2 88285154 rs149707353 C T
> 3 85008865 . C A
> 3 180575632 rs58197854 AT A
> 3 180575641 rs114361217 A T
> 5 42842763 rs9686343 C A
> 6 5109555 rs149371287 G A
> 6 143929729 rs6899521 T C
> 7 72024054 rs193119573 G A
> 7 72024079 rs376943542 G A
> 7 89571465 rs10226999 C G
> 10 11639703 rs77896587 G A
> The VEP result looks like this; it does not include the canonical
> transcript. Thanks a lot!
> [image: inline image 1]
>
>
> Best regards!
> Lin
>
> 2016-07-27 20:50 GMT+08:00 Will McLaren <wm2 at ebi.ac.uk>:
>
>> Hi Lin,
>>
>> Can you provide an example of some input for which VEP does not provide a
>> canonical transcript?
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 27 July 2016 at 08:02, 林琼芬 <qiongfen0 at gmail.com> wrote:
>>
>>> Hi Magali,
>>> As you say, a canonical transcript is usually the transcript with the
>>> longest translation for a given gene, so presumably every gene has a
>>> canonical transcript. However, when I use VEP release 77, some variants
>>> have no canonical transcript in the annotation results. Do you know why
>>> this happens?
>>> Hope to hear from you.
>>>
>>> Best regards!
>>> Lin
>>>
>>> 2016-07-26 23:06 GMT+08:00 mag <mr6 at ebi.ac.uk>:
>>>
>>>> Hi Duarte,
>>>>
>>>> A canonical transcript is usually the transcript with the longest
>>>> translation for a given gene
>>>> http://www.ensembl.org/Help/Glossary?id=346
>>>>
>>>> In your example, XP_005244832.1 has a translation of 730 aa while
>>>> NP_003027.1 only has 728.
>>>> Hence, XP_005244832.1 is chosen as the canonical transcript.
>>>>
>>>> As Kieron mentioned, if you want specifically curated RefSeq
>>>> annotation, it might be better to fetch all external annotations then
>>>> filter out the ones you are interested in.
>>>>
>>>>
>>>> Regards,
>>>> Magali
>>>>
>>>>
>>>> On 25/07/2016 17:07, Duarte Molha wrote:
>>>>
>>>> I will try and produce here the relevant parts of the script.
>>>>
>>>> But I am still at a loss as to why XP_005244832.1
>>>> <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged
>>>> as canonical.
>>>>
>>>> What you are saying is that I simply might not have cycled through
>>>> all of the RefSeq transcripts... but is there going to be more than one
>>>> RefSeq transcript tagged as canonical for each gene?
>>>>
>>>> Not sure I follow!
>>>>
>>>> Thanks
>>>>
>>>> Duarte
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Duarte Molha
>>>>
>>>> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>>>>
>>>>> Hi Duarte,
>>>>>
>>>>> Can you send us a snippet of code that accesses the external database
>>>>> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
>>>>> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
>>>>> you mention associated at some level to the transcript, but some are from
>>>>> "RefSeq peptide predicted" for example.
>>>>>
>>>>> Kieron
>>>>>
>>>>>
>>>>>
>>>>> Kieron Taylor PhD.
>>>>> Ensembl Developer
>>>>>
>>>>> EMBL, European Bioinformatics Institute
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Guys
>>>>> >
>>>>> > I have a script that based on a gene symbol connects to ensembl and
>>>>> retrieves the canonical transcript and then does the same using the
>>>>> external database adaptor to get the canonical refseq transcript.
>>>>> >
>>>>> > However this does not seem to give me the correct result
>>>>> >
>>>>> > Take for example the gene SKI ( I am using GRCh37 assembly btw)
>>>>> >
>>>>> > If you open this gene on the Ensembl browser:
>>>>> >
>>>>> >
>>>>> http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
>>>>> >
>>>>> >
>>>>> > On SKI, Ensembl annotates as the canonical transcript:
>>>>> ENST00000378536
>>>>> >
>>>>> > However, using my script, the external database adaptor returns the
>>>>> RefSeq XP_005244832.1 as the RefSeq canonical transcript, even though the
>>>>> correct canonical transcript is NM_003036.3
>>>>> >
>>>>> > http://www.ncbi.nlm.nih.gov/gene/6497
>>>>> >
>>>>> > Unless I am misunderstanding this, if the coding region is
>>>>> the same length in two transcripts, the longer one should be the canonical
>>>>> >
>>>>> > The longer RefSeq is NM_003036.3 (it has a longer 5' UTR)
>>>>> >
>>>>> > Can you help me understand this?
>>>>> >
>>>>> > Many thanks
>>>>> >
>>>>> > Duarte
>>>>> > _______________________________________________
>>>>> > Dev mailing list    Dev at ensembl.org
>>>>> > Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> > Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>> --
>>>
>>> Arron Lin
>>>
>>> BGI Research Institute
>>>
>>> Email: qiongfen0 at gmail.com
>>>
>>> Beishan Industrial Zone| Yantian  District| Shenzhen 518083
>>>
>>
>
>