[ensembl-dev] Question regarding canonical transcripts

林琼芬 qiongfen0 at gmail.com
Fri Jul 29 07:32:43 BST 2016


Yes, for example the input below:
1 25372580 rs12731221 G A
1 28733759 rs78873359 CA C
2 1397282 rs9326165 G A
2 1405785 rs74412499 G A
2 88285154 rs149707353 C T
3 85008865 . C A
3 180575632 rs58197854 AT A
3 180575641 rs114361217 A T
5 42842763 rs9686343 C A
6 5109555 rs149371287 G A
6 143929729 rs6899521 T C
7 72024054 rs193119573 G A
7 72024079 rs376943542 G A
7 89571465 rs10226999 C G
10 11639703 rs77896587 G A
The VEP result looks like this; it does not include the canonical transcript.
Thanks a lot!
[image: inline image 1]


Best regards!
Lin

2016-07-27 20:50 GMT+08:00 Will McLaren <wm2 at ebi.ac.uk>:

> Hi Lin,
>
> Can you provide an example of some input for which VEP does not provide a
> canonical transcript?
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
> On 27 July 2016 at 08:02, 林琼芬 <qiongfen0 at gmail.com> wrote:
>
>> Hi Magali,
>> As you said, a canonical transcript is usually the transcript with the
>> longest translation for a given gene, so presumably every gene has a
>> canonical transcript. However, when I use VEP release 77, some variants
>> have no canonical transcript in the annotation results. Do you know why
>> this happens?
>> Hope to hear from you.
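A minimal Python sketch for narrowing this down: it lists the variants in a VEP output file that have no row flagged as canonical. It assumes VEP was run with `--canonical` (the option that adds the CANONICAL flag) and wrote its default tab-delimited output, with KEY=VALUE pairs in the 14th column (Extra). The helper names are hypothetical, not part of VEP. Variants that overlap no transcript, such as intergenic ones, will show up in this list because they have no transcript rows that could carry the flag.

```python
# Sketch: find variants with no canonical-transcript annotation in VEP output.
# Assumes VEP was run with --canonical and produced its default tab-delimited
# format, whose 14th column ("Extra") holds KEY=VALUE pairs separated by ";".
# Both function names are hypothetical, not part of VEP.


def parse_extra(extra):
    """Turn an Extra field like 'IMPACT=LOW;CANONICAL=YES' into a dict."""
    fields = {}
    for item in extra.split(";"):
        if not item or item == "-":
            continue  # treat an empty or bare "-" Extra field as no data
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def variants_without_canonical(lines):
    """Return variant IDs (in input order) with no CANONICAL=YES row."""
    has_canonical = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        variant_id = cols[0]
        extra = cols[13] if len(cols) > 13 else ""
        is_canonical = parse_extra(extra).get("CANONICAL") == "YES"
        has_canonical[variant_id] = has_canonical.get(variant_id, False) or is_canonical
    return [vid for vid, flagged in has_canonical.items() if not flagged]
```

For example, passing the lines of a (hypothetical) `vep_output.txt` returns the IDs worth inspecting, which can then be checked for whether they overlap any transcript at all.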
>>
>> Best regards!
>> Lin
>>
>> 2016-07-26 23:06 GMT+08:00 mag <mr6 at ebi.ac.uk>:
>>
>>> Hi Duarte,
>>>
>>> A canonical transcript is usually the transcript with the longest
>>> translation for a given gene
>>> http://www.ensembl.org/Help/Glossary?id=346
>>>
>>> In your example, XP_005244832.1 has a translation of 730 aa while
>>> NP_003027.1 only has 728.
>>> Hence, it is chosen as the canonical transcript.
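A minimal Python sketch of the rule as stated here: pick the candidate with the longest translation. The `Candidate` record and `pick_canonical` are hypothetical helpers, and Ensembl's actual canonical selection may apply further criteria. The secondary key, overall transcript length, is only a tie-break and reflects the assumption in Duarte's original message rather than anything confirmed in this thread. Only the protein lengths (730 and 728 aa) come from the thread; the transcript lengths are made-up placeholders.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    accession: str
    protein_length_aa: int     # length of the translation
    transcript_length_nt: int  # full length, including UTRs


def pick_canonical(candidates):
    # Longest translation wins; overall length only breaks exact ties
    # (the tie-break is an assumption, not a confirmed Ensembl rule).
    return max(candidates, key=lambda c: (c.protein_length_aa, c.transcript_length_nt))


# Protein lengths from the thread; transcript lengths are placeholders,
# with the NP_ entry deliberately given the longer (UTR-inclusive) length.
candidates = [
    Candidate("NP_003027.1", 728, 2000),
    Candidate("XP_005244832.1", 730, 1000),
]
print(pick_canonical(candidates).accession)  # prints XP_005244832.1
```

Under this rule a longer 5' UTR only matters when the translations are exactly the same length, which is why the 730 aa entry wins despite its shorter placeholder transcript length.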
>>>
>>> As Kieron mentioned, if you specifically want curated RefSeq annotation,
>>> it might be better to fetch all external annotations and then keep only
>>> the ones you are interested in.
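A minimal Python sketch of that filtering step, assuming the external annotations have already been fetched as (source, accession) pairs. The source names below are assumed to match the names Ensembl uses for RefSeq cross-references (including the predicted peptide source Kieron mentions); check them against what your release actually reports. As a fallback that does not depend on source names, curated RefSeq accessions start with NM_, NR_ or NP_, while predicted models start with XM_, XR_ or XP_. The example pairs reuse IDs from this thread and are illustrative only.

```python
# Assumed Ensembl external-database names for curated RefSeq records.
CURATED_REFSEQ_SOURCES = {"RefSeq_mRNA", "RefSeq_peptide"}

# RefSeq accession prefixes: curated (NM_, NR_, NP_) vs predicted (XM_, XR_, XP_).
CURATED_PREFIXES = ("NM_", "NR_", "NP_")


def curated_refseq_ids(xrefs):
    """Keep accessions whose source is a curated RefSeq database."""
    return [acc for source, acc in xrefs if source in CURATED_REFSEQ_SOURCES]


def is_curated_accession(acc):
    """Prefix-based check, independent of the source name."""
    return acc.startswith(CURATED_PREFIXES)


# Illustrative pairs using IDs from this thread.
xrefs = [
    ("RefSeq_peptide_predicted", "XP_005244832.1"),
    ("RefSeq_mRNA", "NM_003036.3"),
    ("RefSeq_peptide", "NP_003027.1"),
]
print(curated_refseq_ids(xrefs))  # prints ['NM_003036.3', 'NP_003027.1']
```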
>>>
>>>
>>> Regards,
>>> Magali
>>>
>>>
>>> On 25/07/2016 17:07, Duarte Molha wrote:
>>>
>>> I will try to post the relevant parts of the script here.
>>>
>>> But I am still at a loss as to why XP_005244832.1
>>> <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged as
>>> canonical.
>>>
>>> From what you are saying, I might simply not have cycled through
>>> all of the RefSeq transcripts... but is there going to be more than one
>>> RefSeq transcript tagged as canonical for each gene?
>>>
>>> Not sure I follow!
>>>
>>> Thanks
>>>
>>> Duarte
>>>
>>>
>>> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>>>
>>>> Hi Duarte,
>>>>
>>>> Can you send us a snippet of code that accesses the external database
>>>> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
>>>> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
>>>> you mention associated at some level to the transcript, but some are from
>>>> "RefSeq peptide predicted" for example.
>>>>
>>>> Kieron
>>>>
>>>>
>>>>
>>>> Kieron Taylor PhD.
>>>> Ensembl Developer
>>>>
>>>> EMBL, European Bioinformatics Institute
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
>>>> >
>>>> > Hi Guys
>>>> >
>>>> > I have a script that, given a gene symbol, connects to Ensembl and
>>>> > retrieves the canonical transcript, and then does the same using the
>>>> > external database adaptor to get the canonical RefSeq transcript.
>>>> >
>>>> > However, this does not seem to give me the correct result.
>>>> >
>>>> > Take for example the gene SKI (I am using the GRCh37 assembly, by the way).
>>>> >
>>>> > If you open this gene on the Ensembl browser:
>>>> >
>>>> >
>>>> http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
>>>> >
>>>> >
>>>> > On SKI, Ensembl annotates as the canonical transcript: ENST00000378536
>>>> >
>>>> > However, using my script, the external database adaptor returns
>>>> > XP_005244832.1 as the canonical RefSeq transcript, even though the
>>>> > correct canonical transcript is NM_003036.3
>>>> >
>>>> > http://www.ncbi.nlm.nih.gov/gene/6497
>>>> >
>>>> > Unless I am misunderstanding this, if the coding regions of two
>>>> > transcripts are the same length, the longer transcript should be the
>>>> > canonical one.
>>>> >
>>>> > The longer RefSeq transcript is NM_003036.3 (it has a longer 5' UTR).
>>>> >
>>>> > Can you help me understand this?
>>>> >
>>>> > Many thanks
>>>> >
>>>> > Duarte
>>>> > _______________________________________________
>>>> > Dev mailing list    Dev at ensembl.org
>>>> > Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> > Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Arron Lin
>>
>> BGI Research Institute
>>
>> Email: qiongfen0 at gmail.com
>>
>> Beishan Industrial Zone| Yantian  District| Shenzhen 518083
>>
>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 19262 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160729/edc4a346/attachment.png>

