[ensembl-dev] Question regarding canonical transcripts

Duarte Molha duartemolha at gmail.com
Wed Jul 27 10:32:07 BST 2016


Thank you, Andy.

I think I get it now. Can I ask a related question? If I want to apply
the same principles that Ensembl uses to select the canonical transcript,
but ignore predicted RefSeq transcripts (XM_...), then I need to write my
own code to iterate across all RefSeq NM_ transcripts to determine
which one fits the description, correct?
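A minimal sketch of that idea in Python, not the Perl API. It assumes you have already collected, for each RefSeq transcript of the gene, its accession, translation length and transcript length. The `RefSeqTranscript` record and both function names are hypothetical. The ranking is deliberately simplified to "longest translation, then longest transcript". It is not the full rule set in Ensembl's TranscriptSelector.pm.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RefSeqTranscript:
    """Hypothetical record; fill it from whatever API or file you use."""
    accession: str           # versioned accession, e.g. "NM_003036.3"
    translation_length: int  # protein length in amino acids
    transcript_length: int   # spliced length in nucleotides, UTRs included


def is_curated(accession: str) -> bool:
    # Curated RefSeq mRNAs use the NM_ prefix; predicted models use XM_.
    return accession.startswith("NM_")


def pick_curated_canonical(
    transcripts: Iterable[RefSeqTranscript],
) -> Optional[RefSeqTranscript]:
    """Longest translation among NM_ transcripts, ties broken by transcript length.

    Returns None when the gene has no NM_ transcript at all.
    """
    curated = [t for t in transcripts if is_curated(t.accession)]
    if not curated:
        return None
    # max() keeps the first of several equal keys, so input order is the last tie-breaker.
    return max(curated, key=lambda t: (t.translation_length, t.transcript_length))
```

With placeholder accessions such as `XM_EXAMPLE3.1` (730 aa), `NM_EXAMPLE1.1` (728 aa, 4000 nt) and `NM_EXAMPLE2.1` (728 aa, 4200 nt), the XM_ entry is discarded before ranking. `NM_EXAMPLE2.1` then wins on transcript length.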





Duarte Molha

On 26 July 2016 at 17:15, Andrew Yates <ayates at ebi.ac.uk> wrote:

> Hi Duarte
>
> No, we are not saying there are two possible canonical transcripts because
> of their curated/predicted status.
>
> I did a quick search and found a relevant bit of information from UCSC's
> genome mailing list. The knownCanonical table is populated by UCSC [1] and
> not by RefSeq. The rules Ensembl has used to select a canonical transcript
> from our own gene set [2] and the rules UCSC [3] have used to select from
> the RefSeq set are not the same.
>
> Neither Ensembl nor UCSC claim this is a canonical transcript assigned by
> RefSeq. In both cases it is the application of our rules to an externally
> imported gene set.
>
> Andy
>
> [1] https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/_6asF5KciPc/ANihqywjAwAJ
> [2] https://github.com/Ensembl/ensembl/blob/release/85/modules/Bio/EnsEMBL/Utils/TranscriptSelector.pm#L46
> [3] http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene
>
> ------------
> Andrew Yates - Genomics Technology Infrastructure Team Leader
> The European Bioinformatics Institute (EMBL-EBI)
> Wellcome Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andy.yates.ebi
> http://www.ebi.ac.uk/
> http://www.ensembl.org/
>
> On 26 Jul 2016, at 16:44, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Now I am really confused!
>
> Even the UCSC tables link NM_003036.3 as the canonical transcript. Does
> this mean there can be two canonical transcripts, one for curated
> annotations and one for predicted ones?
>
>
> Here is the RefSeq transcript linked to SKI in the knownCanonical
> table:
>
> #filter: kgXref.geneSymbol = 'SKI'
> #hg19.knownCanonical.chrom	hg19.knownCanonical.chromStart	hg19.knownCanonical.chromEnd	hg19.knownCanonical.clusterId	hg19.knownCanonical.transcript	hg19.knownCanonical.protein	hg19.kgXref.geneSymbol	hg19.kgXref.refseq	hg19.kgXref.protAcc	hg19.kgXref.description
> chr1	2160133	2241652	98	uc001aja.4	uc001aja.4	SKI	NM_003036	NP_003027	Homo sapiens v-ski sarcoma viral oncogene homolog (avian) (SKI), mRNA.
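The linkage can also be checked programmatically. The sketch below parses Table Browser output like the rows above into one dictionary per row. It assumes the layout shown here: a `#filter:` comment line, then one `#`-prefixed tab-separated header, then tab-separated data rows. The function name is hypothetical.

```python
from typing import Dict, List


def parse_table_browser_output(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated UCSC Table Browser output into one dict per data row.

    Assumed layout: '#' comment lines without tabs (such as '#filter:'),
    a single '#'-prefixed tab-separated header line, then data rows.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            if "\t" in line:
                header = line[1:].split("\t")  # column names, leading '#' removed
            continue  # other comment lines, e.g. the filter description, are skipped
        if header is None:
            raise ValueError("data row found before the header line")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows
```

Run on the SKI output above, `rows[0]["hg19.kgXref.refseq"]` gives `NM_003036` (unversioned) and `rows[0]["hg19.knownCanonical.transcript"]` gives `uc001aja.4`. That link between UCSC's own canonical transcript and a RefSeq accession is UCSC's choice.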
>
>
>
> On 26 July 2016 at 16:06, mag <mr6 at ebi.ac.uk> wrote:
>
>> Hi Duarte,
>>
>> A canonical transcript is usually the transcript with the longest
>> translation for a given gene:
>> http://www.ensembl.org/Help/Glossary?id=346
>>
>> In your example, XP_005244832.1 has a translation of 730 aa, while
>> NP_003027.1 has only 728 aa.
>> Hence, XP_005244832.1 is the one chosen as canonical.
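Applying only the "longest translation" criterion to the two lengths quoted for SKI shows why the predicted protein wins: curation status plays no part in the comparison. The glossary says "usually", so the real selection may apply other criteria too. The helper name below is hypothetical.

```python
from typing import Dict


def longest_translation(translation_lengths: Dict[str, int]) -> str:
    """Return the accession with the longest translation, in amino acids.

    Curated (NP_) and predicted (XP_) proteins compete on length alone here.
    """
    return max(translation_lengths, key=translation_lengths.get)


# Translation lengths quoted in this thread for SKI.
ski_lengths = {"XP_005244832.1": 730, "NP_003027.1": 728}
```

`longest_translation(ski_lengths)` returns `XP_005244832.1`, two residues longer than `NP_003027.1`.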
>>
>> As Kieron mentioned, if you specifically want curated RefSeq annotation,
>> it might be better to fetch all external annotations and then keep only the
>> ones you are interested in.
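Once all external annotations are in hand, the "keep only what you want" step can be a filter on the source database name. The sketch assumes the curated sources are named `RefSeq_mRNA` and `RefSeq_peptide`, and the predicted ones carry a `_predicted` suffix. That naming is an assumption, so compare it with the database names your own results actually report. The `Xref` record and example list are illustrative only.

```python
from typing import List, NamedTuple, Sequence


class Xref(NamedTuple):
    dbname: str      # source database name (assumed naming, see above)
    primary_id: str  # accession, without version


# Assumption: verify these names against the results you actually get back.
CURATED_REFSEQ_DBS = ("RefSeq_mRNA", "RefSeq_peptide")


def curated_refseq(
    xrefs: Sequence[Xref], wanted: Sequence[str] = CURATED_REFSEQ_DBS
) -> List[Xref]:
    """Keep xrefs from curated RefSeq sources, dropping e.g. RefSeq_peptide_predicted."""
    return [x for x in xrefs if x.dbname in wanted]
```

Given `XP_005244832` from a predicted source plus `NM_003036` and `NP_003027` from curated ones, only the last two survive.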
>>
>>
>> Regards,
>> Magali
>>
>>
>> On 25/07/2016 17:07, Duarte Molha wrote:
>>
>> I will try to post the relevant parts of the script here.
>>
>> But I am still at a loss as to why XP_005244832.1
>> (http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1) has been tagged as
>> canonical.
>>
>> What you are saying is that I might simply not have cycled through all
>> of the RefSeq transcripts... but is there going to be more than one
>> RefSeq transcript tagged as canonical for each gene?
>>
>> Not sure I follow!
>>
>> Thanks
>>
>> Duarte
>>
>>
>> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>>
>>> Hi Duarte,
>>>
>>> Can you send us a snippet of code that accesses the external database
>>> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
>>> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
>>> you mention associated with the transcript at some level, but some come
>>> from "RefSeq peptide predicted", for example.
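Kieron's point is that one transcript can carry several RefSeq entries from different sources. Code that keeps only the first result can therefore land on a predicted one. The sketch below reads every result and groups accessions by source. It assumes each result has already been reduced to a (source database name, accession) pair. The order of the example list is invented for illustration and is not what the API necessarily returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_source(xrefs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every (dbname, accession) pair instead of stopping at the first one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dbname, accession in xrefs:
        groups[dbname].append(accession)
    return dict(groups)
```

If the first pair returned happened to be `("RefSeq_peptide_predicted", "XP_005244832")`, a first-result-only script would report the XP_ accession. The grouped view still exposes the curated entries under their own source names.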
>>>
>>> Kieron
>>>
>>>
>>>
>>> Kieron Taylor PhD.
>>> Ensembl Developer
>>>
>>> EMBL, European Bioinformatics Institute
>>>
>>> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
>>> >
>>> > Hi Guys
>>> >
>>> > I have a script that, given a gene symbol, connects to Ensembl and
>>> > retrieves the canonical transcript, and then does the same using the
>>> > external database adaptor to get the canonical RefSeq transcript.
>>> >
>>> > However, this does not seem to give me the correct result.
>>> >
>>> > Take, for example, the gene SKI (I am using the GRCh37 assembly, btw).
>>> >
>>> > If you open this gene on the Ensembl browser:
>>> >
>>> > http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
>>> >
>>> >
>>> > For SKI, Ensembl annotates ENST00000378536 as the canonical transcript.
>>> >
>>> > However, using my script, the external database adaptor returns
>>> > XP_005244832.1 as the canonical RefSeq transcript, even though the
>>> > correct canonical transcript is NM_003036.3.
>>> >
>>> > http://www.ncbi.nlm.nih.gov/gene/6497
>>> >
>>> > Unless I am misunderstanding this, if the coding regions of two
>>> > transcripts are the same length, the longer transcript should be the
>>> > canonical one.
>>> >
>>> > The longer RefSeq transcript is NM_003036.3 (it has a longer 5' UTR).
>>> >
>>> > Can you help me understand this?
>>> >
>>> > Many thanks
>>> >
>>> > Duarte
>>> > _______________________________________________
>>> > Dev mailing list    Dev at ensembl.org
>>> > Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> > Ensembl Blog: http://www.ensembl.info/