[ensembl-dev] Question regarding canonical transcripts

Duarte Molha duartemolha at gmail.com
Tue Jul 26 16:44:47 BST 2016


Now I am really confused!

Even the UCSC tables link NM_003036.3 as the canonical transcript. Does
this mean there can be two canonical transcripts, one for curated
annotations and one for predicted ones?
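
To make the two possible answers concrete, here is a minimal Python sketch. It assumes the rule Magali quotes below (the longest translation wins) and the RefSeq naming convention that NP_/NM_ accessions are the curated ("known") records while XP_/XM_ accessions are computationally predicted models. The names RefSeqProtein, is_curated and longest_translation are made up for illustration; this is not Ensembl API code, and the lengths are simply the ones given in this thread.

```python
from typing import Iterable, NamedTuple, Optional


class RefSeqProtein(NamedTuple):
    accession: str  # RefSeq protein accession, including version
    length_aa: int  # translation length in amino acids


def is_curated(accession: str) -> bool:
    """Assumed convention: NP_ = curated record, XP_ = predicted model."""
    return accession.startswith("NP_")


def longest_translation(
    proteins: Iterable[RefSeqProtein], curated_only: bool = False
) -> Optional[RefSeqProtein]:
    """Return the entry with the longest translation, or None if the pool is empty."""
    pool = [p for p in proteins if not curated_only or is_curated(p.accession)]
    return max(pool, key=lambda p: p.length_aa, default=None)


# Translation lengths for SKI as quoted in Magali's reply.
ski = [
    RefSeqProtein("XP_005244832.1", 730),
    RefSeqProtein("NP_003027.1", 728),
]

print(longest_translation(ski).accession)                     # XP_005244832.1
print(longest_translation(ski, curated_only=True).accession)  # NP_003027.1
```

Under those assumptions the "canonical" pick depends entirely on whether predicted models are allowed into the pool, which would explain why the two resources disagree.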


Here is how the knownCanonical table links to the RefSeq transcript for SKI:

#filter: kgXref.geneSymbol = 'SKI'
#hg19.knownCanonical.chrom	hg19.knownCanonical.chromStart	hg19.knownCanonical.chromEnd	hg19.knownCanonical.clusterId	hg19.knownCanonical.transcript	hg19.knownCanonical.protein	hg19.kgXref.geneSymbol	hg19.kgXref.refseq	hg19.kgXref.protAcc	hg19.kgXref.description
chr1	2160133	2241652	98	uc001aja.4	uc001aja.4	SKI	NM_003036	NP_003027	Homo sapiens v-ski sarcoma viral oncogene homolog (avian) (SKI), mRNA.
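
For anyone who wants to pull the RefSeq columns out of this output in a script, the following is a rough Python sketch. It assumes the layout shown above: "#filter" comment lines, a tab-separated header line prefixed with "#", then tab-separated data rows. The helper name read_table_browser is hypothetical, and the wrapped description is joined back onto one line.

```python
import csv

# Column names and the single data row, copied from the output above.
header = [
    "hg19.knownCanonical.chrom", "hg19.knownCanonical.chromStart",
    "hg19.knownCanonical.chromEnd", "hg19.knownCanonical.clusterId",
    "hg19.knownCanonical.transcript", "hg19.knownCanonical.protein",
    "hg19.kgXref.geneSymbol", "hg19.kgXref.refseq",
    "hg19.kgXref.protAcc", "hg19.kgXref.description",
]
row = [
    "chr1", "2160133", "2241652", "98", "uc001aja.4", "uc001aja.4",
    "SKI", "NM_003036", "NP_003027",
    "Homo sapiens v-ski sarcoma viral oncogene homolog (avian) (SKI), mRNA.",
]
text = (
    "#filter: kgXref.geneSymbol = 'SKI'\n"
    "#" + "\t".join(header) + "\n"
    + "\t".join(row) + "\n"
)


def read_table_browser(text):
    """Parse the TSV: drop '#filter' lines, strip '#' from the header line."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#filter")]
    lines[0] = lines[0].lstrip("#")
    return list(csv.DictReader(lines, delimiter="\t"))


records = read_table_browser(text)
for rec in records:
    print(
        rec["hg19.knownCanonical.transcript"],
        rec["hg19.kgXref.refseq"],
        rec["hg19.kgXref.protAcc"],
    )
```

For the row above this prints the UCSC transcript uc001aja.4 together with NM_003036 and NP_003027.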



On 26 July 2016 at 16:06, mag <mr6 at ebi.ac.uk> wrote:

> Hi Duarte,
>
> A canonical transcript is usually the transcript with the longest
> translation for a given gene
> http://www.ensembl.org/Help/Glossary?id=346
>
> In your example, XP_005244832.1 has a translation of 730 aa while
> NP_003027.1 only has 728 aa.
> Hence, XP_005244832.1 is the one chosen as canonical.
>
> As Kieron mentioned, if you want specifically curated RefSeq annotation,
> it might be better to fetch all external annotations then filter out the
> ones you are interested in.
>
>
> Regards,
> Magali
>
>
> On 25/07/2016 17:07, Duarte Molha wrote:
>
> I will try to post the relevant parts of the script here.
>
> But I am still at a loss as to why XP_005244832.1
> <http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1> has been tagged as
> canonical.
>
> What you are saying is that I might simply not have cycled through all
> of the RefSeq transcripts... but is there going to be more than one
> RefSeq transcript tagged as canonical for each gene?
>
> Not sure I follow!
>
> Thanks
>
> Duarte
>
> On 25 July 2016 at 11:58, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>> Hi Duarte,
>>
>> Can you send us a snippet of code that accesses the external database
>> adaptor (DBEntryAdaptor?). It sounds like you may not be reading enough of
>> your results to get the RefSeq ID you expect. We have all of the RefSeq IDs
>> you mention associated at some level to the transcript, but some are from
>> "RefSeq peptide predicted" for example.
>>
>> Kieron
>>
>>
>>
>> Kieron Taylor PhD.
>> Ensembl Developer
>>
>> EMBL, European Bioinformatics Institute
>>
>> > On 22 Jul 2016, at 10:47, Duarte Molha <duartemolha at gmail.com> wrote:
>> >
>> > Hi Guys
>> >
>> > I have a script that, given a gene symbol, connects to Ensembl and
>> > retrieves the canonical transcript, then uses the external database
>> > adaptor to get the canonical RefSeq transcript.
>> >
>> > However, this does not seem to give me the correct result.
>> >
>> > Take, for example, the gene SKI (I am using the GRCh37 assembly, by the way).
>> >
>> > If you open this gene on the Ensembl browser:
>> >
>> >
>> http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343
>> >
>> >
>> > For SKI, Ensembl annotates ENST00000378536 as the canonical transcript.
>> >
>> > However, using my script, the external database adaptor returns the
>> > RefSeq XP_005244832.1 as the RefSeq canonical transcript, even though the
>> > correct canonical transcript is NM_003036.3
>> >
>> > http://www.ncbi.nlm.nih.gov/gene/6497
>> >
>> > Unless I am misunderstanding this, if the coding region is the same
>> > length in two transcripts, the longer transcript should be the canonical one.
>> >
>> > The longer RefSeq is NM_003036.3 (it has a longer 5' UTR).
>> >
>> > Can you help me understand this?
>> >
>> > Many thanks
>> >
>> > Duarte
>> > _______________________________________________
>> > Dev mailing list    Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/