[ensembl-dev] Multiple NM numbers per transcript

Wolf Beat Beat.Wolf at hefr.ch
Thu Mar 3 12:34:27 GMT 2016


Thank you for this excellent explanation. This indeed answers my questions.

Kind regards

Beat Wolf
________________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] on behalf of mag [mr6 at ebi.ac.uk]
Sent: Thursday, March 03, 2016 12:00 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Multiple NM numbers per transcript

Hi Beat Wolf,

By definition, the RefSeq transcript set and the Ensembl transcript set
are different.
RefSeq provide the original mRNA and peptide sequences, while Ensembl
align those sequences against an assembly and provide a model based on
the underlying genome sequence.
It is hence not possible to have a perfect one-to-one mapping between
the two resources.

If you look at the TP53 region in the genome
http://grch37.ensembl.org/Homo_sapiens/Share/7549c9ab053eb1682e4cca0b64e0861e227445319
where the RefSeq models have been represented as well as possible against
the genome sequence, you can see that there are 4 Ensembl protein-coding
transcripts starting at the furthest 5' end of the region.
In comparison, there are 16 RefSeq mRNA sequences for that same region.

We attempt to provide mappings between the two resources where possible,
based on exon overlap (coding and non-coding).
We allow for mappings which are not 100% matches, due to the nature of the
resources.
This results in one Ensembl transcript possibly mapping to several RefSeq
transcripts.
Please note there are some RefSeq transcripts for which we do not have
any mappings, as the differences are too large to be accounted for by
sequencing errors.
For example, NM_001126115 and NM_001276697 do not have a mapping, as the
resulting models are much shorter than any of the Ensembl ones.
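
As a rough illustration of why an overlap-based mapping with a tolerance
can attach several RefSeq transcripts to one Ensembl transcript, and none
to a much shorter model, here is a minimal Python sketch. It is not the
Ensembl pipeline: the scoring rule (shared exonic bases divided by the
longer transcript's exonic length), the 0.9 threshold, the identifiers and
the coordinates are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end) coordinates, toy values


def overlap_bases(a: List[Exon], b: List[Exon]) -> int:
    """Count bases shared by two exon lists (exons within a list do not overlap)."""
    total = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:
                total += hi - lo + 1
    return total


def exon_length(exons: List[Exon]) -> int:
    """Total exonic length of a transcript model."""
    return sum(end - start + 1 for start, end in exons)


def overlap_score(a: List[Exon], b: List[Exon]) -> float:
    """Shared bases divided by the longer model's exonic length (assumed rule)."""
    longest = max(exon_length(a), exon_length(b))
    return overlap_bases(a, b) / longest if longest else 0.0


def map_transcripts(
    ensembl: Dict[str, List[Exon]],
    refseq: Dict[str, List[Exon]],
    threshold: float = 0.9,  # hypothetical tolerance, not Ensembl's value
) -> Dict[str, List[str]]:
    """Attach every RefSeq model scoring at or above the threshold."""
    mappings: Dict[str, List[str]] = {ens_id: [] for ens_id in ensembl}
    for ens_id, ens_exons in ensembl.items():
        for rs_id, rs_exons in refseq.items():
            if overlap_score(ens_exons, rs_exons) >= threshold:
                mappings[ens_id].append(rs_id)
    return mappings


# Hypothetical models: one Ensembl transcript and three RefSeq transcripts.
toy_ensembl = {"ENST_A": [(100, 199), (300, 399), (500, 599)]}
toy_refseq = {
    "NM_X": [(100, 199), (300, 399), (500, 599)],  # identical structure
    "NM_Y": [(105, 199), (300, 399), (500, 599)],  # slightly shorter 5' end
    "NM_SHORT": [(300, 399)],                      # much shorter model
}
toy_mappings = map_transcripts(toy_ensembl, toy_refseq)
```

With these toy models, both NM_X and NM_Y pass the threshold and are
attached to ENST_A, while NM_SHORT is left unmapped because it covers only
a third of the longer model.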

If you look at the supporting evidence used to build the Ensembl
transcript model
http://grch37.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
you can see which RefSeq mRNA sequences were used.
However, this reflects only the evidence which was available when
the geneset was generated, in September 2013.
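
The RefSeq identifiers attached to a transcript can also be listed
programmatically. The sketch below assumes the Ensembl REST service for
GRCh37 (grch37.rest.ensembl.org), its xrefs/id endpoint, and
"RefSeq_mRNA" as the external database name; please check the REST
documentation before relying on any of these. The function names are
made up for this example.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: REST server for the GRCh37 archive.
REST_SERVER = "https://grch37.rest.ensembl.org"


def xrefs_url(stable_id: str, external_db: str = "RefSeq_mRNA") -> str:
    """Build the xrefs/id URL for one Ensembl stable ID."""
    query = urlencode({"external_db": external_db})
    return f"{REST_SERVER}/xrefs/id/{stable_id}?{query}"


def refseq_ids(xrefs: List[Dict], dbname: str = "RefSeq_mRNA") -> List[str]:
    """Extract the sorted, de-duplicated primary IDs for one external database."""
    return sorted({x["primary_id"] for x in xrefs if x.get("dbname") == dbname})


def fetch_xrefs(stable_id: str) -> List[Dict]:
    """Query the REST service (needs network access) and decode the JSON list."""
    req = Request(xrefs_url(stable_id), headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


# Usage (performs a network request, so it is left commented out):
# print(refseq_ids(fetch_xrefs("ENST00000269305")))
```

Each identifier returned this way is a cross-reference in the sense
described above, so several NM numbers for one transcript are expected.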


Hope this helps,
Magali

On 02/03/2016 16:38, Wolf Beat wrote:
> Hi everybody.
>
> I have a probably trivial question, but as a computer scientist, some details in genetics are sometimes lost on me ;)
>
> So, when I wanted to find out the NM number of TP53-001, I saw that this transcript has many different NM numbers:
> http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
>
> What is strange to me is that when I look at their details, they look different:
> http://www.ncbi.nlm.nih.gov/nuccore/NM_000546
> http://www.ncbi.nlm.nih.gov/nuccore/NM_001126112
>
> Those are two examples of NM numbers associated with TP53-001, but from what I understand those are clearly not the same transcript.
>
> Am I missing something important? What would the correct NM number be for this transcript, and how can I determine which one it is?
>
> Kind regards
>
> Beat Wolf
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

