[ensembl-dev] Multiple NM numbers per transcript

mag mr6 at ebi.ac.uk
Thu Mar 3 11:00:31 GMT 2016


Hi Beat Wolf,

By definition, the RefSeq transcript set and the Ensembl transcript set 
are different.
RefSeq provide the original mRNA and peptide sequences, while Ensembl 
align those sequences against an assembly and provide a model based on 
the underlying genome sequence.
It is hence not possible to have a perfect one-to-one mapping between 
the two resources.

If you look at the TP53 region in the genome
http://grch37.ensembl.org/Homo_sapiens/Share/7549c9ab053eb1682e4cca0b64e0861e227445319
where the RefSeq models have been represented as well as possible against 
the genome sequence, you can see that there are 4 Ensembl protein-coding 
transcripts starting at the furthest 5' end of the region.
In comparison, there are 16 RefSeq mRNA sequences for that same region.

We attempt to provide mappings between the two resources where possible, 
based on exon overlap (coding and non-coding).
We allow mappings with less than 100% overlap, due to the nature of the 
resources.
As a result, one Ensembl transcript may map to several RefSeq 
transcripts.
Please note there are some RefSeq transcripts for which we do not have 
any mappings, as the differences are too large to be explained by 
sequencing errors.
For example, NM_001126115 and NM_001276697 do not have a mapping, as the 
resulting models are much shorter than any of the Ensembl ones.
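
To illustrate the idea of an overlap-based mapping, here is a minimal Python 
sketch. It is not the code of the Ensembl xref pipeline: the scoring formula, 
the MIN_SCORE cut-off of 0.75 and the toy coordinates are all assumptions 
made up for this example.

```python
from typing import List, Tuple

# One exon as inclusive genomic coordinates (start, end) on the same strand.
Exon = Tuple[int, int]


def total_length(exons: List[Exon]) -> int:
    """Summed exon length (assumes exons within one model do not overlap)."""
    return sum(end - start + 1 for start, end in exons)


def shared_length(a: List[Exon], b: List[Exon]) -> int:
    """Number of bases covered by exons of both models."""
    shared = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:
                shared += hi - lo + 1
    return shared


def overlap_score(a: List[Exon], b: List[Exon]) -> float:
    """Shared bases divided by the length of the longer model (0.0 to 1.0)."""
    longest = max(total_length(a), total_length(b))
    return shared_length(a, b) / longest if longest else 0.0


# Hypothetical cut-off, chosen only for this illustration.
MIN_SCORE = 0.75


def is_mapped(a: List[Exon], b: List[Exon], min_score: float = MIN_SCORE) -> bool:
    """True if the two models overlap well enough to be linked."""
    return overlap_score(a, b) >= min_score


# Toy coordinates, not real TP53 exons.
ensembl_model = [(100, 199), (300, 399), (500, 599)]
refseq_similar = [(100, 199), (300, 399), (500, 589)]  # last exon 10 bp shorter
refseq_short = [(100, 199)]                            # much shorter model
```

With these toy models, refseq_similar scores about 0.97 and would be linked 
despite not matching 100%, while refseq_short scores about 0.33 and would be 
left without a mapping.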

If you look at the supporting evidence used to build the Ensembl 
transcript model
http://grch37.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
you can see which RefSeq mRNA sequences were used.
Note that this reflects only the evidence which was available when 
the geneset was generated, in September 2013.
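
If you prefer to retrieve the RefSeq links for a transcript programmatically 
rather than from the web pages, a sketch along the following lines could be 
used against the GRCh37 REST server. It assumes the xrefs/id endpoint and the 
external database name "RefSeq_mRNA"; the helper names are made up for this 
example, and the results reflect whatever the server holds when you query it.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# GRCh37 instance of the Ensembl REST API.
SERVER = "https://grch37.rest.ensembl.org"


def build_xref_url(stable_id: str, external_db: str = "RefSeq_mRNA") -> str:
    """URL listing external references attached to an Ensembl stable ID."""
    query = urllib.parse.urlencode({"external_db": external_db})
    return f"{SERVER}/xrefs/id/{urllib.parse.quote(stable_id)}?{query}"


def fetch_xrefs(stable_id: str, external_db: str = "RefSeq_mRNA") -> List[Dict]:
    """Query the REST server (needs network access) and return the JSON records."""
    request = urllib.request.Request(
        build_xref_url(stable_id, external_db),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def refseq_mrna_ids(records: List[Dict]) -> List[str]:
    """Sorted, de-duplicated accessions from records whose dbname is RefSeq_mRNA."""
    ids = {
        record["primary_id"]
        for record in records
        if record.get("dbname") == "RefSeq_mRNA" and "primary_id" in record
    }
    return sorted(ids)
```

Calling refseq_mrna_ids(fetch_xrefs("ENST00000269305")) should then list the 
NM accessions currently linked to that transcript.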


Hope this helps,
Magali

On 02/03/2016 16:38, Wolf Beat wrote:
> Hi everybody.
>
> I have a probably trivial question, but as a computer scientist some details in genetics are sometimes lost on me ;)
>
> So, when I wanted to find out the NM number of TP53-001, I saw that this transcript has many different NM numbers:
> http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
>
> What is strange to me is that when I look at their details, they look different:
> http://www.ncbi.nlm.nih.gov/nuccore/NM_000546
> http://www.ncbi.nlm.nih.gov/nuccore/NM_001126112
>
> Those are two examples of NM numbers associated with TP53-001, but from what I understand those are clearly not the same transcript.
>
> Am I missing something important? What would the correct NM number be for this transcript, and how can I determine which one it is?
>
> Kind regards
>
> Beat Wolf