[ensembl-dev] Multiple NM numbers per transcript
Beat.Wolf at hefr.ch
Thu Mar 3 12:34:27 GMT 2016
Thank you for this excellent explanation. This indeed answers my questions.
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] on behalf of mag [mr6 at ebi.ac.uk]
Sent: Thursday, March 03, 2016 12:00 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Multiple NM numbers per transcript
Hi Beat Wolf,
By definition, the RefSeq transcript set and the Ensembl transcript set are different.
RefSeq provide the original mRNA and peptide sequences, while Ensembl
align those sequences against an assembly and provide a model based on
the underlying genome sequence.
It is hence not possible to have a perfect one-to-one mapping between
the two resources.
If you look at the TP53 region in the genome, where the RefSeq models have
been represented as closely as possible against the genome sequence,
you can see that there are 4 Ensembl protein-coding transcripts starting
at the furthest 5' end of the region.
In comparison, there are 16 RefSeq mRNA sequences for that same region.
We attempt to provide mappings between the two resources where possible,
based on exon overlap (coding and non-coding).
We allow for mappings which are not 100%, due to the nature of the
This results in one Ensembl transcript possibly mapping to several RefSeq transcripts.
Please note there are some RefSeq transcripts for which we do not have
any mappings, as the differences are too large to account for sequencing
For example, NM_001126115 or NM_001276697 do not have a mapping as the
resulting models are much shorter than any of the Ensembl ones.
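To make the "not 100%, possibly one-to-many" behaviour concrete, here is a minimal sketch of overlap-based mapping. It is not Ensembl's actual xref pipeline: the scoring rule (shared exonic bases divided by the length of the longer model), the 0.8 threshold, and all identifiers and coordinates (ENST_A, NM_X, and so on) are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# An exon as (start, end) in 1-based, inclusive genomic coordinates.
Exon = Tuple[int, int]
Models = Dict[str, List[Exon]]


def model_length(exons: List[Exon]) -> int:
    """Total number of bases covered by a transcript model's exons."""
    return sum(end - start + 1 for start, end in exons)


def shared_bases(a: List[Exon], b: List[Exon]) -> int:
    """Bases covered by both models (assumes exons within one model do not overlap)."""
    shared = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap > 0:
                shared += overlap
    return shared


def overlap_score(a: List[Exon], b: List[Exon]) -> float:
    """Shared bases as a fraction of the longer model; 1.0 means identical exons."""
    return shared_bases(a, b) / max(model_length(a), model_length(b))


def map_refseq_to_ensembl(
    ensembl: Models, refseq: Models, threshold: float = 0.8
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each RefSeq model to its best-overlapping Ensembl transcript.

    A best score below `threshold` leaves the RefSeq model unmapped, and
    several RefSeq models may end up on the same Ensembl transcript.
    """
    mapping: Dict[str, List[str]] = {eid: [] for eid in ensembl}
    unmapped: List[str] = []
    for rid, r_exons in refseq.items():
        best_id = max(ensembl, key=lambda eid: overlap_score(ensembl[eid], r_exons))
        if overlap_score(ensembl[best_id], r_exons) >= threshold:
            mapping[best_id].append(rid)
        else:
            unmapped.append(rid)
    return mapping, unmapped


# Hypothetical models, not real TP53 coordinates.
ensembl = {"ENST_A": [(100, 200), (300, 400)], "ENST_B": [(300, 400), (500, 600)]}
refseq = {
    "NM_X": [(100, 200), (300, 400)],  # identical to ENST_A
    "NM_Y": [(105, 200), (300, 400)],  # slightly shorter first exon
    "NM_W": [(300, 400), (500, 590)],  # close to ENST_B
    "NM_Z": [(300, 350)],              # much shorter than any Ensembl model
}
mapping, unmapped = map_refseq_to_ensembl(ensembl, refseq)
print(mapping)   # {'ENST_A': ['NM_X', 'NM_Y'], 'ENST_B': ['NM_W']}
print(unmapped)  # ['NM_Z']
```

Here ENST_A picks up two RefSeq models because neither differs enough to fall below the threshold, while NM_Z, like the short RefSeq models mentioned above, stays unmapped.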
If you look at the supporting evidence used to build the Ensembl transcripts,
you can see which RefSeq mRNA sequences were used.
However, this reflects only the evidence that was available when
the gene set was generated, in September 2013.
Hope this helps,
On 02/03/2016 16:38, Wolf Beat wrote:
> Hi everybody.
> I have a probably trivial question, but as a computer scientist some details in genetics sometimes get lost on me ;)
> So, when I wanted to find out the NM number of TP53-001, I saw that this transcript has many different NM numbers:
> What is strange to me is that when I look at their details, they look different:
> Those are two examples of NM numbers associated with TP53-001, but from what I understand those are clearly not the same transcript.
> Am I missing something important? What would the correct NM number be for this transcript and how can I determine which one it is?
> Kind regards
> Beat Wolf
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/