[ensembl-dev] Multiple NM numbers per transcript
mag
mr6 at ebi.ac.uk
Thu Mar 3 11:00:31 GMT 2016
Hi Beat Wolf,
By definition, the RefSeq transcript set and the Ensembl transcript set
are different.
RefSeq provide the original mRNA and peptide sequences, while Ensembl
align those sequences against an assembly and provide a model based on
the underlying genome sequence.
It is hence not possible to have a perfect one-to-one mapping between
the two resources.
If you look at the TP53 region in the genome
http://grch37.ensembl.org/Homo_sapiens/Share/7549c9ab053eb1682e4cca0b64e0861e227445319
where the RefSeq models have been represented as well as possible against
the genome sequence, you can see that there are 4 Ensembl protein-coding
transcripts starting at the furthest 5' end of the region.
In comparison, there are 16 RefSeq mRNA sequences for that same region.
We attempt to provide mappings between the two resources where possible,
based on codon overlap (coding and non-coding).
We allow for mappings which are not 100%, due to the nature of the
resources.
This can result in one Ensembl transcript mapping to several RefSeq
transcripts.
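
To illustrate why a tolerant, overlap-based comparison can attach several
RefSeq identifiers to one Ensembl transcript, here is a minimal sketch in
Python. It is not the Ensembl mapping code: the scoring function, the 0.9
threshold, the toy coordinates and the identifiers (ENST_TOY, NM_TOY_A and so
on) are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; toy coordinates


def exon_overlap_fraction(a: List[Exon], b: List[Exon]) -> float:
    """Shared exonic bases divided by the bases covered by either model."""
    def bases(exons: List[Exon]) -> set:
        covered = set()
        for start, end in exons:
            covered.update(range(start, end + 1))
        return covered

    a_bases, b_bases = bases(a), bases(b)
    union = a_bases | b_bases
    return len(a_bases & b_bases) / len(union) if union else 0.0


def map_transcripts(
    ensembl: Dict[str, List[Exon]],
    refseq: Dict[str, List[Exon]],
    threshold: float = 0.9,  # assumed cut-off, chosen only for illustration
) -> Dict[str, List[str]]:
    """For each Ensembl model, list every RefSeq model scoring >= threshold."""
    mappings = {}
    for ens_id, ens_exons in ensembl.items():
        hits = [
            ref_id
            for ref_id, ref_exons in refseq.items()
            if exon_overlap_fraction(ens_exons, ref_exons) >= threshold
        ]
        mappings[ens_id] = sorted(hits)
    return mappings


# Hypothetical models: A is identical, B differs slightly at the 5' end,
# C is much shorter and therefore falls below the threshold.
TOY_ENSEMBL = {"ENST_TOY": [(100, 199), (300, 399)]}
TOY_REFSEQ = {
    "NM_TOY_A": [(100, 199), (300, 399)],
    "NM_TOY_B": [(105, 199), (300, 399)],
    "NM_TOY_C": [(300, 399)],
}

print(map_transcripts(TOY_ENSEMBL, TOY_REFSEQ))
```

Because the threshold is below 100%, both NM_TOY_A and NM_TOY_B are attached
to the same Ensembl model, while the much shorter NM_TOY_C is left unmapped.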
Please note there are some RefSeq transcripts for which we do not have
any mappings, as the differences are too large to account for sequencing
errors.
For example, NM_001126115 and NM_001276697 do not have a mapping, as the
resulting models are much shorter than any of the Ensembl ones.
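
If you want to list the RefSeq identifiers attached to a transcript
programmatically, one possible route is the Ensembl REST API. The sketch below
assumes the GRCh37 REST server at grch37.rest.ensembl.org, its xrefs/id
endpoint, and that the NM numbers shown on the transcript page are exposed
there under the external database name RefSeq_mRNA. The helper names are
hypothetical, and the fields actually present in the response should be
checked against the REST documentation.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

# Assumption: GRCh37 archive REST server.
REST_SERVER = "https://grch37.rest.ensembl.org"


def xref_url(stable_id: str, external_db: str = "RefSeq_mRNA") -> str:
    """Build the xrefs/id URL for one Ensembl stable ID."""
    return f"{REST_SERVER}/xrefs/id/{stable_id}?external_db={external_db}"


def fetch_xrefs(stable_id: str, external_db: str = "RefSeq_mRNA") -> List[Dict]:
    """Fetch the cross-references as a list of dicts (needs network access)."""
    request = urllib.request.Request(
        xref_url(stable_id, external_db),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarise_xrefs(xrefs: List[Dict]) -> List[Tuple[str, str]]:
    """Reduce each record to (display_id, info_type), sorted by identifier.

    Missing keys become empty strings rather than raising an error.
    """
    return sorted(
        (record.get("display_id", ""), record.get("info_type", ""))
        for record in xrefs
    )
```

Calling summarise_xrefs(fetch_xrefs("ENST00000269305")) should then give one
entry per RefSeq mRNA cross-reference currently held for that transcript.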
If you look at the supporting evidence used to build the Ensembl
transcript model
http://grch37.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
you can see which RefSeq mRNA sequences were used at the time.
However, this is based on the evidence which was available at the time
the geneset was generated, in September 2013.
Hope this helps,
Magali
On 02/03/2016 16:38, Wolf Beat wrote:
> Hi everybody.
>
> I have a probably trivial question, but as a computer scientist, some details in genetics sometimes get lost on me ;)
>
> So, when I wanted to find out the NM number of TP53-001, I saw that this transcript has many different NM numbers:
> http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000141510;r=17:7571720-7590856;t=ENST00000269305
>
> What is strange to me is that when I look at their details, they look different:
> http://www.ncbi.nlm.nih.gov/nuccore/NM_000546
> http://www.ncbi.nlm.nih.gov/nuccore/NM_001126112
>
> Those are two examples of NM numbers associated with TP53-001, but from what I understand, those are clearly not the same transcript.
>
> Am I missing something important? What would the correct NM number be for this transcript, and how can I determine which one it is?
>
> Kind regards
>
> Beat Wolf