[ensembl-dev] Bug?? Error Mapping EnsemblID to entrez id

mag mr6 at ebi.ac.uk
Fri Sep 4 09:58:23 BST 2015


Hi Ashok,

Mapping between resources is a complicated process, which unfortunately
exposes some edge cases like this one.

There is no direct mapping available between Ensembl genes and
EntrezGene IDs, so we map via their respective transcripts: Ensembl
transcripts and RefSeq mRNAs.
Where the data is available, we map based on genomic coordinates;
otherwise, we fall back to aligning the sequences.
Only the best hit is kept, but we do allow for mismatches, as we know
models can vary between Ensembl and RefSeq, in particular in the UTR
regions.
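The strategy described above (genomic coordinates where available, sequence alignment otherwise, best hit kept with mismatches tolerated) can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual Ensembl cross-reference pipeline: the `Candidate` fields, the 90% identity threshold and the rule of keeping tied best hits are all assumptions made for illustration. Keeping ties is one way a single transcript could end up with several equally good RefSeq matches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A RefSeq transcript considered as a match (hypothetical model)."""
    refseq_acc: str
    coord_overlap: Optional[float]  # exonic overlap fraction; None if no coordinates
    identity: float                 # percent identity from a sequence alignment


def best_refseq_hits(cands: List[Candidate], min_identity: float = 90.0) -> List[str]:
    """Return the best-scoring RefSeq accession(s); ties are kept (an assumption)."""
    # Step 1: where genomic coordinates exist, map on coordinate overlap.
    placed = [c for c in cands if c.coord_overlap is not None and c.coord_overlap > 0]
    if placed:
        top = max(c.coord_overlap for c in placed)
        return [c.refseq_acc for c in placed if c.coord_overlap == top]

    # Step 2: otherwise fall back to alignment. A threshold below 100%
    # tolerates mismatches, e.g. UTRs that differ between the two models.
    aligned = [c for c in cands if c.identity >= min_identity]
    if not aligned:
        return []
    top = max(c.identity for c in aligned)
    return [c.refseq_acc for c in aligned if c.identity == top]


# Hypothetical predicted models without coordinates: alignment decides.
hits = best_refseq_hits([
    Candidate("XM_model_a", None, 98.5),
    Candidate("XM_model_b", None, 98.5),
    Candidate("XM_model_c", None, 91.0),
])
print(hits)  # ['XM_model_a', 'XM_model_b']
```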
In this particular example, the Ensembl transcript ENST00000618217
aligns very well against 3 separate RefSeq sequences, corresponding to
CBWD1, CBWD2 and CBWD3:
http://e81.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000196873;r=9:68232003-68300015;t=ENST00000618217
Another transcript, ENST00000377342, aligns against 2 different RefSeq
sequences, corresponding to CBWD3 and CBWD5:
http://e81.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000196873;r=9:68232003-68300015;t=ENST00000377342

As a result, we have not one but 4 different EntrezGene IDs for the same
Ensembl gene (CBWD3 is matched by both transcripts, so it appears only
once).
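A minimal sketch of why the gene ends up with several IDs, assuming the gene-level cross-references are simply the union of the transcript-level matches (a simplification; the real pipeline is more involved). The matches are the ones described in this message, shown by gene symbol rather than by EntrezGene ID.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Both transcripts belong to the same Ensembl gene.
transcript_to_gene = {
    "ENST00000618217": "ENSG00000196873",
    "ENST00000377342": "ENSG00000196873",
}
# Transcript-level RefSeq matches, labelled by the gene they belong to.
transcript_hits = {
    "ENST00000618217": ["CBWD1", "CBWD2", "CBWD3"],
    "ENST00000377342": ["CBWD3", "CBWD5"],
}


def gene_level_xrefs(t2g: Dict[str, str], hits: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Union transcript-level matches into one set per gene (duplicates collapse)."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for transcript, targets in hits.items():
        genes[t2g[transcript]].update(targets)
    return dict(genes)


result = gene_level_xrefs(transcript_to_gene, transcript_hits)
print(sorted(result["ENSG00000196873"]))  # ['CBWD1', 'CBWD2', 'CBWD3', 'CBWD5']
```

Because CBWD3 is matched by both transcripts, the union holds four distinct genes rather than five.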
Note that all these RefSeq sequences are predicted sequences, as
indicated by the XM_ prefix.
This means that we would never use any of those EntrezGene IDs to name
the gene.
However, we still provide the initial mappings as these are our best 
guess as to which RefSeq transcript corresponds to which Ensembl transcript.
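If you only want mappings backed by curated RefSeq records, one option on the user side is to filter on the accession prefix. This is a sketch assuming your mappings are (Ensembl transcript, RefSeq accession) pairs; besides the XM_ and NM_ prefixes mentioned here, it also treats XR_ and XP_ as predicted, following the RefSeq accession conventions. The identifiers in the usage example are hypothetical placeholders.

```python
from typing import Iterable, List, Tuple

# RefSeq prefixes for model (predicted) records; NM_/NR_/NP_ are curated.
PREDICTED_PREFIXES = ("XM_", "XR_", "XP_")


def is_predicted(refseq_acc: str) -> bool:
    """True for computationally predicted RefSeq records."""
    return refseq_acc.startswith(PREDICTED_PREFIXES)


def curated_only(mappings: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (ensembl_transcript, refseq_acc) pairs whose RefSeq record is curated."""
    return [(enst, acc) for enst, acc in mappings if not is_predicted(acc)]


# Hypothetical identifiers, for illustration only.
pairs = [
    ("ENST_hypothetical_1", "XM_hypothetical_1"),
    ("ENST_hypothetical_2", "NM_hypothetical_2"),
]
print(curated_only(pairs))  # [('ENST_hypothetical_2', 'NM_hypothetical_2')]
```

For a gene like this one, where every match is an XM_ record, such a filter would drop all of the EntrezGene mappings.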

We are hoping to improve these mappings by including genomic coordinate
information for predicted models, as is already done for the curated
RefSeq models (NM_-style identifiers).
This is unlikely to be available before the end of the year, though.

For correct gene naming, we recommend using HGNC identifiers, as these 
are obtained via curated direct mappings from HGNC, who update them 
regularly.
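To retrieve HGNC identifiers programmatically rather than through the web interface, a BioMart XML query can be sent to the martservice endpoint. The sketch below only builds the query URL; the dataset name `hsapiens_gene_ensembl` and the attribute names `hgnc_id` and `hgnc_symbol` are assumptions that should be checked against the attribute list of the Ensembl release you are querying.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint; archive sites (e.g. a release-specific host) use the same path.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def hgnc_query_url(gene_ids, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query URL requesting HGNC IDs/symbols for Ensembl genes."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Restrict results to the requested genes (comma-separated filter value).
    ET.SubElement(ds, "Filter", name="ensembl_gene_id", value=",".join(gene_ids))
    # Assumed attribute names; verify them for your release.
    for attr in ("ensembl_gene_id", "hgnc_id", "hgnc_symbol"):
        ET.SubElement(ds, "Attribute", name=attr)
    xml = ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
           + ET.tostring(query, encoding="unicode"))
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml})


url = hgnc_query_url(["ENSG00000196873"])
# Fetching it, e.g. with urllib.request.urlopen(url), should return TSV rows.
```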


Hope this helps,
Magali

On 03/09/2015 20:04, Ragavendran, Ashok wrote:
> Hello,
>     I came upon this while using the BioMart interface. There are
> errors mapping Ensembl IDs to EntrezGene IDs. The Ensembl ID maps to
> the wrong Entrez ID: when I click the Entrez link, it shows a different
> Ensembl ID. Attached is a screenshot of the results. The Ensembl ID
> refers to CBWD3, but the EntrezGene IDs are for CBWD1, CBWD2, CBWD5 and
> CBWD3. The last result is the correct one; all others are wrong and
> they actually have different Ensembl IDs, which is what I wanted to
> retrieve.
>
>     Is there something I am missing??
>
> Cheers
>     Ashok
> ====== Text-based results from querying gene ID ENSG00000196873 ======
> Ensembl Gene ID    EntrezGene ID
> ENSG00000196873    55871
> ENSG00000196873    150472
> ENSG00000196873    220869
> ENSG00000196873    445571
>
>
> ===== Screenshot of results: May not come through ===
>
>
>
> -- 
> Ashok Ragavendran
> Bioinformatics Specialist
> Center for Human Genetic Research
> Massachusetts General Hospital
> Richard B. Simches Research Center
> 185 Cambridge St, Boston MA 02114
> aragavendran at mgh.harvard.edu
> ph: +1-617-726-1329
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 93160 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20150904/37a7a9b7/attachment.png>

