[ensembl-dev] Bug?? Error Mapping EnsemblID to entrez id
mag
mr6 at ebi.ac.uk
Fri Sep 4 09:58:23 BST 2015
Hi Ashok,
Mapping between resources is a complicated process which unfortunately
exposes some edge cases like this one.
There is no direct mapping between Ensembl genes and EntrezGene ids, so
we map via their respective transcripts: Ensembl transcripts on one
side, RefSeq mRNAs on the other.
Where the data is available, we map based on genomic coordinates; when
that is not possible, we fall back to aligning the sequences.
Only the best hit is kept, but we do allow for mismatches, as we know
that models can differ between Ensembl and RefSeq, particularly in the
UTRs.
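To make the two-step strategy above concrete, here is a toy sketch in Python. It is not the Ensembl pipeline code: the Transcript class, the exact-match coordinate test, the crude identity score, the 0.9 threshold and the XM_toy/NM_toy accessions are all assumptions made for illustration. It also assumes that hits tied for the best score are all kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    accession: str
    sequence: str
    # Genomic exon coordinates as (start, end) pairs; None when unavailable.
    exons: Optional[Tuple[Tuple[int, int], ...]] = None


def identity(a: str, b: str) -> float:
    """Crude stand-in for an alignment score: the fraction of matching
    positions, relative to the longer of the two sequences."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def map_transcript(query: Transcript, candidates: List[Transcript],
                   min_identity: float = 0.9) -> List[str]:
    """Return the accessions of the candidate(s) that best match the query."""
    # Step 1: use genomic coordinates when the query has them
    # (assumption: an exact match of the exon structure).
    if query.exons is not None:
        by_coords = [c.accession for c in candidates if c.exons == query.exons]
        if by_coords:
            return by_coords
    # Step 2: fall back to sequence comparison. Mismatches are tolerated
    # down to min_identity, and only the best-scoring hit(s) are kept.
    scored = [(identity(query.sequence, c.sequence), c.accession)
              for c in candidates]
    scored = [(score, acc) for score, acc in scored if score >= min_identity]
    if not scored:
        return []
    best = max(score for score, _ in scored)
    return [acc for score, acc in scored if score == best]
```

In this sketch, several near-identical candidates all tie for the best score, so a single transcript can legitimately come back with more than one hit.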
In this particular example, the Ensembl transcript ENST00000618217
aligns very well against 3 separate RefSeq sequences, corresponding to
CBWD1, CBWD2 and CBWD3:
http://e81.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000196873;r=9:68232003-68300015;t=ENST00000618217
Another transcript, ENST00000377342, aligns against 2 different RefSeq
sequences, corresponding to CBWD3 and CBWD5:
http://e81.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000196873;r=9:68232003-68300015;t=ENST00000377342
As a result, we have not one but 4 different EntrezGene ids for the same
Ensembl gene.
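The way transcript-level hits add up to 4 gene-level ids can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how the ids are stored in the database. Gene symbols stand in for the XM_ accessions, which are not listed in this message, and the symbol-to-id pairing is taken from the order of the BioMart results quoted below.

```python
from collections import defaultdict

# Transcript-level hits as described above (gene symbols stand in for the
# RefSeq XM_ accessions, which are not given in this thread).
TRANSCRIPT_HITS = {
    "ENST00000618217": ["CBWD1", "CBWD2", "CBWD3"],
    "ENST00000377342": ["CBWD3", "CBWD5"],
}

# Both transcripts belong to the same Ensembl gene.
GENE_OF_TRANSCRIPT = {
    "ENST00000618217": "ENSG00000196873",
    "ENST00000377342": "ENSG00000196873",
}

# EntrezGene ids, paired with symbols in the order reported by the
# original poster.
ENTREZ_BY_SYMBOL = {
    "CBWD1": 55871,
    "CBWD2": 150472,
    "CBWD5": 220869,
    "CBWD3": 445571,
}


def gene_level_entrez_ids(transcript_hits, gene_of_transcript, entrez_by_symbol):
    """Collect, per gene, the union of EntrezGene ids reached through any
    of its transcripts."""
    ids = defaultdict(set)
    for transcript, symbols in transcript_hits.items():
        gene = gene_of_transcript[transcript]
        for symbol in symbols:
            ids[gene].add(entrez_by_symbol[symbol])
    return {gene: sorted(found) for gene, found in ids.items()}


result = gene_level_entrez_ids(TRANSCRIPT_HITS, GENE_OF_TRANSCRIPT,
                               ENTREZ_BY_SYMBOL)
```

CBWD3 is reached through both transcripts but is only counted once, which is why 5 transcript-level hits give 4 distinct ids for ENSG00000196873.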
Note that all these RefSeq sequences are predicted sequences, as noted
by the XM_ prefix.
This means that we would never use any of those EntrezGene ids to name
the gene.
However, we still provide the initial mappings as these are our best
guess as to which RefSeq transcript corresponds to which Ensembl transcript.
We are hoping to improve these mappings by including genomic coordinate
information for predicted models, as is already done for the curated
RefSeq models (NM_-like identifiers).
This is unlikely to be available before the end of the year, though.
For correct gene naming, we recommend using HGNC identifiers, as these
are obtained via curated direct mappings from HGNC, who update them
regularly.
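If you are post-processing cross-references yourself, a minimal Python sketch of that recommendation is to keep only the HGNC records when you need a name. The record shape (dicts with "dbname" and "primary_id" keys) and the HGNC placeholder value are assumptions for illustration, so please check them against whatever export you are actually using.

```python
def naming_xrefs(xrefs, preferred_db="HGNC"):
    """Return the primary ids that come from the preferred naming source."""
    return [x["primary_id"] for x in xrefs if x.get("dbname") == preferred_db]


# Hypothetical records for ENSG00000196873. The EntrezGene ids are the ones
# from this thread; the HGNC value is a placeholder, not the real identifier.
records = [
    {"dbname": "EntrezGene", "primary_id": "55871"},
    {"dbname": "EntrezGene", "primary_id": "150472"},
    {"dbname": "EntrezGene", "primary_id": "220869"},
    {"dbname": "EntrezGene", "primary_id": "445571"},
    {"dbname": "HGNC", "primary_id": "HGNC:PLACEHOLDER"},
]

hgnc_ids = naming_xrefs(records)
```

Filtering this way leaves a single naming identifier for the gene, whereas the EntrezGene route leaves four candidates to choose from.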
Hope this helps,
Magali
On 03/09/2015 20:04, Ragavendran, Ashok wrote:
> hello,
> I came upon this while using the BioMart interface. There are
> errors mapping Ensembl IDs to EntrezGene IDs. The Ensembl ID maps to the
> wrong Entrez IDs; when I click an Entrez link it shows a different
> Ensembl ID. Attached is a screenshot of the results. The Ensembl ID
> refers to CBWD3, but the EntrezGene IDs are for CBWD1, CBWD2, CBWD5 and
> CBWD3. The last result is the correct one. All the others are wrong and
> actually have different Ensembl IDs, which is what I wanted to
> retrieve.
>
> Is there something I am missing??
>
> Cheers
> Ashok
> ====== Text-based results from querying the gene ID ENSG00000196873 ======
> Ensembl Gene ID    EntrezGene ID
> ENSG00000196873    55871
> ENSG00000196873    150472
> ENSG00000196873    220869
> ENSG00000196873    445571
>
>
> ===== Screenshot of results: May not come through ===
>
>
>
> --
> Ashok Ragavendran
> Bioinformatics Specialist
> Center for Human Genetic Research
> Massachusetts General Hospital
> Richard B. Simches Research Center
> 185 Cambridge St, Boston MA 02114
> aragavendran at mgh.harvard.edu
> ph: +1-617-726-1329