[ensembl-dev] EnsemblID versus EntrezID issue for Bombus impatiens (a bumble bee)

James Allen jallen at ebi.ac.uk
Mon Nov 20 16:58:50 GMT 2017

The assembly in Ensembl Metazoa is BIMP_2.0 [1]. Note that no genes were submitted alongside this assembly. NCBI have generated a RefSeq dataset, which is what appears as the "associated GFF" at NCBI; but this is _not_ the gene set that we have in Ensembl Metazoa. We have the OGSv1.0 gene set from BeeBase; it's confusingly labelled at BeeBase, but it is on the same BIMP_2.0 assembly that is in INSDC.

So, taking "Entrez ID" to be a synonym for "RefSeq Gene ID", there is no simple one-to-one mapping to "Ensembl IDs" (a term that is synonymous here with "BeeBase IDs"), because these are two different data sets. One can map between the two, however. I'm not sure why there are no cross-references at the mRNA level (I'll look into it, but that'll take a while), but there are protein-level cross-references, based on BLAST alignments. 20,646 (out of 20,895) distinct RefSeq protein IDs map to 10,847 (out of 15,896) Ensembl/BeeBase protein IDs. These can be retrieved with BioMart [2]; the results can be combined with RefSeq gene-to-protein mappings [3] if necessary, to link the RefSeq Protein IDs from BioMart to the RefSeq Gene IDs.
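
As a rough sketch of that last step, the two tables could be joined on the RefSeq protein ID along the following lines. This assumes both have been saved as tab-separated files with a header row. The column names used as defaults ("Gene stable ID", "RefSeq peptide ID", "GeneID", "Protein product") are assumptions about the export headers and will probably need adjusting to match the actual files, and the IDs in the demo are made up. It also assumes that accession version suffixes (".1" and so on) may differ between the two sources, so they are stripped before matching.

```python
import csv
import io
from collections import defaultdict


def strip_version(accession):
    """Drop a trailing '.N' version suffix, e.g. 'XP_000001.1' -> 'XP_000001'."""
    return accession.split(".", 1)[0]


def load_pairs(handle, key_col, value_col):
    """Read a tab-separated table with a header row into {key: set(values)}.

    Keys are protein accessions with any version suffix removed.
    Rows with an empty key or value are skipped.
    """
    pairs = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row.get(key_col) or "").strip()
        value = (row.get(value_col) or "").strip()
        if key and value:
            pairs[strip_version(key)].add(value)
    return pairs


def link_entrez_to_ensembl(
    biomart_handle,
    refseq_handle,
    biomart_protein_col="RefSeq peptide ID",  # assumed header name
    biomart_gene_col="Gene stable ID",        # assumed header name
    refseq_protein_col="Protein product",     # assumed header name
    refseq_gene_col="GeneID",                 # assumed header name
):
    """Return {RefSeq gene ID: set of Ensembl gene IDs}, joined via protein IDs."""
    protein_to_ensembl = load_pairs(biomart_handle, biomart_protein_col, biomart_gene_col)
    protein_to_entrez = load_pairs(refseq_handle, refseq_protein_col, refseq_gene_col)

    entrez_to_ensembl = defaultdict(set)
    for protein, entrez_ids in protein_to_entrez.items():
        ensembl_ids = protein_to_ensembl.get(protein, set())
        for entrez_id in entrez_ids:
            entrez_to_ensembl[entrez_id] |= ensembl_ids

    # Keep only RefSeq genes that reached at least one Ensembl gene.
    return {gene: ids for gene, ids in entrez_to_ensembl.items() if ids}


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    biomart_tsv = (
        "Gene stable ID\tRefSeq peptide ID\n"
        "GENE_A\tXP_000001\n"
        "GENE_B\tXP_000003\n"
    )
    refseq_tsv = (
        "GeneID\tProtein product\n"
        "111\tXP_000001.1\n"
        "222\tXP_000003.1\n"
    )
    mapping = link_entrez_to_ensembl(io.StringIO(biomart_tsv), io.StringIO(refseq_tsv))
    for gene_id in sorted(mapping):
        print(gene_id, sorted(mapping[gene_id]))
```

The result holds sets rather than single values because the mapping is not one-to-one: several RefSeq proteins can map to the same Ensembl/BeeBase protein, so one RefSeq gene may end up linked to more than one Ensembl gene, and vice versa.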


1. https://www.ebi.ac.uk/ena/data/view/GCA_000188095.2
2. http://metazoa.ensembl.org/biomart/martview?VIRTUALSCHEMANAME=metazoa_mart&ATTRIBUTES=bimpatiens_eg_gene.default.feature_page.ensembl_gene_id|bimpatiens_eg_gene.default.feature_page.ensembl_transcript_id|bimpatiens_eg_gene.default.feature_page.ensembl_peptide_id|bimpatiens_eg_gene.default.feature_page.refseq_peptide&FILTERS=bimpatiens_eg_gene.default.filters.with_refseq_peptide.only&VISIBLEPANEL=resultspanel
3. https://www.ncbi.nlm.nih.gov/genome/proteins/3415?genome_assembly_id=34508

On Wed, 15 Nov 2017 21:05:46 +0000
"Pimsler, Meaghan" <mlpimsler at ua.edu> wrote:

> Hello-
> I am hoping someone can help me resolve this issue.
> Specifically, I am looking to find a way to cross-reference EnsemblID
> numbers with EntrezID numbers, and thus far am having a very
> difficult time; only 13 of the EntrezIDs have an association with
> an EnsemblID for the species Bombus impatiens.
> I am working on a transcriptome project using the Bombus impatiens
> genome, a metazoan Hymenoptera
> (http://metazoa.ensembl.org/Bombus_impatiens/Info/Annotation/#genebuild).
> I have done all of my alignments and count-calling using the
> genome version available for download from NCBI
> (https://www.ncbi.nlm.nih.gov/genome/3415?genome_assembly_id=34508,
> specifically the Bombus impatiens 2.0 genome assembly and associated
> GFF.) This genome assembly is (or should be) the same version
> available at the above Ensembl link.
> The Ensembl version says that the BIMP_2.0 assembly used came from
> BeeBase. However, the only version of Bombus impatiens genome
> available on BeeBase is v1.0.
> Any assistance would be greatly appreciated.
