[ensembl-dev] "stable" ids of genes

Dan Staines dstaines at ebi.ac.uk
Thu Feb 21 11:24:11 GMT 2013


Hi Trevor,

Yes, this is a one-off change, specific to the way we now handle
bacteria. Previously, because we modified a small number of gene models
in the bacterial genomes we provided, we maintained our own stable IDs.
We now use the gene models directly from the INSDC record, so this is no
longer done. Instead we're taking the same approach to stable IDs as for
the majority of Ensembl Genomes species: where possible, we use
externally maintained identifiers, which we don't track separately
ourselves.

In the case of bacteria, the source for our stable IDs is the INSDC
record itself: the locus_tag for the gene, and the protein_id for
the transcript/translation. Having said that, there are a small number
of genes from older bacterial genomes where the locus_tag is missing or
duplicated. In these cases we use the protein_id if we can, or in
exceptional cases we assign our own identifiers (this affects 14,780
of 20,965,827 genes, about 0.07%). Because these assigned identifiers
are based on a key derived from the INSDC feature type and location,
they are stable provided the original gene or CDS feature does not
change its location, but we won't do any other mapping of them.
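
As a rough illustration of the fallback order described above, here is
a minimal Python sketch. It is not the actual Ensembl Genomes pipeline
code: the Feature class, the function names, the "HYPO_" prefix and the
use of a SHA-1 hash as the location-derived key are all assumptions
made for the example. Only the order of preference (locus_tag, then
protein_id, then a key derived from feature type and location) comes
from the description above.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified view of an INSDC gene/CDS feature."""
    feature_type: str            # e.g. "gene" or "CDS"
    seq_accession: str           # accession of the parent sequence
    start: int
    end: int
    strand: int
    locus_tag: Optional[str] = None
    protein_id: Optional[str] = None


def location_key(f: Feature) -> str:
    """Build an identifier from feature type and location only.

    The real key format is not given in this thread; the hash and the
    "HYPO_" prefix are placeholders for illustration.
    """
    raw = f"{f.feature_type}:{f.seq_accession}:{f.start}:{f.end}:{f.strand}"
    return "HYPO_" + hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


def assign_gene_ids(features: List[Feature]) -> List[str]:
    """Pick one gene stable ID per feature, in order of preference."""
    # Count locus_tags so that duplicated ones can be rejected.
    tag_counts = Counter(f.locus_tag for f in features if f.locus_tag)
    ids = []
    for f in features:
        if f.locus_tag and tag_counts[f.locus_tag] == 1:
            ids.append(f.locus_tag)        # normal case
        elif f.protein_id:
            ids.append(f.protein_id)       # locus_tag missing or duplicated
        else:
            ids.append(location_key(f))    # exceptional case
    return ids
```

In this sketch the location-derived identifier stays the same for as
long as the feature type, sequence, coordinates and strand are
unchanged, and changes as soon as any of them does, which mirrors the
stability caveat above.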

Lots of information here:
http://bacteria.ensembl.org/info/building/building_ensembl_bacteria.html#identifiers
but please ask if anything is unclear.

Lastly, we will look at providing mapping tables from old to new 
identifiers as a one-off.

Hope this helps,

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/



