[ensembl-dev] ids from older versions of Ensembl.
Dan Staines
dstaines at ebi.ac.uk
Sat Aug 17 11:04:03 BST 2013
As is always the case, things are slightly different with Ensembl
Bacteria ;-)
As a general rule, Ensembl Genomes as a project does not do genebuilds,
but instead uses gene models from third parties. This means we use
identifiers assigned by those third parties rather than assigning our
own as Ensembl do.
The exception to this was the first iteration of Ensembl Bacteria, where
we sometimes made slight modifications to gene models from INSDC using
additional curation from UniProt. We took the decision at this point to
assign identifiers ourselves, with the consequent need to map those
identifiers between releases.
Moving to the much expanded second iteration of Ensembl Bacteria, the
general Ensembl Genomes strategy of using third party identifiers was
adopted, so we use locus_tag and protein_id identifiers from INSDC
(where available) as stable IDs. For the legacy identifiers for the
200-odd genomes from the first iteration, we provide mappings based on
protein_id identifiers where we can. Pairs of genes mapped in this way
are generally identical in sequence, though, as I mentioned, some models
in the first version were modified based on UniProt curation, so you might
want to check the sequences if that is important for your purposes.
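If you do want to run that check, something along these lines might be a
starting point. It is only a rough sketch: it assumes you have already
loaded the legacy-to-new ID mapping and the sequences for both releases
into plain dictionaries yourself, and the function name and the example
IDs are made up for illustration, not part of any Ensembl API.

```python
def find_sequence_mismatches(id_mapping, old_seqs, new_seqs):
    """Return (old_id, new_id) pairs whose sequences differ or are missing.

    id_mapping: dict of legacy stable ID -> new stable ID (hypothetical input)
    old_seqs:   dict of legacy stable ID -> sequence string
    new_seqs:   dict of new stable ID -> sequence string
    """
    mismatches = []
    for old_id, new_id in id_mapping.items():
        old_seq = old_seqs.get(old_id)
        new_seq = new_seqs.get(new_id)
        # A pair is flagged if either sequence is absent or they differ
        # (the comparison ignores case).
        if old_seq is None or new_seq is None or old_seq.upper() != new_seq.upper():
            mismatches.append((old_id, new_id))
    return mismatches


# Placeholder IDs and sequences, purely illustrative.
mapping = {"LEGACY_1": "LOCUS_1", "LEGACY_2": "LOCUS_2"}
old = {"LEGACY_1": "MKTAYIAK", "LEGACY_2": "MSLLRK"}
new = {"LOCUS_1": "MKTAYIAK", "LOCUS_2": "MSLLRKQ"}
print(find_sequence_mismatches(mapping, old, new))
```

Any pair that comes back from this is one where the legacy model was
probably one of the curated ones, so it is worth a closer look.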
As a slight wrinkle, there are a very small number of genes in the new
Ensembl Bacteria that come from historical records for which INSDC does
not currently provide a suitable unique identifier (these are usually,
but not exclusively, ncRNA genes). For these, we do still assign our own
identifiers, derived from the underlying feature coordinates within the
record, so that the same identifier is always used if the feature does
not change. Given the small number of genes involved, we don't provide
any mapping beyond this, particularly since any update by INSDC is likely
to involve a correction that provides identifiers.
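To illustrate the general idea of a coordinate-derived identifier (and
only the idea: the actual format we use is not shown here, and the
function name, prefix and layout below are assumptions made up for the
example), a deterministic ID could be built roughly like this:

```python
def coordinate_based_id(record_accession, start, end, strand, prefix="EB"):
    """Build an identifier purely from a feature's location in its record.

    The same accession, coordinates and strand always give the same ID;
    any change to the feature's location gives a different one.
    The prefix and field layout are hypothetical.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    strand_label = "F" if strand == 1 else "R"
    return f"{prefix}_{record_accession}_{start}_{end}_{strand_label}"


# Placeholder accession and coordinates, purely illustrative.
first = coordinate_based_id("ACC00001", 1200, 1275, 1)
again = coordinate_based_id("ACC00001", 1200, 1275, 1)
moved = coordinate_based_id("ACC00001", 1203, 1275, 1)
print(first, first == again, first == moved)
```

The point is just that an unchanged feature keeps its identifier between
releases without any mapping step, while a feature whose coordinates
change is effectively treated as a new one.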
Hope this explains things a little more.
Dan.
--
Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
http://www.ebi.ac.uk/
http://www.ensemblgenomes.org/