[ensembl-dev] changes to organisation of bacterial collections in EnsemblGenomes

Dan Staines dstaines at ebi.ac.uk
Tue Feb 12 10:24:47 GMT 2013


Hi Trevor,

> Could you please provide some details to help me out?
>
> · Is this change to collection organization finalized?
> · Is the distribution of species to collections arbitrary?
> · Will the distribution of particular species to particular collections
> change with each release?

There is no taxonomic basis for assigning genomes to collections (we
considered one, but the distribution of genomes doesn't lend itself to
it), so the assignment is arbitrary as far as external users are
concerned. We aim to add new genomes to the last collection (creating a
new collection database once the last one holds 250 genomes), and to
reuse the same collection name and species ID for existing genomes when
reloading (leaving gaps in earlier collections when existing genomes are
no longer available). However, we make no guarantee of this order or
assignment, and you should not rely on collection names in your code.
Whilst the species.production_name is usually stable, true continuity
between genomes can only be guaranteed via the assembly.accession, which
uniquely identifies a given version of an assembly for a given genome in
the INSDC Genome Assembly database (past experience suggests names are
not always stable and taxon IDs are not always unique).
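
As a rough illustration of keying on assembly.accession rather than on
collection names, here is a minimal Python sketch. It assumes you have
already pulled (database, species_id, meta_key, meta_value) rows out of
each collection database's meta table by whatever means you prefer. The
MetaRow type, the function name, the database names and the accession
below are all made up for the example and are not part of any Ensembl
API.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class MetaRow(NamedTuple):
    """One row of meta data, tagged with the database it came from (hypothetical type)."""
    database: str
    species_id: int
    meta_key: str
    meta_value: str


def index_by_accession(rows: Iterable[MetaRow]) -> Dict[str, Tuple[str, int]]:
    """Map each assembly.accession to the (database, species_id) that holds it."""
    index: Dict[str, Tuple[str, int]] = {}
    for row in rows:
        if row.meta_key == "assembly.accession":
            index[row.meta_value] = (row.database, row.species_id)
    return index


# Placeholder data: the same genome sits in different collections in two releases.
release_a = [
    MetaRow("collection_1_db", 7, "species.production_name", "example_genome"),
    MetaRow("collection_1_db", 7, "assembly.accession", "GCA_000000001.1"),
]
release_b = [
    MetaRow("collection_4_db", 12, "species.production_name", "example_genome"),
    MetaRow("collection_4_db", 12, "assembly.accession", "GCA_000000001.1"),
]

# The lookup key is the accession, so a change of collection does not matter.
location_a = index_by_accession(release_a).get("GCA_000000001.1")
location_b = index_by_accession(release_b).get("GCA_000000001.1")
```

Code written this way finds the genome in either release without
hard-coding a collection name or a species ID.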

> · Will homologies for bacterial genes (proteins) no longer be curated in
> neither the ‘ensembl_compara_bacteria’ nor the
> ‘ensembl_compara_pan_homology’ databases?

We will not provide a bacteria-specific homology database, but >100
bacterial genomes are present in pan compara (selection is based on a
number of criteria, namely presence in previous versions of pan compara,
presence in UniProt reference proteome sets, and level of literature
citation). We do, however, provide a family-based compara for all
bacteria, where proteins are grouped into families based on their
PANTHER or HAMAP classification (this is all documented on
bacteria.ensembl.org).
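
To give a feel for what grouping by classification means, here is a
small Python sketch that buckets proteins by a family identifier. It
only illustrates the idea and is not the pipeline we run; the function
name, the protein IDs and the family IDs are placeholders invented for
the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_families(
    annotations: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group protein IDs by family ID.

    Each annotation is a (protein_id, family_id) pair, where family_id
    stands in for a PANTHER or HAMAP classification.
    """
    families = defaultdict(set)
    for protein_id, family_id in annotations:
        families[family_id].add(protein_id)
    # Sort members so the result is deterministic.
    return {fid: sorted(members) for fid, members in families.items()}


# Placeholder annotations, not real classifications.
example_annotations = [
    ("protein_A", "FAMILY_1"),
    ("protein_B", "FAMILY_2"),
    ("protein_C", "FAMILY_1"),
]
example_families = group_into_families(example_annotations)
```

Proteins from different genomes that share a classification end up in
the same family, which is what lets this scale to all bacteria.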

Hope this helps,

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/



