[ensembl-dev] changes to organisation of bacterial collections in EnsemblGenomes
Dan Staines
dstaines at ebi.ac.uk
Tue Feb 12 10:24:47 GMT 2013
Hi Trevor,
> Could you please provide some details to help me out?
>
> · Is this change to collection organization finalized?
> · Is the distribution of species to collections arbitrary?
> · Will the distribution of particular species to particular collections
> change with each release?
There is no taxonomic basis for assigning genomes to collections
(something we considered, but the distribution doesn't lend itself to
this), so the order is arbitrary as far as external users are concerned.
We aim to add new genomes to the last collection (creating a new
collection database once 250 genomes have been reached), and to reuse the
same collection name and species ID for existing genomes when reloading
(leaving gaps in earlier collections when existing genomes are no longer
available). However, we make no guarantee of this order or assignment,
and you should not rely on collection names in your code. Whilst
the species.production_name is usually stable, true continuity between
genomes can only be guaranteed via the assembly.accession, which
uniquely identifies a given version of an assembly for a given genome in
the INSDC Genome Assembly database (past experience suggests that names are
not always stable and taxon IDs are not always unique).
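Because collection names and species IDs may change between releases, code that follows genomes over time is safer keyed on the assembly accession alone. The following is a minimal sketch, assuming you have already retrieved, for each release, a list of genomes with their production name, collection, species ID and assembly accession (how you fetch these values is not shown). The `GenomeRecord` type, the function names and the example values are hypothetical, for illustration only.

```python
from typing import NamedTuple


class GenomeRecord(NamedTuple):
    """One genome as listed in a release (hypothetical record layout)."""
    production_name: str      # usually stable, but not guaranteed
    collection: str           # collection the genome was loaded into; may change
    species_id: int           # species ID within that collection; may change
    assembly_accession: str   # INSDC assembly accession; used as the stable key


def index_by_accession(genomes):
    """Key genomes by assembly accession rather than by collection or name."""
    index = {}
    for g in genomes:
        if g.assembly_accession in index:
            raise ValueError(f"duplicate accession {g.assembly_accession}")
        index[g.assembly_accession] = g
    return index


def compare_releases(old, new):
    """Match genomes between two releases using only the assembly accession."""
    old_idx = index_by_accession(old)
    new_idx = index_by_accession(new)
    kept = sorted(old_idx.keys() & new_idx.keys())
    return {
        "kept": kept,
        "added": sorted(new_idx.keys() - old_idx.keys()),
        "removed": sorted(old_idx.keys() - new_idx.keys()),
        # genomes still present but now in a different collection
        "moved": [a for a in kept
                  if old_idx[a].collection != new_idx[a].collection],
    }


# Usage with made-up records ("collection_1", "ACC_A.1" etc. are placeholders):
old = [GenomeRecord("genome_a", "collection_1", 3, "ACC_A.1"),
       GenomeRecord("genome_b", "collection_1", 4, "ACC_B.1")]
new = [GenomeRecord("genome_a", "collection_2", 1, "ACC_A.1"),
       GenomeRecord("genome_c", "collection_2", 2, "ACC_C.1")]
result = compare_releases(old, new)
```

Since the accession identifies a specific assembly version, a genome whose assembly is updated will show up here as one removed and one added entry; whether to treat those as the same genome is a separate decision for your own code.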
> · Will homologies for bacterial genes (proteins) no longer be curated in
> either the ‘ensembl_compara_bacteria’ or the
> ‘ensembl_compara_pan_homology’ databases?
We will not provide a separate bacterial homology database, but more than
100 bacterial genomes are present in pan compara (selection is based on
several criteria: presence in previous versions of pan compara, presence
in UniProt reference proteome sets, and level of literature citation). We
do, however, provide a family-based compara for all bacteria, in which
proteins are grouped into families based on their PANTHER or HAMAP
classification (this is all documented on bacteria.ensembl.org).
Hope this helps,
Dan.
--
Dan Staines, PhD Ensembl Genomes Technical Coordinator
EMBL-EBI Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/