[ensembl-dev] Bacteria Collections

Dan Staines dstaines at ebi.ac.uk
Sat Sep 7 09:01:39 BST 2013


On 09/07/2013 01:52 AM, Alexander Pico wrote:
> If I understand correctly, after release 16, bacteria collections are
> now grouped into numbered sets. So, where I used to be able to download
> the mysql database for escherichia_coli_str_k_12_substr_mg1655
> directly, now I have to download bacteria_22_collection_core, which
> contains records for hundreds of organisms.  Is that right?

In fact, bacteria have always been grouped into collection databases
organised by genus: E. coli K-12 was part of the
escherichia_shigella_collection database, which contained 46 genomes.
With the massive expansion of the bacteria dataset (over 9000 genomes
in the upcoming EG20 release), the most pragmatic way to organise them
is to use numbered collections of up to 250 genomes (as opposed to up
to 70 in the old system). However, individual assemblies should remain
in the same collection provided their INSDC assembly accession remains
the same.
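To illustrate the stability rule described above, here is a hypothetical sketch (not the actual Ensembl Genomes production pipeline) of how assemblies might be assigned to numbered collections: an assembly whose INSDC accession already has a collection keeps it, and new accessions fill the lowest-numbered collection that still has room, up to the 250-genome limit. The function name and the placeholder accessions are invented for illustration.

```python
from typing import Dict, List

MAX_PER_COLLECTION = 250  # collection size limit quoted in the reply


def assign_collections(
    accessions: List[str],
    previous: Dict[str, int],
    max_size: int = MAX_PER_COLLECTION,
) -> Dict[str, int]:
    """Hypothetical assignment of assemblies to numbered collections.

    Accessions already present in `previous` keep their collection
    number; new accessions go into the lowest-numbered collection that
    still has room, opening a new collection when all are full.
    """
    assignment: Dict[str, int] = {}
    counts: Dict[int, int] = {}
    # First pass: keep existing placements stable (same accession, same collection).
    for acc in accessions:
        if acc in previous:
            coll = previous[acc]
            assignment[acc] = coll
            counts[coll] = counts.get(coll, 0) + 1
    # Second pass: place accessions that have never been assigned.
    for acc in accessions:
        if acc in assignment:
            continue
        coll = 1
        while counts.get(coll, 0) >= max_size:
            coll += 1
        assignment[acc] = coll
        counts[coll] = counts.get(coll, 0) + 1
    return assignment


# Usage with placeholder accessions and a tiny limit so the overflow is visible:
print(assign_collections(["GCA_A", "GCA_B", "GCA_C"], {"GCA_A": 1, "GCA_B": 1}, max_size=2))
# → {'GCA_A': 1, 'GCA_B': 1, 'GCA_C': 2}
```

Placing known accessions first means a genome never moves just because new genomes arrive; only a change of accession would cause it to be treated as new.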

> Is there any way to just download a database for a single bacterial
> species?

No, there is not. However, individual genomes can be downloaded in a
variety of formats, including GFF, and the ensemblgenomes API can be
used to easily address single genomes within the larger collection.
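For anyone who does download a collection database, the genomes inside it are distinguished by a `species_id` column on per-species tables (in the Ensembl core schema this includes `meta` and `coord_system`). The sketch below is a simplified illustration, not the ensemblgenomes API: it assumes only three tables with a subset of their columns, uses an in-memory SQLite database with made-up rows instead of a real MySQL download, and the helper names and the second species name are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Simplified stand-in for part of a multi-species (collection) core schema.
SCHEMA = """
CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER,
                   meta_key TEXT, meta_value TEXT);
CREATE TABLE coord_system (coord_system_id INTEGER PRIMARY KEY,
                           species_id INTEGER, name TEXT);
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT,
                         coord_system_id INTEGER, length INTEGER);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a toy two-genome 'collection' with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO meta VALUES (?, ?, ?, ?)", [
        (1, 1, "species.production_name", "escherichia_coli_str_k_12_substr_mg1655"),
        (2, 2, "species.production_name", "example_species_b"),  # hypothetical
    ])
    conn.executemany("INSERT INTO coord_system VALUES (?, ?, ?)",
                     [(1, 1, "chromosome"), (2, 2, "chromosome")])
    conn.executemany("INSERT INTO seq_region VALUES (?, ?, ?, ?)", [
        (1, "seq_a1", 1, 100), (2, "seq_a2", 1, 10), (3, "seq_b1", 2, 200),
    ])
    return conn


def species_id_for(conn: sqlite3.Connection, production_name: str) -> int:
    """Look up the species_id of one genome via its production name."""
    row = conn.execute(
        "SELECT species_id FROM meta "
        "WHERE meta_key = 'species.production_name' AND meta_value = ?",
        (production_name,),
    ).fetchone()
    if row is None:
        raise KeyError(production_name)
    return row[0]


def seq_regions_for(conn: sqlite3.Connection,
                    production_name: str) -> List[Tuple[str, int]]:
    """Return (name, length) of seq_regions belonging to one genome only."""
    sid = species_id_for(conn, production_name)
    return conn.execute(
        "SELECT sr.name, sr.length FROM seq_region sr "
        "JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id "
        "WHERE cs.species_id = ? ORDER BY sr.name",
        (sid,),
    ).fetchall()


conn = build_demo_db()
print(seq_regions_for(conn, "escherichia_coli_str_k_12_substr_mg1655"))
# → [('seq_a1', 100), ('seq_a2', 10)]
```

Filtering through `coord_system` rather than `seq_region` directly reflects that, in this simplified schema, only the coordinate system records which genome a sequence region belongs to.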

Dan.

-- 
Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
http://www.ebi.ac.uk/
http://www.ensemblgenomes.org/



