[ensembl-dev] Bacteria Collections
Dan Staines
dstaines at ebi.ac.uk
Sat Sep 7 09:01:39 BST 2013
On 09/07/2013 01:52 AM, Alexander Pico wrote:
> If I understand correctly, after release 16, bacteria collections are
> now grouped into numbered sets. So, where I used to be able to download
> the mysql database for */escherichia_coli_str_k_12_substr_mg1655/*
> directly, now I have to download /bacteria_22_collection_core/, which
> contains records for hundreds of organisms. Is that right?
In fact, bacteria have always been grouped into collections of genomes,
in collection databases organised by genus - E. coli K12 was part of
the escherichia_shigella_collection database, which contained 46
genomes. With the massive expansion of the bacteria dataset (over 9000
genomes in the upcoming EG20 release), the most pragmatic way to organise
these is to use numbered collections of up to 250 genomes (as opposed to
up to 70 in the old system). However, individual assemblies should remain
in the same collection provided their INSDC assembly accession remains
the same.
> Is there any way to just download a database for a single bacterial
> species?
No, there is not. However, individual genomes can be downloaded in a
variety of formats including GFF, and the ensemblgenomes API can be used
to easily address single genomes from the larger collection.
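
For anyone who downloads a whole collection database anyway, here is a
rough sketch of how one genome might be picked out of it. It assumes the
usual multi-species core layout, in which the meta and coord_system
tables carry a species_id column and each genome has a
'species.production_name' entry in meta. The helper names and the toy
data (toy_genome_a, toy_genome_b) are made up for illustration, and an
in-memory SQLite database with simplified tables stands in for the real
MySQL collection, so treat it as an outline rather than a recipe:

```python
import sqlite3


def species_id_for(conn, production_name):
    """Return the species_id that the meta table assigns to one genome, or None."""
    row = conn.execute(
        "SELECT species_id FROM meta "
        "WHERE meta_key = 'species.production_name' AND meta_value = ?",
        (production_name,),
    ).fetchone()
    return None if row is None else row[0]


def seq_regions_for(conn, species_id):
    """List (name, length) of the sequence regions belonging to one genome."""
    return conn.execute(
        "SELECT sr.name, sr.length FROM seq_region sr "
        "JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id "
        "WHERE cs.species_id = ? ORDER BY sr.name",
        (species_id,),
    ).fetchall()


def build_toy_collection():
    """Build a tiny stand-in for a collection database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE meta (
            meta_id INTEGER PRIMARY KEY, species_id INTEGER,
            meta_key TEXT, meta_value TEXT);
        CREATE TABLE coord_system (
            coord_system_id INTEGER PRIMARY KEY, species_id INTEGER, name TEXT);
        CREATE TABLE seq_region (
            seq_region_id INTEGER PRIMARY KEY, name TEXT,
            coord_system_id INTEGER, length INTEGER);
        INSERT INTO meta (species_id, meta_key, meta_value) VALUES
            (1, 'species.production_name', 'toy_genome_a'),
            (2, 'species.production_name', 'toy_genome_b');
        INSERT INTO coord_system VALUES
            (1, 1, 'chromosome'), (2, 2, 'chromosome');
        INSERT INTO seq_region VALUES
            (1, 'Chromosome', 1, 1000),
            (2, 'Chromosome', 2, 2000),
            (3, 'plasmid1', 2, 50);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_collection()
    sid = species_id_for(conn, "toy_genome_b")
    for name, length in seq_regions_for(conn, sid):
        print(name, length)
```

Against a real collection the same two queries would be sent to MySQL
instead, and any other table that hangs off seq_region can be restricted
to one genome by joining through coord_system in the same way.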
Dan.
--
Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
http://www.ebi.ac.uk/
http://www.ensemblgenomes.org/