[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Andy Yates ayates at ebi.ac.uk
Mon Aug 8 10:23:39 BST 2011


Hi Giuseppe,

You can group the species by division using the species.division meta tag from the core databases; it returns values such as EnsemblPlants. All EG databases contain this tag and I would urge you to use it. One word of warning: disconnect from each core database as soon as you have retrieved the value, to avoid running out of connections.
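
A rough sketch of that approach, not tested against a live server. The sub names genera_by_division and load_core_adaptors are made up for illustration, and the host, port and user in load_core_adaptors are assumptions that should be replaced with the current public Ensembl Genomes MySQL details. It reads species.division for every core database, disconnects straight away, and collects the genus from the species name on the assumption that names look like "arabidopsis_thaliana".

```perl
use strict;
use warnings;

# Group genera by the species.division meta value of each core database.
# Each argument is expected to behave like a Bio::EnsEMBL::DBSQL::DBAdaptor
# (i.e. to provide species(), get_MetaContainer() and dbc()).
sub genera_by_division {
    my (@dbas) = @_;
    my %genera;
    foreach my $dba (@dbas) {
        my $division =
            $dba->get_MetaContainer->single_value_by_key('species.division');

        # Release the connection as soon as the value has been read,
        # otherwise one connection per database stays open.
        $dba->dbc->disconnect_if_idle;

        next unless defined $division;

        # Assumption: the species name starts with the genus, followed by
        # an underscore (e.g. "arabidopsis_thaliana").
        my ($genus) = split /_/, $dba->species;
        $genera{$division}{ ucfirst lc $genus } = 1;
    }

    # Return division => sorted list of genera.
    my %sorted;
    $sorted{$_} = [ sort keys %{ $genera{$_} } ] for keys %genera;
    return \%sorted;
}

# Load every core database adaptor from a public server.
# ASSUMPTION: host, port and user are placeholders; check the current
# Ensembl Genomes documentation for the right values.
sub load_core_adaptors {
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'mysql.ebi.ac.uk',
        -port => 4157,
        -user => 'anonymous',
    );
    return @{ Bio::EnsEMBL::Registry->get_all_DBAdaptors(-group => 'core') };
}

# Example use (needs the Ensembl Perl API and network access):
#   my $by_division = genera_by_division(load_core_adaptors());
#   print "$_: @{ $by_division->{$_} }\n" for sort keys %$by_division;
```

Keeping the grouping separate from the registry loading means the grouping can be checked with stand-in objects before pointing it at a real server.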

Regards,

Andy

On 7 Aug 2011, at 14:22, Giuseppe Gallone wrote:

> Hi,
> 
> I have some scripts that use the API to query EnsemblGenomes data. I was wondering what the best way is to obtain an up-to-date list of EnsemblGenomes species through the API.
> 
> In other terms, every time I run the scripts I need to know that, e.g., the "Metazoa" database includes the genera
> 
> 'Acyrthosiphon'
> 'Aedes'
> 'Anopheles'
> 'Apis'
> 'Caenorhabditis'
> 'Culex'
> 'Daphnia'
> 'Drosophila'
> 'Ixodes'
> 'Nematostella'
> 'Pediculus'
> 'Pristionchus'
> 'Schistosoma'
> 'Strongylocentrotus'
> 'Trichoplax'
> 
> while, e.g., 'fungi' includes
> 
> 'Aspergillus'
> 'Fusarium'
> 'Gibberella'
> 'Nectria'
> 'Neosartorya'
> 'Neurospora'
> 'Puccinia'
> 'Saccharomyces'
> 'Schizosaccharomyces'
> 'Ustilago'
> 
> and so on for all five sites. At the moment, I have these genera hard-coded using hashes linking them to their site name (Fusarium => fungi), but this is of course a sub-par solution as with every release new genera might get added and I'd need to keep the hash current.
> 
> I DID try using the GenomeDB adaptor. I call it once for each site, then get the genome names, and trim what's after the underscore to get the genus list. Example:
> 
> my %species_names;   # genome name => 1
> # Fetch every GenomeDB known to the 'plants' compara database
> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
> foreach my $genome (@{$all_genome_dbs}){
>     $species_names{$genome->name} = 1;
> }
> ...etc
> 
> The problem with using the GenomeDB adaptors is that there is a mismatch between the species listed on the website and what I retrieve from the API. For example, for plants, the website
> 
> http://plants.ensembl.org/info/about/species.html
> 
> reports for V.63:
> 
> 'Arabidopsis'
> 'Brachypodium'
> 'Oryza'
> 'Physcomitrella'
> 'Populus'
> 'Sorghum'
> 'Vitis'
> 'Zea'
> 
> but what I get from the API is the following:
> 
> ancestral
> arabidopsis
> brachypodium
> caenorhabditis
> ciona
> drosophila
> homo
> oryza
> physcomitrella
> populus
> saccharomyces
> sorghum
> vitis
> zea
> 
> and similarly for the other sites.
> 
> Thanks a lot for your work and for your suggestions about this.
> 
> Best,
> Giuseppe
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-- 
Andrew Yates                   
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/