[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Giuseppe Gallone G.Gallone at sms.ed.ac.uk
Sun Aug 7 14:22:55 BST 2011


Hi,

I have some scripts that use the API to query EnsemblGenomes data. I was 
wondering what the best way is to obtain an up-to-date list of 
EnsemblGenomes species through the API.

In other words, every time I run the scripts I need to know that, e.g., 
the "Metazoa" database includes the genera

'Acyrthosiphon'
'Aedes'
'Anopheles'
'Apis'
'Caenorhabditis'
'Culex'
'Daphnia'
'Drosophila'
'Ixodes'
'Nematostella'
'Pediculus'
'Pristionchus'
'Schistosoma'
'Strongylocentrotus'
'Trichoplax'

while, e.g., 'fungi' includes

'Aspergillus'
'Fusarium'
'Gibberella'
'Nectria'
'Neosartorya'
'Neurospora'
'Puccinia'
'Saccharomyces'
'Schizosaccharomyces'
'Ustilago'

and so on for all five sites. At the moment, I have these genera 
hard-coded using hashes linking them to their site name (Fusarium => 
fungi), but this is of course a sub-par solution as with every release 
new genera might get added and I'd need to keep the hash current.
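
For reference, the hard-coded lookup described above can be derived from one genus list per site, so that only the per-site lists need to be kept current. This is a minimal sketch in plain Perl. The sub name build_genus_to_site is hypothetical, and the two short lists are just a sample of the genera quoted earlier, not a complete inventory.

```perl
use strict;
use warnings;

# Hypothetical helper: invert { site => [genera] } into { genus => site }.
sub build_genus_to_site {
    my ($site_genera) = @_;
    my %genus_to_site;
    for my $site (sort keys %{$site_genera}) {
        for my $genus (@{ $site_genera->{$site} }) {
            # A genus listed under two sites keeps its first mapping.
            if (exists $genus_to_site{$genus}) {
                warn "$genus already mapped to $genus_to_site{$genus}\n";
                next;
            }
            $genus_to_site{$genus} = $site;
        }
    }
    return \%genus_to_site;
}

# Sample input only; a real run would list every genus for all five sites.
my $lookup = build_genus_to_site({
    metazoa => [qw(Aedes Anopheles Drosophila)],
    fungi   => [qw(Fusarium Neurospora Ustilago)],
});
print "Fusarium => $lookup->{Fusarium}\n";
```

Whatever ends up producing the per-site lists (the API or a file), the rest of the scripts can keep using the same genus => site lookup.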

I DID try using the GenomeDB adaptor. I call it once for each site, then 
get the genome names, and trim what's after the underscore to get the 
genus list. Example:

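use Bio::EnsEMBL::Registry;    # assumes the registry is already loaded

my %species_names;

# Compara GenomeDB adaptor for the 'plants' site
my $genome_db_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
my $all_genome_dbs = $genome_db_adaptor->fetch_all();
foreach my $genome (@{$all_genome_dbs}) {
    # Record each genome name once
    $species_names{$genome->name} = 1;
}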
...etc
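
The trimming step mentioned above (keep what precedes the underscore) is not shown in the snippet. A minimal sketch of it in plain Perl follows. The sub name genera_from_names is hypothetical, and the input names are only examples of the genus_species form that the genome names are assumed to take.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce names such as 'oryza_sativa' to a sorted,
# de-duplicated list of capitalised genus names.
sub genera_from_names {
    my @names = @_;
    my %seen;
    for my $name (@names) {
        # Text before the first underscore is taken to be the genus.
        my ($genus) = split /_/, $name, 2;
        next unless defined $genus && length $genus;
        $seen{ ucfirst lc $genus } = 1;
    }
    return sort keys %seen;
}

my @genera = genera_from_names(
    qw(oryza_sativa oryza_indica zea_mays vitis_vinifera)
);
print join(', ', @genera), "\n";
```

In the loop above, the names would come from keys %species_names rather than a literal list.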

The problem with using the GenomeDB adaptor is that there is a mismatch 
between the species indicated on the website and what I retrieve from 
the API. For example, for plants, the website

http://plants.ensembl.org/info/about/species.html

reports for V.63:

'Arabidopsis'
'Brachypodium'
'Oryza'
'Physcomitrella'
'Populus'
'Sorghum'
'Vitis'
'Zea'

but what I get from the API is the following:

ancestral
arabidopsis
brachypodium
caenorhabditis
ciona
drosophila
homo
oryza
physcomitrella
populus
saccharomyces
sorghum
vitis
zea

and similarly for the other sites.
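
To make the mismatch concrete, the two plant lists quoted above can be compared directly. This is a minimal sketch in plain Perl. The sub name extra_names is hypothetical, and it only isolates the names the API returns that the website does not list; it does not explain why they are there.

```perl
use strict;
use warnings;

# Hypothetical helper: names present in @$from_api but absent from
# @$from_site, compared case-insensitively.
sub extra_names {
    my ($from_api, $from_site) = @_;
    my %on_site = map { ( lc($_) => 1 ) } @{$from_site};
    return grep { !$on_site{ lc $_ } } @{$from_api};
}

# Genera listed on the plants website, as quoted above
my @site = qw(Arabidopsis Brachypodium Oryza Physcomitrella
              Populus Sorghum Vitis Zea);

# Names retrieved through the GenomeDB adaptor, as quoted above
my @api = qw(ancestral arabidopsis brachypodium caenorhabditis ciona
             drosophila homo oryza physcomitrella populus saccharomyces
             sorghum vitis zea);

my @extra = extra_names(\@api, \@site);
print join(', ', @extra), "\n";
```

For the lists in this message, the names left over are ancestral, caenorhabditis, ciona, drosophila, homo and saccharomyces.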

Thanks a lot for your work and for your suggestions about this.

Best,
Giuseppe

