[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Albert Vilella avilella at ebi.ac.uk
Sun Aug 7 18:23:04 BST 2011


If you ask for all the genome_db objects, and for each of them you ask
for the taxon, you will have the unique taxon id and all the taxonomic
information for them, going from the binomial species name up to the
root of living organisms:

my $taxon = $genome_db->taxon;
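
As a rough sketch of how this could be turned into a genus list, the helper below walks a list of genome_db objects and collects the genus of each taxon. It assumes the taxon object offers a genus() accessor (worth checking against the NCBITaxon documentation for your API release) and that entries without a usable taxon, which may be the case for the 'ancestral' entry in the quoted output, can simply be skipped. The name genera_from_genome_dbs is made up for illustration.

```perl
use strict;
use warnings;

# Collect the distinct genus names for a list of Compara GenomeDB objects.
# Assumption: each taxon object provides a genus() accessor.
sub genera_from_genome_dbs {
    my ($genome_dbs) = @_;
    my %genera;
    foreach my $genome_db (@{$genome_dbs}) {
        # Skip entries whose taxon is missing or cannot be fetched
        # (assumed here for things like ancestral sequences).
        my $taxon = eval { $genome_db->taxon } or next;
        my $genus = $taxon->genus;
        next unless defined $genus && length $genus;
        $genera{$genus} = 1;
    }
    my @sorted = sort keys %genera;
    return @sorted;
}

# Possible usage, with the registry already loaded as in the quoted script:
#   my $gdba = Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
#   print "$_\n" for genera_from_genome_dbs($gdba->fetch_all);
```

The helper only reads what the taxon reports, so any genome_db that the compara database contains will show up in the result, whether or not the website lists it.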

This will work for all genomes in Ensembl Genomes except for some
bacteria, where multiple genomes have distinct genome_db_ids but share
the same taxon_id and differ only in their strain name.
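
To see which genomes are affected by this, one option is to group the genome_db names by taxon_id and keep only the ids that occur more than once. This is a minimal sketch that relies on the taxon_id and name accessors of the genome_db objects; the helper name shared_taxon_ids is hypothetical.

```perl
use strict;
use warnings;

# Return a hash reference mapping each taxon_id that is shared by more than
# one genome_db to the list of genome_db names that carry it.
sub shared_taxon_ids {
    my ($genome_dbs) = @_;
    my %names_by_taxon;
    foreach my $genome_db (@{$genome_dbs}) {
        my $taxon_id = $genome_db->taxon_id;
        next unless defined $taxon_id;    # nothing to group on
        push @{ $names_by_taxon{$taxon_id} }, $genome_db->name;
    }
    # Keep only the taxon ids that appear for two or more genomes.
    my %shared = map  { $_ => $names_by_taxon{$_} }
                 grep { @{ $names_by_taxon{$_} } > 1 }
                 keys %names_by_taxon;
    return \%shared;
}
```

For any taxon_id reported here, the genome_db name or genome_db_id would be needed to tell the strains apart.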

Hope it helps,

Cheers,

Albert.

On Sun, 2011-08-07 at 14:22 +0100, Giuseppe Gallone wrote:
> Hi,
> 
> I have some scripts that use the API to query EnsemblGenomes data. I was
> wondering what the best way is to obtain an up-to-date list of
> EnsemblGenomes species through the API's subroutines.
> 
> In other terms, every time I run the scripts I need to know that, e.g., 
> the "Metazoa" database includes the genera
> 
> 'Acyrthosiphon'
> 'Aedes'
> 'Anopheles'
> 'Apis'
> 'Caenorhabditis'
> 'Culex'
> 'Daphnia'
> 'Drosophila'
> 'Ixodes'
> 'Nematostella'
> 'Pediculus'
> 'Pristionchus'
> 'Schistosoma'
> 'Strongylocentrotus'
> 'Trichoplax'
> 
> while, e.g., 'fungi' includes
> 
> 'Aspergillus'
> 'Fusarium'
> 'Gibberella'
> 'Nectria'
> 'Neosartorya'
> 'Neurospora'
> 'Puccinia'
> 'Saccharomyces'
> 'Schizosaccharomyces'
> 'Ustilago'
> 
> and so on for all five sites. At the moment, I have these genera 
> hard-coded using hashes linking them to their site name (Fusarium => 
> fungi), but this is of course a sub-par solution as with every release 
> new genera might get added and I'd need to keep the hash current.
> 
> I DID try using the genome adaptor. I call it once for each site, then
> get the genome names, and trim what's after the underscore to get the
> genus list. Example:
> 
> my %species_names;
> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants',
> 'compara', 'GenomeDB');
> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
> foreach my $genome (@{$all_genome_dbs}){
>       $species_names{$genome->name} = 1;
> }
> ...etc
> 
> The problem with using the GenomeDB adaptors is that there is a mismatch
> between the species indicated on the website and what I retrieve from
> the API. For example, for plants, the website
> 
> http://plants.ensembl.org/info/about/species.html
> 
> reports for V.63:
> 
> 'Arabidopsis'
> 'Brachypodium'
> 'Oryza'
> 'Physcomitrella'
> 'Populus'
> 'Sorghum'
> 'Vitis'
> 'Zea'
> 
> but what I get from the API is the following:
> 
> ancestral
> arabidopsis
> brachypodium
> caenorhabditis
> ciona
> drosophila
> homo
> oryza
> physcomitrella
> populus
> saccharomyces
> sorghum
> vitis
> zea
> 
> and similarly for the other sites.
> 
> Thanks a lot for your work and for your suggestions about this.
> 
> Best,
> Giuseppe
> 
> 





