[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Mon Aug 8 11:17:47 BST 2011

Hi Andy, thanks for your reply. I'm not familiar with the concept of a
meta tag. Could you provide an example of how to use it?

Thanks a lot
G
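
For anyone following the thread, here is a minimal sketch of how the species.division meta tag described in the reply below might be read through the Perl API. It is not taken from the thread: the helper names genus_of and genera_by_division are hypothetical, the connection details in the commented call are assumptions about the public EnsemblGenomes MySQL server, and the genus is simply taken to be the part of the species name before the first underscore.

```perl
use strict;
use warnings;

# Hypothetical helper: the genus is assumed to be the text before the
# first underscore or space, e.g. "arabidopsis_thaliana" -> "Arabidopsis".
sub genus_of {
    my ($species_name) = @_;
    my ($genus) = split /[_ ]/, $species_name;
    return ucfirst lc $genus;
}

# Hypothetical helper: returns { division => { genus => 1, ... }, ... },
# built from the species.division meta key of every core database.
sub genera_by_division {
    my (%db_args) = @_;

    # Loaded lazily so genus_of() stays usable without the Ensembl API.
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(%db_args);

    my %genera;
    foreach my $dba (@{ $registry->get_all_DBAdaptors(-group => 'core') }) {
        my $meta = $dba->get_MetaContainer;
        my ($division) = @{ $meta->list_value_by_key('species.division') };
        $genera{$division}{ genus_of($dba->species) } = 1
            if defined $division;

        # Release the connection straight away to avoid running out
        # of connections, as the reply advises.
        $dba->dbc->disconnect_if_idle;
    }
    return \%genera;
}

# Example call; host, port and user are assumptions, adjust as needed:
# my $genera = genera_by_division(
#     -host => 'mysql.ebi.ac.uk', -port => 4157, -user => 'anonymous',
# );
# print "$_: @{[ sort keys %{ $genera->{$_} } ]}\n" for sort keys %$genera;
```

Keying the result on the division value (for example EnsemblPlants) would replace the hard-coded genus-to-site hashes, since new genera are picked up from the databases at run time.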

On 08/08/11 10:23, Andy Yates wrote:
> Hi Giuseppe,
>
> It is possible to group the species together using the species.division meta tag from the core databases. This will return entries like EnsemblPlants. All EG databases contain this tag and I would urge you to use it. One word of warning, though: disconnect from the core database once you have retrieved the value to avoid running out of connections.
>
> Regards,
>
> Andy
>
> On 7 Aug 2011, at 14:22, Giuseppe Gallone wrote:
>
>> Hi,
>>
>> I have some scripts that use the API to query EnsemblGenomes data. I was wondering what is the best way to obtain an up-to-date list of EnsemblGenomes species through the API's subroutines.
>>
>> In other terms, every time I run the scripts I need to know that, e.g., the "Metazoa" database includes the genera
>>
>> 'Acyrthosiphon'
>> 'Aedes'
>> 'Anopheles'
>> 'Apis'
>> 'Caenorhabditis'
>> 'Culex'
>> 'Daphnia'
>> 'Drosophila'
>> 'Ixodes'
>> 'Nematostella'
>> 'Pediculus'
>> 'Pristionchus'
>> 'Schistosoma'
>> 'Strongylocentrotus'
>> 'Trichoplax'
>>
>> while, e.g., 'fungi' includes
>>
>> 'Aspergillus'
>> 'Fusarium'
>> 'Gibberella'
>> 'Nectria'
>> 'Neosartorya'
>> 'Neurospora'
>> 'Puccinia'
>> 'Saccharomyces'
>> 'Schizosaccharomyces'
>> 'Ustilago'
>>
>> and so on for all five sites. At the moment, I have these genera hard-coded using hashes linking them to their site name (Fusarium => fungi), but this is of course a sub-par solution: with every release new genera might get added and I'd need to keep the hash current.
>>
>> I DID try using the genome adaptor. I call it once for each site, then get the genome names, and trim what's after the underscore to get the genus list. Example:
>>
>> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
>> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
>> foreach my $genome (@{$all_genome_dbs}){
>>      $species_names{$genome->name} = 1;
>> }
>> ...etc
>>
>> The problem with using the GenomeDB adaptors is that there is a mismatch between the species indicated on the website and what I retrieve from the API. For example, for plants, the website
>>
>> http://plants.ensembl.org/info/about/species.html
>>
>> reports for V.63:
>>
>> 'Arabidopsis'
>> 'Brachypodium'
>> 'Oryza'
>> 'Physcomitrella'
>> 'Populus'
>> 'Sorghum'
>> 'Vitis'
>> 'Zea'
>>
>> but what I get from the API is the following:
>>
>> ancestral
>> arabidopsis
>> brachypodium
>> caenorhabditis
>> ciona
>> drosophila
>> homo
>> oryza
>> physcomitrella
>> populus
>> saccharomyces
>> sorghum
>> vitis
>> zea
>>
>> and similarly for the other sites.
>>
>> Thanks a lot for your work and for your suggestions about this.
>>
>> Best,
>> Giuseppe
>>
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
