[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Andy Yates ayates at ebi.ac.uk
Mon Aug 8 11:19:04 BST 2011


Sure thing. You can get it like this (say, for D. mel):

my $dba = Bio::EnsEMBL::Registry->get_DBAdaptor('drosophila_melanogaster', 'core');
# List assignment takes the first value; a plain scalar would get the element count.
my ($division) = @{$dba->get_MetaContainer()->list_value_by_key('species.division')};
$dba->dbc()->disconnect_if_idle();

The disconnect_if_idle() call is only really needed because you are working over all species. If the DB connections do not disconnect automatically, you could use up every available connection, which is effectively a denial of service attack on the DB server.
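
To show how this might scale to every species, here is a minimal sketch. It assumes the registry has already been loaded and that each core adaptor's species() returns a name like drosophila_melanogaster. The sub name genera_by_division is made up for illustration and is not part of the API. It reads the division for each adaptor, disconnects, and takes the text before the first underscore as the genus.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): takes a list of core
# DBAdaptor objects and returns { division => { Genus => 1, ... } }.
sub genera_by_division {
    my (@dbas) = @_;
    my %genera;
    for my $dba (@dbas) {
        # list_value_by_key returns an array ref; keep the first value only.
        my ($division) = @{ $dba->get_MetaContainer()->list_value_by_key('species.division') };
        # Release the connection before moving on to the next database.
        $dba->dbc()->disconnect_if_idle();
        next unless defined $division;
        # Assumption: the genus is the text before the first underscore.
        my ($genus) = split /_/, $dba->species();
        $genera{$division}{ ucfirst $genus } = 1;
    }
    return \%genera;
}

# Usage against a live registry (assumes the Ensembl API is installed and the
# registry has been loaded, e.g. with load_registry_from_db):
#
#   require Bio::EnsEMBL::Registry;
#   my @dbas = @{ Bio::EnsEMBL::Registry->get_all_DBAdaptors(-group => 'core') };
#   my $genera = genera_by_division(@dbas);
#   print join(', ', sort keys %{ $genera->{EnsemblMetazoa} }), "\n";
```

Grouping on the meta value rather than on a hard-coded hash should mean that new genera show up without any change to the script.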

Andy

On 8 Aug 2011, at 11:17, Giuseppe G. wrote:

> Hi Andy, thanks for your reply. I'm not familiar with the concept of meta-tag. Could you provide an example on how to use it?
> 
> Thanks a lot
> G
> 
> On 08/08/11 10:23, Andy Yates wrote:
>> Hi Giuseppe,
>> 
>> It is possible to group the species together using the species.division meta tag from the core databases. This will return entries like EnsemblPlants. All EG databases contain this tag & I would urge you to use it. One word of warning though is to disconnect from the core database once you have retrieved the value to avoid running out of connections.
>> 
>> Regards,
>> 
>> Andy
>> 
>> On 7 Aug 2011, at 14:22, Giuseppe Gallone wrote:
>> 
>>> Hi,
>>> 
>>> I have some scripts that use the API to query EnsemblGenomes data. I was wondering what the best way is to obtain an up-to-date list of EnsemblGenomes species through the API's subroutines.
>>> 
>>> In other terms, every time I run the scripts I need to know that, e.g., the "Metazoa" database includes the genera
>>> 
>>> 'Acyrthosiphon'
>>> 'Aedes'
>>> 'Anopheles'
>>> 'Apis'
>>> 'Caenorhabditis'
>>> 'Culex'
>>> 'Daphnia'
>>> 'Drosophila'
>>> 'Ixodes'
>>> 'Nematostella'
>>> 'Pediculus'
>>> 'Pristionchus'
>>> 'Schistosoma'
>>> 'Pristionchus'
>>> 'Strongylocentrotus'
>>> 'Trichoplax'
>>> 
>>> while, e.g., 'fungi' includes
>>> 
>>> 'Aspergillus'
>>> 'Fusarium'
>>> 'Gibberella'
>>> 'Nectria'
>>> 'Neosartorya'
>>> 'Neurospora'
>>> 'Puccinia'
>>> 'Saccharomyces'
>>> 'Schizosaccharomyces'
>>> 'Ustilago'
>>> 
>>> and so on for all five sites. At the moment, I have these genera hard-coded using hashes linking them to their site name (Fusarium =>  fungi), but this is of course a sub-par solution as with every release new genera might get added and I'd need to keep the hash current.
>>> 
>>> I DID try using the genome adaptor. I call it once for each site, then get the genome names, and trim what's after the underscore to get the genus list. Example:
>>> 
>>> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
>>> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
>>> foreach my $genome (@{$all_genome_dbs}){
>>>     $species_names{$genome->name} = 1;
>>> }
>>> ...etc
>>> 
>>> The problem with using the GenomeDB adaptors is that there is a mismatch between the species indicated on the website and what I retrieve from the API. For example, for plants, the website
>>> 
>>> http://plants.ensembl.org/info/about/species.html
>>> 
>>> reports for V.63:
>>> 
>>> 'Arabidopsis'
>>> 'Brachypodium'
>>> 'Oryza'
>>> 'Physcomitrella'
>>> 'Populus'
>>> 'Sorghum'
>>> 'Vitis'
>>> 'Zea'
>>> 
>>> but what I get from the api is the following:
>>> 
>>> ancestral
>>> arabidopsis
>>> brachypodium
>>> caenorhabditis
>>> ciona
>>> drosophila
>>> homo
>>> oryza
>>> physcomitrella
>>> populus
>>> saccharomyces
>>> sorghum
>>> vitis
>>> zea
>>> 
>>> and similarly for the other sites.
>>> 
>>> Thanks a lot for your work and for your suggestions about this.
>>> 
>>> Best,
>>> Giuseppe
>>> 
>>> 
>>> -- 
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
> 

-- 
Andrew Yates                   
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/








