[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites
Andy Yates
ayates at ebi.ac.uk
Mon Aug 8 11:19:04 BST 2011
Sure thing. You can get them (say, for D. mel) like this:
my $dba = Bio::EnsEMBL::Registry->get_DBAdaptor('drosophila_melanogaster', 'core');
# list_value_by_key() returns an array ref; list assignment takes its first value
my ($division) = @{$dba->get_MetaContainer()->list_value_by_key('species.division')};
$dba->dbc()->disconnect_if_idle();
The disconnect_if_idle() call is only really needed because you are working over all species: if the DB connections do not disconnect automatically, you risk exhausting the connections on a DB server, which amounts to a denial of service attack.
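To do the same thing across every species, something along these lines might work. Treat it as a rough sketch rather than tested production code: the sub name genera_by_division and the --live switch are made up for illustration, and the host, port and user passed to load_registry_from_db are assumptions about the public Ensembl Genomes MySQL server, so check them against the current documentation.

```perl
use strict;
use warnings;

# Group genera by the species.division meta value.
# Takes a list of core DBAdaptor objects and returns a hash ref:
#   { division => { Genus => 1, ... }, ... }
sub genera_by_division {
    my (@dbas) = @_;
    my %genera;
    for my $dba (@dbas) {
        # list_value_by_key() returns an array ref; take its first value.
        my ($division) = @{ $dba->get_MetaContainer()->list_value_by_key('species.division') };

        # Release the connection before moving on to the next database.
        $dba->dbc()->disconnect_if_idle();

        next unless defined $division;    # skip databases lacking the tag

        # Registry species names look like 'drosophila_melanogaster';
        # the genus is the part before the first underscore.
        my ($genus) = split /_/, $dba->species();
        $genera{$division}{ ucfirst($genus) } = 1;
    }
    return \%genera;
}

# Only contact the database server when asked to: perl script.pl --live
if (@ARGV && $ARGV[0] eq '--live') {
    require Bio::EnsEMBL::Registry;

    # ASSUMPTION: connection details of the public Ensembl Genomes server.
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'mysql.ebi.ac.uk',
        -port => 4157,
        -user => 'anonymous',
    );

    my $dbas   = Bio::EnsEMBL::Registry->get_all_DBAdaptors(-group => 'core');
    my $genera = genera_by_division(@{$dbas});

    for my $division (sort keys %{$genera}) {
        print "$division: ", join(', ', sort keys %{ $genera->{$division} }), "\n";
    }
}
```

Because the grouping is driven by the meta tag rather than a hard-coded hash, new genera added in a release should show up without any change to the script.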
Andy
On 8 Aug 2011, at 11:17, Giuseppe G. wrote:
> Hi Andy, thanks for your reply. I'm not familiar with the concept of meta-tag. Could you provide an example on how to use it?
>
> Thanks a lot
> G
>
> On 08/08/11 10:23, Andy Yates wrote:
>> Hi Giuseppe,
>>
>> It is possible to group the species together using the species.division meta tag from the core databases. This will return entries like EnsemblPlants. All EG databases contain this tag, and I would urge you to use it. One word of warning, though: disconnect from the core database once you have retrieved the value to avoid running out of connections.
>>
>> Regards,
>>
>> Andy
>>
>> On 7 Aug 2011, at 14:22, Giuseppe Gallone wrote:
>>
>>> Hi,
>>>
>>> I have some scripts that use the API to query EnsemblGenomes data. I was wondering what the best way is to obtain an up-to-date list of EnsemblGenomes species through the API's subroutines.
>>>
>>> In other words, every time I run the scripts I need to know that, e.g., the "Metazoa" database includes the genera
>>>
>>> 'Acyrthosiphon'
>>> 'Aedes'
>>> 'Anopheles'
>>> 'Apis'
>>> 'Caenorhabditis'
>>> 'Culex'
>>> 'Daphnia'
>>> 'Drosophila'
>>> 'Ixodes'
>>> 'Nematostella'
>>> 'Pediculus'
>>> 'Pristionchus'
>>> 'Schistosoma'
>>> 'Strongylocentrotus'
>>> 'Trichoplax'
>>>
>>> while, e.g., 'fungi' includes
>>>
>>> 'Aspergillus'
>>> 'Fusarium'
>>> 'Gibberella'
>>> 'Nectria'
>>> 'Neosartorya'
>>> 'Neurospora'
>>> 'Puccinia'
>>> 'Saccharomyces'
>>> 'Schizosaccharomyces'
>>> 'Ustilago'
>>>
>>> and so on for all five sites. At the moment, I have these genera hard-coded using hashes linking them to their site name (Fusarium => fungi), but this is of course a sub-par solution as with every release new genera might get added and I'd need to keep the hash current.
>>>
>>> I DID try using the genome adaptor. I call it once for each site, then get the genome names, and trim what's after the underscore to get the genus list. Example:
>>>
>>> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants', 'compara', 'GenomeDB');
>>> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
>>> foreach my $genome (@{$all_genome_dbs}){
>>> $species_names{$genome->name} = 1;
>>> }
>>> ...etc
>>>
>>> The problem with using the GenomeDB adaptors is that there is a mismatch between the species indicated on the website and what I retrieve from the API. For example, for plants, the website
>>>
>>> http://plants.ensembl.org/info/about/species.html
>>>
>>> reports for V.63:
>>>
>>> 'Arabidopsis'
>>> 'Brachypodium'
>>> 'Oryza'
>>> 'Physcomitrella'
>>> 'Populus'
>>> 'Sorghum'
>>> 'Vitis'
>>> 'Zea'
>>>
>>> but what I get from the API is the following:
>>>
>>> ancestral
>>> arabidopsis
>>> brachypodium
>>> caenorhabditis
>>> ciona
>>> drosophila
>>> homo
>>> oryza
>>> physcomitrella
>>> populus
>>> saccharomyces
>>> sorghum
>>> vitis
>>> zea
>>>
>>> and similarly for the other sites.
>>>
>>> Thanks a lot for your work and for your suggestions about this.
>>>
>>> Best,
>>> Giuseppe
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
--
Andrew Yates
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/