[ensembl-dev] [EnsemblGenomes] getting genus name -> for all species in EnsemblGenomes sites

Giuseppe G. (G.Gallone at sms.ed.ac.uk)
Mon Aug 8 17:32:27 BST 2011


Hi Albert,

Following up on this: it seems I cannot get genus/species strings from
a taxon object for some species.

For instance, I have an NCBITaxon object for 'Saccharomyces cerevisiae
S288c' (NCBI taxonomy ID: 559292). I read this name from the object
itself (by inspecting it in the debugger), so the information is there.

I tried the following methods:

my $NCBI_name_binomial   = $NCBItaxon->binomial;
my $NCBI_name_common     = $NCBItaxon->common_name;
my $NCBI_name_short      = $NCBItaxon->short_name;
my $NCBI_name_alias      = $NCBItaxon->ensembl_alias;
my $NCBI_name_alias_name = $NCBItaxon->ensembl_alias_name;
my $NCBI_name_genus      = $NCBItaxon->genus;
my $NCBI_name_species    = $NCBItaxon->species;

They all return undef except short_name(), which returns 'ScerS288c'.
What is going on? Am I doing something wrong?

It would be really useful to get the genus/species string from the taxon
object, because I need to pass it to the registry to get a gene adaptor:

my $gene_adaptor = $registry->get_adaptor($NCBItaxon_name, "core", "Gene");

This does not work with the short name, i.e.

my $gene_adaptor = $registry->get_adaptor('ScerS288c', "core", "Gene");

returns undef.
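When binomial(), genus() and species() all come back undef, one fallback is to derive the registry alias from the scientific name string itself. The sketch below assumes (a) that the full scientific name, e.g. 'Saccharomyces cerevisiae S288c', can be obtained from the taxon object, and (b) that the core database is registered under a lowercase genus_species alias such as 'saccharomyces_cerevisiae'. Neither assumption is guaranteed for every genome, and `scientific_name_to_alias` is a hypothetical helper, not part of the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a scientific name such as
# 'Saccharomyces cerevisiae S288c' into a lowercase 'genus_species' string.
# Words after the second token (strain, subspecies) are dropped; this is an
# assumption about how registry aliases are formed, not a documented rule.
sub scientific_name_to_alias {
    my ($scientific_name) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my ($genus, $species) = split ' ', $scientific_name;
    return unless defined $genus && defined $species;
    return lc "${genus}_${species}";
}

my $alias = scientific_name_to_alias('Saccharomyces cerevisiae S288c');
print "$alias\n";    # prints saccharomyces_cerevisiae
```

The resulting string could then be tried with `$registry->get_adaptor($alias, 'core', 'Gene')`; whether that alias is actually registered in a given release still has to be checked.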
Thanks a lot
G

On 07/08/11 18:23, Albert Vilella wrote:
> If you fetch all the GenomeDB objects and ask each of them for its taxon,
> you get the unique taxon ID and all the taxonomic information for it,
> from the binomial species name up to the root of the tree of life:
>
> my $taxon = $genome_db->taxon;
>
> This works for all genomes in EnsemblGenomes except some bacteria,
> where several genomes have distinct genome_db_ids but share the same
> taxon_id and differ only in their strain name.
>
> Hope it helps,
>
> Cheers,
>
> Albert.
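Albert's caveat (several bacterial genomes sharing one taxon_id) means that any lookup keyed on taxon_id can silently merge genomes. A minimal sketch of detecting such collisions, assuming you have already collected (genome_db_id, taxon_id, name) triples, for example via the GenomeDB object's dbID, taxon_id and name accessors. The records below are invented placeholders, not real EnsemblGenomes data, and both helper names are hypothetical.

```perl
use strict;
use warnings;

# Placeholder records: [genome_db_id, taxon_id, genome name].
# These values are invented for illustration only.
my @genomes = (
    [ 101, 9001, 'example_bacterium_strain_a' ],
    [ 102, 9001, 'example_bacterium_strain_b' ],
    [ 103, 9002, 'other_bacterium' ],
);

# Group genome names by taxon_id: returns { taxon_id => [names] }.
sub group_by_taxon {
    my (@records) = @_;
    my %by_taxon;
    for my $rec (@records) {
        my ($genome_db_id, $taxon_id, $name) = @$rec;
        push @{ $by_taxon{$taxon_id} }, $name;
    }
    return \%by_taxon;
}

# Return the taxon_ids shared by more than one genome, sorted numerically.
sub shared_taxon_ids {
    my ($by_taxon) = @_;
    return sort { $a <=> $b } grep { @{ $by_taxon->{$_} } > 1 } keys %$by_taxon;
}

my $by_taxon = group_by_taxon(@genomes);
for my $taxon_id (shared_taxon_ids($by_taxon)) {
    print "taxon $taxon_id: @{ $by_taxon->{$taxon_id} }\n";
}
```

Keying on genome_db_id (or the genome name) instead of taxon_id keeps such strains apart.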
>
> On Sun, 2011-08-07 at 14:22 +0100, Giuseppe Gallone wrote:
>> Hi,
>>
>> I have some scripts that use the API to query EnsemblGenomes data. I was
>> wondering what the best way is to obtain an up-to-date list of
>> EnsemblGenomes species through the API.
>>
>> In other words, every time I run the scripts I need to know that, e.g.,
>> the "Metazoa" database includes the genera
>>
>> 'Acyrthosiphon'
>> 'Aedes'
>> 'Anopheles'
>> 'Apis'
>> 'Caenorhabditis'
>> 'Culex'
>> 'Daphnia'
>> 'Drosophila'
>> 'Ixodes'
>> 'Nematostella'
>> 'Pediculus'
>> 'Pristionchus'
>> 'Schistosoma'
>> 'Strongylocentrotus'
>> 'Trichoplax'
>>
>> while, e.g., 'fungi' includes
>>
>> 'Aspergillus'
>> 'Fusarium'
>> 'Gibberella'
>> 'Nectria'
>> 'Neosartorya'
>> 'Neurospora'
>> 'Puccinia'
>> 'Saccharomyces'
>> 'Schizosaccharomyces'
>> 'Ustilago'
>>
>> and so on for all five sites. At the moment I have these genera
>> hard-coded in hashes that map each genus to its site name (Fusarium =>
>> fungi), but this is of course a sub-par solution: with every release
>> new genera may be added and I would need to keep the hashes current.
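Rather than maintaining the genus-to-site hash by hand, it could be built at run time from whatever per-site genus lists the API returns. A minimal sketch, assuming those lists have already been fetched; the data below is a small excerpt typed in for illustration, and `build_genus_to_sites` is a hypothetical name. It also flags genera that appear under more than one site, which is one way to spot entries that do not belong to a site.

```perl
use strict;
use warnings;

# Per-site genus lists; in practice these would come from the API rather
# than being typed in. Here they are a small hand-picked excerpt.
my %genera_by_site = (
    fungi   => [ 'Fusarium', 'Saccharomyces' ],
    metazoa => [ 'Aedes', 'Drosophila' ],
    plants  => [ 'Oryza', 'Saccharomyces' ],
);

# Invert { site => [genera] } into { genus => [sites] }.
# Sites are visited in sorted order, so each genus's site list is sorted too.
sub build_genus_to_sites {
    my ($genera_by_site) = @_;
    my %sites_for;
    for my $site (sort keys %$genera_by_site) {
        push @{ $sites_for{$_} }, $site for @{ $genera_by_site->{$site} };
    }
    return \%sites_for;
}

my $sites_for = build_genus_to_sites(\%genera_by_site);
for my $genus (sort keys %$sites_for) {
    my @sites = @{ $sites_for->{$genus} };
    # A genus listed under several sites deserves a closer look.
    my $flag = @sites > 1 ? '  <- more than one site' : '';
    print "$genus => @sites$flag\n";
}
```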
>>
>> I did try using the GenomeDB adaptor. I call it once for each site, get
>> the genome names, and trim everything after the underscore to get the
>> genus list. Example:
>>
>> my %species_names;
>> my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor('plants',
>>     'compara', 'GenomeDB');
>> my $all_genome_dbs = $genome_db_adaptor->fetch_all();
>> foreach my $genome (@{$all_genome_dbs}) {
>>     $species_names{$genome->name} = 1;
>> }
>> ...etc
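The trimming step described above can be written as a small function. A minimal sketch, assuming genome names of the form 'genus_species', as the GenomeDB names appear to be; it keeps the text before the first underscore, capitalises it to match the website's spelling, and removes duplicates. `genera_from_genome_names` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce genome names such as 'arabidopsis_thaliana'
# to a sorted, de-duplicated list of capitalised genus names.
sub genera_from_genome_names {
    my (@names) = @_;
    my %seen;
    for my $name (@names) {
        my ($genus) = split /_/, $name;    # text before the first underscore
        next unless defined $genus && length $genus;
        $seen{ ucfirst lc $genus } = 1;
    }
    return sort keys %seen;
}

my @genera = genera_from_genome_names(
    'arabidopsis_thaliana', 'arabidopsis_lyrata', 'oryza_sativa', 'vitis_vinifera',
);
print join(', ', @genera), "\n";    # prints Arabidopsis, Oryza, Vitis
```

Applied to the raw GenomeDB names this still yields entries such as 'Ancestral' and 'Homo', so it only solves the trimming part, not the mismatch with the website.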
>>
>> The problem with using the GenomeDB adaptors is that the species listed
>> on the website do not match what I retrieve from the API. For example,
>> for plants, the website
>>
>> http://plants.ensembl.org/info/about/species.html
>>
>> lists for release 63:
>>
>> 'Arabidopsis'
>> 'Brachypodium'
>> 'Oryza'
>> 'Physcomitrella'
>> 'Populus'
>> 'Sorghum'
>> 'Vitis'
>> 'Zea'
>>
>> but what I get from the API is the following:
>>
>> ancestral
>> arabidopsis
>> brachypodium
>> caenorhabditis
>> ciona
>> drosophila
>> homo
>> oryza
>> physcomitrella
>> populus
>> saccharomyces
>> sorghum
>> vitis
>> zea
>>
>> and similarly for the other sites.
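One plausible reading of this mismatch is that the compara database also holds genomes used only for comparison (homo, drosophila, caenorhabditis and so on) plus a non-species 'ancestral' entry, so a list built from GenomeDB names needs filtering. A minimal sketch, assuming you have a separate list of the site's own core species to intersect with; how to obtain that list (for example from the registry's core database adaptors) is left open here, and both lists below are invented examples. `site_genomes_only` is a hypothetical helper.

```perl
use strict;
use warnings;

# Keep only compara genome names that also appear in the site's own list of
# core species. Input order of the compara names is preserved.
sub site_genomes_only {
    my ($compara_names, $core_species) = @_;
    my %is_core = map { $_ => 1 } @$core_species;
    return grep { $is_core{$_} } @$compara_names;
}

# Placeholder inputs for illustration only.
my @compara_names = qw(ancestral_sequences arabidopsis_thaliana homo_sapiens
                       oryza_sativa saccharomyces_cerevisiae zea_mays);
my @core_species  = qw(arabidopsis_thaliana oryza_sativa zea_mays);

my @kept = site_genomes_only(\@compara_names, \@core_species);
print "@kept\n";    # prints arabidopsis_thaliana oryza_sativa zea_mays
```

Intersecting with the site's own species list, rather than excluding known extras one by one, keeps the filter correct if new comparison genomes are added in later releases.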
>>
>> Thanks a lot for your work and for your suggestions about this.
>>
>> Best,
>> Giuseppe
