[ensembl-dev] $NCBI_taxon_adaptor->fetch_node_by_name() broken in EnsemblGenomes 7

Michael Paulini mh6 at sanger.ac.uk
Wed Dec 15 09:38:16 GMT 2010


Hi Giuseppe,

I just noticed by chance that this mail uses our C. japonica data.
We have currently frozen any new/improved annotation on C. japonica, in 
preparation for a new assembly and gene predictions coming from WashU 
that should solve a lot of the problems in the current one.
So you might want to be on the lookout for suspicious things 
(misassemblies, heterozygosity, wrong/partial gene predictions) when 
working with the current C. japonica.

Michael

(help at wormbase.org)

On 14/12/2010 16:05, Andy Yates wrote:
> Hi Giuseppe,
>
> As you've said, the most correct way is to use the scientific name as defined in the taxonomy, but your example will fail. You've given 'Caenorhabditis japonica DF5081' as the name, and the taxonomy knows nothing about this name. In this situation you would have to use just 'Caenorhabditis japonica'.
>
> All I can suggest is that your users consult the taxonomy for the names to use, or that you change tack and start using the alias lookup system from the Registry. Changing the system would mean you could use the names from the Ensembl/EnsemblGenomes websites, but this obviously relies on you being able to change your code easily.
>
> Regards,
>
> Andy
>
> On 14 Dec 2010, at 15:55, Giuseppe G. wrote:
>
>> Hi Andy, thanks for that. I checked and the data is there now.
>>
>> Thanks for the suggestion related to name/taxon_id mapping.
>>
>> As a general rule for fetch_node_by_name(), what is the most correct way to deal with Ensembl Genomes strains that have complex names?
>>
>> eg:
>>
>> $NCBI_taxon_adaptor->fetch_node_by_name('Caenorhabditis japonica DF5081')
>>
>> is this correct?
>>
>>
>> Giuseppe
>>
>>
>> On 14/12/10 13:39, Andy Yates wrote:
>>> Hi Giuseppe,
>>>
>>> It seems that the taxonomy tables were missed out of the load for some unknown reason. I have gone and populated the servers and am updating the MySQL dumps we provide, so anyone who provides mirrors of this data will have to update ncbi_taxa_name and ncbi_taxa_node.
>>>
>>> Your method should now work. However, as an aside, if you are looking for homologies linked to Plasmodium falciparum, please watch out for the naming used in the taxonomy as well as the taxonomy identifier linked to the genome db. In this situation our falciparum is linked to taxon 36329, which has the name Plasmodium falciparum 3D7. Using Plasmodium falciparum will return 5833, which will not link to any valid GenomeDB.
>>>
>>> Regards,
>>>
>>> Andy
>>>
>>> On 14 Dec 2010, at 13:16, Giuseppe G. wrote:
>>>
>>>> Hi,
>>>>
>>>> There is a problem with $NCBI_taxon_adaptor->fetch_node_by_name() in release 60.
>>>>
>>>>
>>>> STEPS TO REPRODUCE THE BUG
>>>> --------------------------
>>>>
>>>> 1) get a registry by doing
>>>>     $registry->load_registry_from_multiple_dbs(
>>>>         {    # VERTEBRATES
>>>>             -host    => 'ensembldb.ensembl.org',
>>>>             -user    => 'anonymous',
>>>>             -verbose => 1
>>>>         },
>>>>         {    # EnsemblGenomes
>>>>             -host    => 'mysql.ebi.ac.uk',
>>>>             -user    => 'anonymous',
>>>>             -port    => 4157,
>>>>             -verbose => 1
>>>>         }
>>>>     );
>>>>
>>>> -this step is successful
>>>>
>>>> 2) get a taxon adaptor using pan_homology by doing
>>>>
>>>> my $NCBI_taxon_adaptor = $registry->get_adaptor('pan_homology', 'compara', "NCBITaxon");
>>>>
>>>> -this step is successful
>>>>
>>>> 3) get a taxon id from the species name by using fetch_node_by_name()
>>>>
>>>> my $source_taxon = $NCBI_taxon_adaptor->fetch_node_by_name($source_organism);
>>>>
>>>> where for $source_organism I tried 'Plasmodium falciparum', 'plasmodium_falciparum', 'drosophila_melanogaster', etc.
>>>>
>>>> -source_taxon is undefined
>>>> -----------------------------
>>>>
>>>>
>>>> ADDITIONAL INFORMATION
>>>> ----------------------
>>>> -This worked  in release 59/6
>>>> -The method still works when the registry is connecting to vertebrate compara
>>>>
>>>>
>>>> Any help is particularly appreciated because a core feature of my scripts relies on this adaptor and its seamless functionality (as per API docs) across Multi, ensemblgenomes and pan compara.
>>>>
>>>>
>>>> Thanks a lot in advance.
>>>>
>>>> Giuseppe
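
The thread's main practical point is that fetch_node_by_name() returns undef for any name the NCBI taxonomy does not hold exactly, so the result should be checked before use. A minimal sketch of such a check follows. It is an illustration only, not code from any of the posters: fetch_taxon_or_die is a hypothetical helper, and MockTaxonAdaptor is a stand-in that returns bare taxon ids (the two quoted in the thread) so the example runs without an Ensembl installation. With the real API you would pass the adaptor obtained from $registry->get_adaptor('pan_homology', 'compara', 'NCBITaxon'), which returns node objects rather than numbers.

```perl
use strict;
use warnings;

# Stand-in for the real NCBITaxon adaptor (assumption: for illustration only).
# It knows just the two names/ids mentioned in the thread and, like the
# behaviour reported above, returns undef for any other name.
{
    package MockTaxonAdaptor;

    my %TAXON_ID_FOR = (
        'Plasmodium falciparum'     => 5833,
        'Plasmodium falciparum 3D7' => 36329,
    );

    sub new { return bless {}, shift }

    sub fetch_node_by_name {
        my ($self, $name) = @_;
        return $TAXON_ID_FOR{$name};    # undef when the name is unknown
    }
}

# Hypothetical helper, not part of the Ensembl API: look a name up and die
# with a clear message instead of silently passing undef downstream.
sub fetch_taxon_or_die {
    my ($taxon_adaptor, $name) = @_;
    my $node = $taxon_adaptor->fetch_node_by_name($name);
    die "No taxonomy entry named '$name'; check the exact name in the taxonomy\n"
        unless defined $node;
    return $node;
}

my $adaptor = MockTaxonAdaptor->new;

# The strain-level name is the one linked to the genome in this thread.
my $taxon = fetch_taxon_or_die($adaptor, 'Plasmodium falciparum 3D7');
print "Plasmodium falciparum 3D7 -> $taxon\n";

# A name the taxonomy does not know fails loudly rather than returning undef.
my $ok = eval { fetch_taxon_or_die($adaptor, 'Caenorhabditis japonica DF5081'); 1 };
print "Lookup failed: $@" unless $ok;
```

Failing early like this does not solve the Plasmodium falciparum case described above, where a valid name resolves to a taxon that has no genome attached; that still needs a check against the genome db's own taxon id.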
