[ensembl-dev] find_leaf_by_name in ensembl Metazoa

Moretti Sébastien sebastien.moretti at unil.ch
Tue Dec 18 13:24:09 GMT 2012


I get $seq_name from the protein alignment FASTA headers, which come from
the tree object.
So I should always get protein IDs from there.

It looks like the external program I use to filter alignments may change
FASTA headers when they contain characters such as ':' or '-'.
I will send them a bug report.
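The leaf lookup only works if the FASTA headers survive the filtering step unchanged, so one way to catch this is to compare the sequence IDs before and after filtering. Below is a minimal Perl sketch. It assumes both alignments are available as FASTA text and that the ID is the first word of each header. `fasta_ids` and `missing_ids` are hypothetical helpers, not part of the Ensembl API, and the sequences and the '-' to '_' rewrite are made up for illustration:

```perl
use strict;
use warnings;

# Collect the sequence IDs (first word after '>') from FASTA text.
sub fasta_ids {
    my ($fasta_text) = @_;
    my @ids;
    for my $line ( split /\n/, $fasta_text ) {
        push @ids, $1 if $line =~ /^>(\S+)/;
    }
    return @ids;
}

# IDs present before filtering but absent afterwards: headers that the
# filter either rewrote or dropped.
sub missing_ids {
    my ( $before, $after ) = @_;
    my %kept = map { $_ => 1 } @$after;
    return grep { !$kept{$_} } @$before;
}

# Hypothetical example: the filter rewrote '-' as '_' in the first header.
my $original = ">ADAR011299-PA\nMKLV\n>EHJ66117\nMSTA\n";
my $filtered = ">ADAR011299_PA\nMKLV\n>EHJ66117\nMSTA\n";

for my $id ( missing_ids( [ fasta_ids($original) ], [ fasta_ids($filtered) ] ) ) {
    print "Header changed or dropped by the filter: $id\n";
}
```

A sequence that the filter removed on purpose is also reported, so the list is a set of candidates to inspect, not proof of a bug.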


I changed my code for:

my $leaf = $tree->find_leaf_by_name($seq_name);
if ( ! defined $leaf ){
     # $seq_name may be a gene ID: map it to the canonical peptide,
     # which is what the tree leaves are named after
     my $member_adaptor = $reg->get_adaptor('metazoa', 'compara', 'Member');
     my $gene_member    = $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', $seq_name);
     if ( defined $gene_member ){
          my $peptide_member = $gene_member->get_canonical_Member();
          $leaf = $tree->find_leaf_by_name($peptide_member->stable_id);
     }
}
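The lookup above can also be packaged as a subroutine that checks every step for undef. A name that is neither a leaf nor a known gene then returns undef instead of stopping the script with "Can't call method ... on an undefined value". This is a minimal sketch. The methods called on the tree, adaptor and members are the ones used in this thread (API 69 era). `leaf_for_name` and the `Mock*` packages are hypothetical, and the mocks only stand in for real Ensembl objects so that the logic can be tried without a database connection:

```perl
use strict;
use warnings;

# Return the gene-tree leaf for $name, or undef if it cannot be resolved.
# Leaves are named after protein (peptide) stable IDs; a gene stable ID is
# mapped to its canonical peptide first.
sub leaf_for_name {
    my ( $tree, $member_adaptor, $name ) = @_;

    # 1. Try the name as a leaf (protein ID) directly.
    my $leaf = $tree->find_leaf_by_name($name);
    return $leaf if defined $leaf;

    # 2. Otherwise treat it as a gene ID; guard each step against undef.
    my $gene_member = $member_adaptor->fetch_by_source_stable_id( 'ENSEMBLGENE', $name );
    return unless defined $gene_member;

    my $peptide_member = $gene_member->get_canonical_Member();
    return unless defined $peptide_member;

    return $tree->find_leaf_by_name( $peptide_member->stable_id );
}

# Hypothetical stand-ins implementing only the methods called above.
package MockTree;
sub new {
    my ( $class, @names ) = @_;
    my %leaves = map { $_ => "leaf:$_" } @names;
    return bless \%leaves, $class;
}
sub find_leaf_by_name { my ( $self, $name ) = @_; return $self->{$name} }

package MockMember;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub stable_id            { return $_[0]{stable_id} }
sub get_canonical_Member { return $_[0]{canonical} }

package MockAdaptor;
sub new { my ( $class, %genes ) = @_; return bless {%genes}, $class }
sub fetch_by_source_stable_id { my ( $self, $source, $id ) = @_; return $self->{$id} }

package main;

my $tree    = MockTree->new( 'ADAR011299-PA', 'EHJ66117' );
my $adaptor = MockAdaptor->new(
    ADAR011299 => MockMember->new(
        canonical => MockMember->new( stable_id => 'ADAR011299-PA' ) ),
);

print leaf_for_name( $tree, $adaptor, 'ADAR011299' ), "\n";  # gene ID -> canonical peptide leaf
print leaf_for_name( $tree, $adaptor, 'EHJ66117' ), "\n";    # protein ID, found directly
my $unknown = leaf_for_name( $tree, $adaptor, 'UNKNOWN' );
print defined $unknown ? "found\n" : "not found\n";
```

With real data, `$tree` and the Member adaptor would come from the registry as in the code above, and the mock packages would be dropped.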

My code is more robust now. Thanks!

> In that case, 'EHJ66117' is a protein ID and the gene name would be
> 'KGM_15886':
> http://metazoa.ensembl.org/Danaus_plexippus/Gene/Summary?g=KGM_15886;r=JH388931:31249-35238;t=EHJ66117
>
>
> Matthieu
>
> On 18/12/12 11:20, Moretti Sébastien wrote:
>> So, here is what I did:
>>
>> my $member_adaptor = $reg->get_adaptor('metazoa', 'compara', 'Member');
>> my $gene_member    =
>> $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', $seq_name);
>> my $peptide_member = $gene_member->get_canonical_Member();
>> my $leaf = $tree->find_leaf_by_name($peptide_member->stable_id);
>>
>>
>> It works for some genes that previously failed, but now it stops with
>> this new error message:
>>      Can't call method "get_canonical_Member" on an undefined value
>> This is for gene EHJ66117, still in the EMGT00050000000001 gene tree.
>>
>>
>>> Hi Sébastien
>>>
>>> It works for this gene, but I'm not even sure it would work for all the
>>> genes of that species. And the rule may be even different for other
>>> species.
>>>
>>> A safer way is to use the API to translate gene IDs into protein IDs. We
>>> are actually only interested in the protein used to build the gene tree
>>> (the "canonical" protein), not in all the possible translations.
>>>
>>> my $gene_member =
>>> $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', 'ADAR011299');
>>> my $peptide_member = $gene_member->get_canonical_Member();
>>>
>>> $tree->find_leaf_by_name($peptide_member->stable_id);
>>>
>>> This should work for any species, and both on Ensembl and Ensembl
>>> Genomes.
>>>
>>> Matthieu
>>>
>>> On 18/12/12 10:43, Moretti Sébastien wrote:
>>>> Hi Matthieu
>>>>
>>>> Do you mean I can do something like
>>>>      $tree->find_leaf_by_name($seq_name) ||
>>>> $tree->find_leaf_by_name($seq_name.'-PA');
>>>>
>>>> In other words, will adding -PA to the names in all the failed cases
>>>> fix my problem?
>>>>
>>>>> Hi Sébastien
>>>>>
>>>>> This happens because the names of the gene tree leaves are protein IDs
>>>>> and ADAR011299 is a gene ID. In your case, it should work with
>>>>> ADAR011299-PA.
>>>>> metazoa.ensembl.org/Anopheles_darlingi/Gene/Compara_Tree?db=core;g=ADAR011299
>>>>>
>>>>> For some species, gene IDs and protein IDs are often identical, which
>>>>> can be quite confusing.
>>>>>
>>>>> Best regards,
>>>>> Matthieu
>>>>>
>>>>> On 18/12/12 10:12, Moretti Sébastien wrote:
>>>>>> Hi
>>>>>>
>>>>>> With Ensembl API 69, and previous APIs, I used to get a leaf object
>>>>>> with this function:
>>>>>>      my $leaf = $tree->find_leaf_by_name($seq_name);
>>>>>>      print $leaf->node_id;
>>>>>>
>>>>>> I never had problems with Ensembl vertebrate data.
>>>>>>
>>>>>>
>>>>>> I tried the same script with Ensembl Metazoa and got this error message:
>>>>>>      Can't call method "node_id" on an undefined value
>>>>>> or
>>>>>>      Use of uninitialized value $leaf in concatenation
>>>>>> $leaf is undefined: find_leaf_by_name returns undef.
>>>>>>
>>>>>>
>>>>>> Do you have an explanation for this?
>>>>>> Regards
>>>>>>
>>>>>> e.g. in the EMGT00050000000001 gene family:
>>>>>>      my $leaf = $tree->find_leaf_by_name('ADAR011299');
-- 
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4079
http://selectome.unil.ch/ http://bgee.unil.ch/



