[ensembl-dev] find_leaf_by_name in ensembl Metazoa

Matthieu Muffato muffato at ebi.ac.uk
Tue Dec 18 11:34:44 GMT 2012


In this case, 'EHJ66117' is a protein ID, not a gene ID, so the
'ENSEMBLGENE' lookup returns undef. The gene ID is 'KGM_15886':
http://metazoa.ensembl.org/Danaus_plexippus/Gene/Summary?g=KGM_15886;r=JH388931:31249-35238;t=EHJ66117
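
If the input list may mix gene IDs and protein IDs, a minimal sketch of a
lookup that handles both could look like this. The helper name
find_leaf_by_any_id is hypothetical (it is not part of the Ensembl API), and
it assumes that any ID not found as an 'ENSEMBLGENE' member is already a
protein ID:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API.
# $member_adaptor: Compara Member adaptor
# $tree:           gene tree whose leaves are named by protein stable ID
# $id:             gene or protein stable ID
# Returns the matching leaf, or undef if none is found.
sub find_leaf_by_any_id {
    my ($member_adaptor, $tree, $id) = @_;

    # First treat $id as a gene ID and map it to its canonical protein.
    my $gene_member =
        $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', $id);
    if (defined $gene_member) {
        my $peptide_member = $gene_member->get_canonical_Member();
        return $tree->find_leaf_by_name($peptide_member->stable_id)
            if defined $peptide_member;
    }

    # Assumption: anything that is not a known gene ID is a protein ID,
    # which is what the leaf names are.
    return $tree->find_leaf_by_name($id);
}
```

It could then be called as:

    my $leaf = find_leaf_by_any_id($member_adaptor, $tree, $seq_name);
    print $leaf->node_id, "\n" if defined $leaf;

Checking that $leaf is defined before calling node_id avoids the
"undefined value" errors for IDs that are not in the tree at all.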

Matthieu

On 18/12/12 11:20, Moretti Sébastien wrote:
> So, here is what I did:
>
> my $member_adaptor = $reg->get_adaptor('metazoa', 'compara', 'Member');
> my $gene_member    = $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', $seq_name);
> my $peptide_member = $gene_member->get_canonical_Member();
> my $leaf = $tree->find_leaf_by_name($peptide_member->stable_id);
>
>
> It works for some genes that previously failed, but now it stops with
> this new error message:
>      Can't call method "get_canonical_Member" on an undefined value
> This is for gene EHJ66117, still in the EMGT00050000000001 gene tree.
>
>
>> Hi Sébastien
>>
>> It works for this gene, but I'm not sure it would work for all the
>> genes of that species, and the rule may even be different for other
>> species.
>>
>> A safer way is to use the API to translate gene IDs to protein IDs. We
>> are only interested in the protein used to build the gene tree (the
>> "canonical" protein), not in all the possible translations.
>>
>> my $gene_member = $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', 'ADAR011299');
>> my $peptide_member = $gene_member->get_canonical_Member();
>>
>> $tree->find_leaf_by_name($peptide_member->stable_id);
>>
>> This should work for any species, and both on Ensembl and Ensembl
>> Genomes.
>>
>> Matthieu
>>
>> On 18/12/12 10:43, Moretti Sébastien wrote:
>>> Hi Matthieu
>>>
>>> You mean I can do something like
>>>      $tree->find_leaf_by_name($seq_name) || $tree->find_leaf_by_name($seq_name.'-PA');
>>>
>>> In other words, adding -PA in all failed cases will fix my problem?
>>>
>>>> Hi Sébastien
>>>>
>>>> This happens because the names of the gene tree leaves are protein IDs
>>>> and ADAR011299 is a gene ID. In your case, it should work with
>>>> ADAR011299-PA.
>>>> metazoa.ensembl.org/Anopheles_darlingi/Gene/Compara_Tree?db=core;g=ADAR011299
>>>>
>>>> For some species, gene IDs and protein IDs are often identical, which
>>>> can be quite confusing.
>>>>
>>>> Best regards,
>>>> Matthieu
>>>>
>>>> On 18/12/12 10:12, Moretti Sébastien wrote:
>>>>> Hi
>>>>>
>>>>> With Ensembl API 69 and earlier, I used to get a leaf object with
>>>>> this function:
>>>>>      my $leaf = $tree->find_leaf_by_name($seq_name);
>>>>>      print $leaf->node_id;
>>>>>
>>>>> I have never had problems with Ensembl vertebrate data.
>>>>>
>>>>>
>>>>> I tried the same script with Ensembl Metazoa and got this error message:
>>>>>      Can't call method "node_id" on an undefined value
>>>>> or
>>>>>      Use of uninitialized value $leaf in concatenation
>>>>> $leaf appears to be undefined, find_leaf_by_name returns undef.
>>>>>
>>>>>
>>>>> Do you have an explanation for this?
>>>>> Regards
>>>>>
>>>>> e.g., in the EMGT00050000000001 gene tree:
>>>>>      my $leaf = $tree->find_leaf_by_name('ADAR011299');
>


-- 
Matthieu Muffato, Ph.D.
Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom



