[ensembl-dev] protein tree id and tagvalue

Matthieu Muffato muffato at ebi.ac.uk
Wed Jul 4 09:29:02 BST 2012

On 04/07/12 07:55, Moretti Sébastien wrote:
>> Hi !
>> On 03/07/12 16:04, Moretti Sébastien wrote:
>>>> Hi Sébastien
>>> Hi
>>>> The fetch_node_by_node_id(1) trick might not work any more as (1) we are
>>>> now storing tree of trees (called "super trees" in Compara), (2) we have
>>>> merged the API and the SQL tables of protein and ncRNA trees.
>>>> If you want all the protein trees as they were in the previous release,
>>>> please use fetch_all(-tree_type => 'tree', -member_type => 'protein')
>>>> from the GeneTreeAdaptor. This will return an arrayref of GeneTree
>>>> objects. Each tree has a stable_id and other general properties / tags.
>>>> Then, by calling "->root()" on a tree, you will jump to its root node,
>>>> on which you can call children() recursively.
>>> Do I have also to change the adaptor like this ?
>>> my $protein_tree_adaptor = $reg->get_adaptor('Multi', 'compara',
>>> 'ProteinTree');
>>> becomes
>>> my $protein_tree_adaptor = $reg->get_adaptor('Multi', 'compara',
>>> 'GeneTree');
>> Yes, only the GeneTreeAdaptor can return GeneTree objects
> I am a bit confused.
> What is the best between ProteinTree and GeneTree ?
> Am I right saying that ProteinTree has to be associated with fetch_all(),
> and GeneTree with fetch_all(-tree_type => 'tree', -member_type =>
> 'protein') ?

Hi Sébastien

The two adaptors are designed to return different kinds of objects.

Methods from the ProteinTreeAdaptor return nodes (root nodes, internal 
nodes, leaves). It is a specialized version of GeneTreeNodeAdaptor to 
only keep "protein" nodes in the result and discard "ncrna" nodes.

Methods from the GeneTreeAdaptor return a GeneTree object.

In the past, we did not have a GeneTree object, and a tree was represented 
by its root node. Because of that, fetch_all in the ProteinTreeAdaptor has 
this special behaviour: it only returns root nodes, not internal nodes or 
leaves (in Ensembl, the fetch_all() method of an adaptor usually returns 
everything).

To fetch all trees, each adaptor offers a single method: fetch_all().
The ProteinTreeAdaptor version takes no arguments and returns all 
the root nodes. This includes trees, super-trees, and the clusterset 
(the artificial tree that connects them all), so you will have to filter 
the result. The GeneTreeAdaptor version takes arguments that let you 
select a type of tree, a type of member, etc. The ProteinTreeAdaptor 
will actually be deprecated in e68, so I encourage you to use the 
GeneTreeAdaptor instead.
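
As a rough illustration of the GeneTreeAdaptor route, here is a minimal 
sketch. The subroutine name protein_tree_summaries and the hash keys it 
returns are made up for this example; it only assumes that the adaptor 
was obtained with $reg->get_adaptor('Multi', 'compara', 'GeneTree') and 
uses the calls mentioned in this thread (fetch_all, stable_id, root, 
node_id).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): list the protein
# trees returned by a GeneTreeAdaptor as plain hashes.
# $gene_tree_adaptor is assumed to come from
#   $reg->get_adaptor('Multi', 'compara', 'GeneTree');
sub protein_tree_summaries {
    my ($gene_tree_adaptor) = @_;

    # Ask only for real protein trees (no super-trees, no clusterset).
    my $trees = $gene_tree_adaptor->fetch_all(
        -tree_type   => 'tree',
        -member_type => 'protein',
    );

    my @summaries;
    for my $tree (@$trees) {
        push @summaries, {
            stable_id    => $tree->stable_id,      # property of the GeneTree
            root_node_id => $tree->root->node_id,  # jump to the root node
        };
    }
    return \@summaries;
}
```

Because the filtering is done by the fetch_all() arguments, nothing has 
to be discarded afterwards, unlike with the ProteinTreeAdaptor.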

> And that an adaptor with ProteinTree will require
> $tree->tree->get_tagvalue('taxon_name')
> but an adaptor with GeneTree will require
> $tree->root->get_tagvalue('taxon_name') ???

In your example, the first variable name is a bit confusing.
With ProteinTreeAdaptor, you fetch $root, and you can call 
$root->get_tagvalue('taxon_name') directly.
With GeneTreeAdaptor, you fetch $tree. Then, as you said, you can call 
$tree->root->get_tagvalue('taxon_name').

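Whichever adaptor you use, you end up on a root node, and from there the 
walk is the same. The sketch below is only an illustration: 
count_node_types is a made-up helper, and it assumes that children() 
returns an array-ref and that the "node_type" tag is read with 
get_tagvalue(), as described in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): walk a tree from a
# given node and count the "node_type" tags of its internal nodes.
sub count_node_types {
    my ($node, $counts) = @_;
    $counts ||= {};

    my $children = $node->children;      # array-ref of child nodes
    return $counts unless @$children;    # leaves are not counted

    my $type = $node->get_tagvalue('node_type');
    $counts->{$type}++ if defined $type && length $type;

    count_node_types($_, $counts) for @$children;
    return $counts;
}

# With GeneTreeAdaptor you hold a tree, so start from its root:
#   my $counts = count_node_types($tree->root);
# With ProteinTreeAdaptor you already hold the root node:
#   my $counts = count_node_types($root);
```
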
>>>> Regarding the node tags, they have been cleaned up and rationalized.
>>>> "Duplication" does not exist any more. Please use "node_type" instead,
>>>> which is one of {speciation,duplication,dubious,gene_split} and roughly
>>>> maps to Duplication = 0, 2, 1 (respectively). taxon_name still exists.
>>>> $protein_tree_adaptor->fetch_all returns all the roots of protein trees
>>>> and super-trees. You would still need to update the "Duplication" tag
>>>> call and filter out super-trees.
>>> So, doing this should work:
>>> my $protein_tree_adaptor = $reg->get_adaptor('Multi', 'compara',
>>> 'ProteinTree');
>>> my @children = @{$protein_tree_adaptor->fetch_all()};
>>> for my $tree (@children){
>>>      my $node_id  = $tree->root->node_id;
>>>      my $taxon    = $tree->root->get_tagvalue('taxon_name')
>>> }
>>> And $tree->root->get_tagvalue('node_type') to get duplication status.
>> Not really because the ProteinTree adaptor is used to return root nodes
>> (not tree objects). You can directly use "->node_id()" and
>> "->get_tagvalue(...)" on $tree
>>> Does $tree->tree->stable_id(); from API 66 work again ?
>> Yes, you can call ->tree() on a tree node to jump to the tree object.
>> To summarize the link between the two objects:
>>   - GeneTree::root() returns a GeneTreeNode
>>   - GeneTreeNode::tree() returns a GeneTree
>>   - GeneTreeNode::children() returns an array-ref of GeneTreeNode
>>   - GeneTreeNode::parent() returns a GeneTreeNode
>> Regards,
>> Matthieu

Matthieu Muffato, Ph.D.
Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
