[ensembl-dev] Gene Tree ambiguous nodes

Sébastien Moretti sebastien.moretti at unil.ch
Tue May 17 12:59:36 BST 2011

Thanks Matthieu

What is the threshold for defining a node as ambiguous?
The values seem to range between 0 and 1.

> Hello Sébastien
> In the database, the value is stored in the protein_tree_tag table (tag:
> "duplication_confidence_score"). It can be retrieved with the following
> API method: $node->get_tagvalue("duplication_confidence_score");
> Regards,
> Matthieu Muffato
>>> Hello Sébastien
>>> An ambiguous node is a duplication node with a duplication confidence
>>> score of 0. It means that the two resulting copies of the duplication
>>> cannot be found at the same time in the same species. There is indeed a
>>> correlation with the bootstrap value, but the latter isn't used in the
>>> definition.
>> Do you know where this duplication confidence score is stored in the
>> Compara database?
>> Or how to access it through the Ensembl API?
>>> Right now, at most one lost taxon id is stored in the database. So the
>>> API cannot help you to retrieve the full information; you'll have to
>>> rebuild the list of lost taxa by comparing the gene trees to the species
>>> tree.
>> Okay.
>>> Regards
>>> Matthieu Muffato
>>>> At the same time, do you store lost taxa in trees (NHX) as TreeFam does?
>>>>> Hi
>>>>> I wonder how ambiguous nodes are defined in gene trees.
>>>>> Nothing seems to be attached to the D flag in NHX labels for ambiguous
>>>>> nodes.
>>>>> Ambiguous nodes seem to be related to bootstrap values.
>>>>> Is that true?
>>>>> If so, what is the bootstrap threshold you use to define a node as
>>>>> ambiguous?
>>>>> Regards
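
To make the definition quoted above concrete, here is a minimal sketch in Python. It assumes the duplication confidence score is the number of species shared by the two child subtrees divided by the number of species found in either. That assumption is consistent with the thread (a score of 0 means no species carries both copies, and values lie between 0 and 1), but it is not stated in it. The function names are hypothetical and are not part of the Ensembl API.

```python
def duplication_confidence_score(left_species, right_species):
    """Species overlap of the two child subtrees: |A & B| / |A | B|.

    Assumption: this intersection-over-union ratio is how the score is
    computed; the thread itself only describes what a score of 0 means.
    """
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("both subtrees are empty")
    return len(left & right) / len(union)


def is_ambiguous(left_species, right_species):
    """A duplication node is ambiguous when no species holds both copies."""
    return duplication_confidence_score(left_species, right_species) == 0


if __name__ == "__main__":
    # One shared species (mouse) out of three in total.
    print(duplication_confidence_score({"human", "mouse"}, {"mouse", "rat"}))
    # No shared species: the node would be flagged as ambiguous.
    print(is_ambiguous({"human"}, {"mouse"}))
```

With real data, the stored value would be read through $node->get_tagvalue("duplication_confidence_score") rather than recomputed.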

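The suggestion above to rebuild the list of lost taxa by comparing the gene trees to the species tree could be sketched as follows. This is only an illustration under stated assumptions: the species tree is a toy nested dictionary, the taxon and species names are placeholders, and the helper names (leaves_under, lost_species) are hypothetical rather than Ensembl API calls. It reports lost leaf species; collapsing them into internal taxa is left out.

```python
# Toy species tree (assumed layout): each key is a taxon, each value holds
# its children; an empty dict marks a leaf species.
SPECIES_TREE = {
    "Euarchontoglires": {
        "Primates": {"human": {}, "chimp": {}},
        "Glires": {"mouse": {}, "rat": {}},
    }
}


def _leaves(name, children):
    """Collect the leaf names below one node of the species tree."""
    if not children:
        return {name}
    result = set()
    for child, grandchildren in children.items():
        result |= _leaves(child, grandchildren)
    return result


def leaves_under(tree, taxon):
    """Return the leaf species below `taxon`, or None if it is not found."""
    for name, children in tree.items():
        if name == taxon:
            return _leaves(name, children)
        found = leaves_under(children, taxon)
        if found is not None:
            return found
    return None


def lost_species(tree, taxon, observed):
    """Species expected under `taxon` but absent from the gene subtree."""
    expected = leaves_under(tree, taxon)
    if expected is None:
        raise KeyError(taxon)
    return sorted(expected - set(observed))


if __name__ == "__main__":
    # A gene subtree annotated as Euarchontoglires that only contains
    # human and mouse genes has lost chimp and rat.
    print(lost_species(SPECIES_TREE, "Euarchontoglires", {"human", "mouse"}))
```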
