[ensembl-dev] protein tree id and tagvalue

Matthieu Muffato muffato at ebi.ac.uk
Thu Jul 5 14:29:30 BST 2012


On the protein-tree side, there are no trees with only 1 sequence. There 
are between 4,000 and 5,000 trees with 2 sequences.

In the database, they contain almost the same information as larger 
trees: multiple alignment, node / tree tags, homologies, etc.
The main difference is that we don't run treebest on them (treebest 
needs at least 3 sequences). As a consequence, we don't have any branch 
lengths on these trees.
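
To make the two rules in this thread concrete (no branch lengths below 3 
sequences, and the "taxon_name" / "node_type" tags expected on every 
internal node), here is a rough sketch. It is only an illustration on a toy 
data model: the Node class, the helper names and the example tag values are 
all made up for this message and are not the Ensembl Compara API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# From the thread: treebest needs at least 3 sequences.
TREEBEST_MIN_SEQUENCES = 3


@dataclass
class Node:
    """Hypothetical stand-in for a gene-tree node (not the Ensembl API)."""
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    branch_length: Optional[float] = None

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect all leaves below (and including) this node."""
    if node.is_leaf():
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def expects_branch_lengths(root: Node) -> bool:
    """True only if the tree is large enough to have gone through treebest."""
    return len(leaves(root)) >= TREEBEST_MIN_SEQUENCES


def internal_nodes_missing_tags(
    root: Node, required: Tuple[str, ...] = ("taxon_name", "node_type")
) -> List[Tuple[str, List[str]]]:
    """List internal nodes that lack any of the required tags."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue  # leaves are not expected to carry these tags
        absent = [tag for tag in required if tag not in node.tags]
        if absent:
            missing.append((node.name, absent))
        stack.extend(node.children)
    return missing


# Toy 2-sequence tree; the tag values are placeholders.
two_seq_tree = Node(
    "root",
    tags={"taxon_name": "Euteleostomi", "node_type": "speciation"},
    children=[Node("geneA"), Node("geneB")],
)

print(expects_branch_lengths(two_seq_tree))
print(internal_nodes_missing_tags(two_seq_tree))
```

With the real API you would do the equivalent checks by counting the leaves 
of the tree and calling has_tag() with an explicit tag name on each internal 
node.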

Matthieu

On 05/07/12 13:18, Moretti Sébastien wrote:
> There are no "fake" trees with 1 or 2 sequences only ?
>
>> All the internal nodes should have those tags.
>> Leaves can only have one tag: "lost_taxon_id". It contains the list of
>> taxa that have lost the gene on the terminal branch.
>>
>> Regards,
>> Matthieu
>>
>> On Thu 05 Jul 2012 12:54:11 BST, Moretti Sébastien wrote:
>>>> Nice to hear that :)
>>>>
>>>> What do you mean by "has_tag() == false" ? has_tag is supposed to have
>>>> one argument: the tag name to be tested.
>>>> And are you speaking of node tags (from a GeneTreeNode object) ? or
>>>> tree-wide tags (from a GeneTree object) ?
>>>
>>> I speak about GeneTreeNode objects.
>>> Some trees, or nodes in trees, do not have has_tag('taxon_name') or
>>> has_tag('node_type').




