[ensembl-dev] Protein tree duplication nodes from newick format

Luc Merenda merenda.luc at gmail.com
Wed Jun 29 11:07:09 BST 2011

Thank you very much, Andy, for your clear explanation and the useful info.

2011/6/29 Andy Yates <ayates at ebi.ac.uk>:
> Hi Luc,
> AFAIR newick format does not allow annotations other than the name of the node and the length of its branch. There is obviously some leeway as to what a name is (the website's newick trees have names like ENSP00000369497_Hsap_), but newick is really not the right format here. New Hampshire eXtended (NHX) and PhyloXML are better formats for representing this information. Any NHX node that has the attribute D=Y is a duplication node. PhyloXML clade elements carry the tag <duplications>1</duplications>.
> NHX can be generated using a method available on the NestedSet object:
> my $nhx = $tree->nhx_format();
> PhyloXML is available from a writer object (this example writes the format to a scalar but you can pass it a file location if required):
> use Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter;
> use IO::String;
> my $string_handle = IO::String->new();
> my $w = Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter->new(-HANDLE => $string_handle, -NO_SEQUENCES => 1);
> $w->write_trees($tree);
> $w->finish();
> my $phyloxml = ${$string_handle->string_ref()};
> Hope this helps,
> Andy
> On 29 Jun 2011, at 10:24, Luc Merenda wrote:
>> Hi Ensembl,
>> I would like to know if it is possible to discriminate duplication
>> nodes in a compara protein tree using the newick format of the tree
>> obtained by using the
>> newick_format method in the Bio::EnsEMBL::Compara NestedSet class.
>> Thank you in advance.
>> Luc
> --
> Andrew Yates                   Ensembl Genomes Engineer
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
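
To illustrate the D=Y rule Andy describes, here is a minimal sketch that picks the duplication nodes out of an NHX string with plain Perl. It does not use the Ensembl API: the sample string is hand-written for illustration (not real Ensembl output), and nhx_annotated_nodes is a hypothetical helper name. In practice the string would come from $tree->nhx_format().

```perl
use strict;
use warnings;

# Hand-written NHX string for illustration only; not real Ensembl output.
my $nhx = '((A:0.1[&&NHX:D=N],B:0.2[&&NHX:D=N])n1:0.05[&&NHX:D=Y:B=100],'
        . 'C:0.3[&&NHX:D=N])root[&&NHX:D=N];';

# Hypothetical helper: return one hash ref per node that carries an
# NHX annotation block of the form [&&NHX:tag=value:tag=value].
sub nhx_annotated_nodes {
    my ($text) = @_;
    my @nodes;
    # Optional label, optional ":branch_length", then the NHX block.
    while ($text =~ /([^(),:;\[\]]*)(?::([-+.\deE]+))?\[&&NHX:([^\]]*)\]/g) {
        my ($name, $length, $tag_text) = ($1, $2, $3);
        my %tags = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
                   grep { /=/ } split /:/, $tag_text;
        push @nodes, { name => $name, length => $length, tags => \%tags };
    }
    return @nodes;
}

# A node is a duplication node when its D tag is set to Y.
my @duplications = grep { ($_->{tags}{D} // 'N') eq 'Y' }
                   nhx_annotated_nodes($nhx);
print "duplication node: $_->{name}\n" for @duplications;
```

With the sample string above, only the internal node n1 is reported. A regular expression is enough for a quick check like this; for anything beyond that, a proper tree parser is the safer choice.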
