[ensembl-dev] Protein tree duplication nodes from newick format

Andy Yates ayates at ebi.ac.uk
Wed Jun 29 10:57:04 BST 2011

Hi Luc,

AFAIR the newick format does not allow annotations other than the name of the node and the length of its branch. There is some leeway as to what a name is (the website's newick trees have names like ENSP00000369497_Hsap_), but newick is really not the right format here. New Hampshire eXtended (NHX) and PhyloXML are better suited to representing this information. In NHX, any node carrying the attribute D=Y is a duplication node. In PhyloXML, a duplication clade contains the tag <duplications>1</duplications>.

NHX can be generated using a method available on the NestedSet object:

my $nhx = $tree->nhx_format();
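
Once you have the NHX string, picking out the duplication nodes is a matter of reading the [&&NHX:...] block attached to each node. A minimal sketch in plain Perl, assuming the usual colon-separated key=value layout of those blocks; the tree string and the nhx_tags helper are made up for illustration and are not part of the Ensembl API:

```perl
use strict;
use warnings;

# Toy NHX string, invented for illustration; real nhx_format() output
# carries more tags per node.
my $nhx = '((A:0.1[&&NHX:T=9606],B:0.2[&&NHX:T=9606])n2:0.05[&&NHX:D=Y],'
        . 'C:0.3[&&NHX:T=10090])n1:0[&&NHX:D=N];';

# Hypothetical helper: return one hash of tags per [&&NHX:...] block,
# in the order the blocks appear in the string.
sub nhx_tags {
    my ($string) = @_;
    my @nodes;
    while ($string =~ /\[&&NHX:([^\]]*)\]/g) {
        my $body = $1;    # copy before running further regexes
        my %tags;
        for my $pair (split /:/, $body) {
            my ($key, $value) = split /=/, $pair, 2;
            $tags{$key} = $value if defined $value;
        }
        push @nodes, \%tags;
    }
    return @nodes;
}

# Keep only the nodes flagged as duplications (D=Y).
my @duplications = grep { ($_->{D} // '') eq 'Y' } nhx_tags($nhx);
printf "%d duplication node(s)\n", scalar @duplications;
```

This only reads the annotation blocks and ignores the tree topology, so it tells you how many duplication nodes there are but not where they sit; for that, walk the tree objects themselves.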

PhyloXML is available from a writer object (this example writes the format to a scalar but you can pass it a file location if required):

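use Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter;
use IO::String;
my $string_handle = IO::String->new();
my $w = Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter->new(-HANDLE => $string_handle, -NO_SEQUENCES => 1);
$w->write_trees($tree);  # serialise the tree into the handle
$w->finish();            # writes the closing tag; the XML is incomplete without it
my $phyloxml = ${$string_handle->string_ref()};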
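
As a quick check on the PhyloXML output you can count the clades that carry a non-zero duplications tag. The sketch below is a rough regex scan over a hand-written fragment, not a real XML parse; the fragment is made up for illustration, and a proper XML parser is the better choice for anything beyond a sanity check:

```perl
use strict;
use warnings;

# Hand-written PhyloXML fragment, for illustration only.
my $phyloxml = <<'XML';
<phylogeny rooted="true">
  <clade>
    <events><speciations>1</speciations></events>
    <clade>
      <events><duplications>1</duplications></events>
      <clade><name>A</name></clade>
      <clade><name>B</name></clade>
    </clade>
    <clade><name>C</name></clade>
  </clade>
</phylogeny>
XML

# Count every <duplications> element holding a non-zero value.
my $duplication_count = 0;
while ($phyloxml =~ m{<duplications>\s*(\d+)\s*</duplications>}g) {
    $duplication_count++ if $1 > 0;
}
print "$duplication_count duplication clade(s)\n";
```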

Hope this helps,


On 29 Jun 2011, at 10:24, Luc Merenda wrote:

> Hi Ensembl,
> I would like to know if it is possible to discriminate duplication
> nodes in a compara protein tree using the newick format of the tree
> obtained by using the
> newick_format method in the Bio::EnsEMBL::Compara NestedSet class.
> Thank you in advance.
> Luc
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
