[ensembl-dev] Protein tree duplication nodes from newick format
Andy Yates
ayates at ebi.ac.uk
Wed Jun 29 10:57:04 BST 2011
Hi Luc,
AFAIR the newick format does not allow for annotations other than the node name and the branch length. There is some leeway in what counts as a name (the website's newick trees have names like ENSP00000369497_Hsap_), but newick really is not the right format here. New Hampshire eXtended (NHX) and PhyloXML are better suited to representing this information. In NHX, any node carrying the attribute D=Y is a duplication node; in PhyloXML, a duplication node is a clade element containing the tag <duplications>1</duplications>.
NHX can be generated with the nhx_format() method available on the NestedSet object:
my $nhx = $tree->nhx_format();
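Once you have the NHX string, picking out the duplication nodes is a matter of reading the [&&NHX:...] comment attached to each node and checking for D=Y. The sketch below does that with a plain regex. It assumes the usual NHX layout (name:length[&&NHX:key=value:...]); duplication_nodes is a hypothetical helper, not part of the Compara API, and the example string is hand-written rather than real Compara output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): find duplication nodes
# in an NHX string such as the one returned by $tree->nhx_format().
# Returns a list of hashrefs holding the node name (may be empty) and its
# NHX key=value tags.
sub duplication_nodes {
    my ($nhx) = @_;
    my @dups;
    # Match "name:length[&&NHX:key=value:key=value...]"; the name and the
    # branch length are optional, the NHX comment block is required.
    while ($nhx =~ /([^(),:;\[\]]*)(?::[^(),:;\[\]]*)?\[&&NHX((?::[^:\]]*)*)\]/g) {
        my ($name, $body) = ($1, $2);
        my %tags;
        for my $pair (grep { length } split /:/, $body) {
            my ($key, $value) = split /=/, $pair, 2;
            $tags{$key} = $value;
        }
        # By NHX convention D=Y marks a duplication (D=N a speciation).
        push @dups, { name => $name, tags => \%tags }
            if defined $tags{D} && $tags{D} eq 'Y';
    }
    return @dups;
}

# Usage with a small hand-written NHX string (assumed, not real output):
my $example = '(A:0.1[&&NHX:S=human],B:0.2[&&NHX:S=mouse])anc:0.3[&&NHX:D=Y:S=Euarchontoglires];';
for my $node (duplication_nodes($example)) {
    printf "duplication node '%s' (S=%s)\n", $node->{name}, $node->{tags}{S} // '';
}
```

With real data you would pass $nhx instead of the example string; internal nodes without a name come back with an empty name, so you may want to key on other tags instead.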
PhyloXML is available from a writer object (this example writes the format to a scalar but you can pass it a file location if required):
use Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter;
use IO::String;
my $string_handle = IO::String->new();
my $w = Bio::EnsEMBL::Compara::Graph::PhyloXMLWriter->new(-HANDLE => $string_handle, -NO_SEQUENCES => 1);
$w->write_trees($tree);
$w->finish();
my $phyloxml = ${$string_handle->string_ref()};
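As a quick sanity check on the PhyloXML output, you could count the clades flagged as duplications. In the PhyloXML schema the <duplications> element sits inside a clade's <events> element. The sketch below uses a regex purely for brevity; count_duplication_clades is a hypothetical helper, and for anything beyond a rough count a proper XML parser (for example XML::LibXML) would be the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): count clades whose
# <duplications> value is greater than zero in a PhyloXML string such as
# $phyloxml. A regex is only a rough check, not a real XML parse.
sub count_duplication_clades {
    my ($xml) = @_;
    my $count = 0;
    while ($xml =~ m{<duplications>\s*(\d+)\s*</duplications>}g) {
        $count++ if $1 > 0;
    }
    return $count;
}

# Usage with a small hand-written PhyloXML document (assumed, not real output):
my $example = <<'XML';
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <events><duplications>1</duplications></events>
      <clade><name>A</name></clade>
      <clade><name>B</name></clade>
    </clade>
  </phylogeny>
</phyloxml>
XML
print count_duplication_clades($example), "\n";
```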
Hope this helps,
Andy
On 29 Jun 2011, at 10:24, Luc Merenda wrote:
> Hi Ensembl,
> I would like to know if it is possible to discriminate duplication
> nodes in a compara protein tree using the newick format of the tree
> obtained by using the newick_format method in the Bio::EnsEMBL::Compara
> NestedSet class.
> Thank you in advance.
> Luc
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/