[ensembl-dev] Fwd: Retrieving CAFE gene tree data...

Miguel Pignatelli mp at ebi.ac.uk
Thu Jun 6 16:38:34 BST 2013


Hi Steve,

It is a bit frustrating writing a parser for the CAFE output format just 
to realize that the users want the raw textual output :-S

Btw, it is not easy for me to understand why you want the raw, 
unparsed CAFE output. I would say that you would need to parse it 
anyway to make sense of all that per-node-pair information, etc.

Anyway, you can get something similar using the following code:

my $gene_stable_id = 'ENSFM00250000006933';
my $member = $gene_member_adaptor->fetch_by_source_stable_id(undef, $gene_stable_id);
my $gene_tree = $gene_tree_adaptor->fetch_default_for_Member($member);
my $cafe_tree = $cafe_tree_adaptor->fetch_by_GeneTree($gene_tree);

print $member->stable_id, "\t";
print $gene_tree->stable_id, "\t";

my $tree_fmt = '%{-s}%{x-}_%{N}:%{d}';
print $cafe_tree->newick_format('ryo', $tree_fmt), "\t";
print $cafe_tree->pvalue_avg, "\n";

For the per-internal-node information you can use something like:

for my $node (@{$cafe_tree->get_all_nodes}) {
   my $node_name = $node->is_leaf ? $node->genome_db->short_name : $node->taxon_id;
   my $node_n_members = $node->n_members;
   # Defined-or (Perl 5.10+), so that a p-value of 0 is not reported as "birth"
   my $node_pvalue = $node->pvalue // "birth";
   my $dynamics = "[no change]";
   if ($node->is_contraction) {
     $dynamics = "[contraction]";
   } elsif ($node->is_expansion) {
     $dynamics = "[expansion]";
   }
   print "$node_name => $node_n_members ($node_pvalue) $dynamics\n";
}
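
To get from a loop like that to the parenthesised per-node p-value field in a CAFE-style line, something along these lines might work. It is only a rough sketch on a mock tree built from plain hashes, so it runs without the API: $tree, fmt_pvalue and pvalue_pairs are made-up names, the '-' placeholder for a missing p-value is my own choice, and the assumption that CAFE wants one pair per internal node (the p-values of its children, in depth-first order) should be checked against real CAFE output. With real tree nodes the hash lookups would become method calls.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CAFE tree: each node is a hash with a name,
# an optional p-value and a list of children.
my $tree = {
    name     => 'root',
    pvalue   => undef,
    children => [
        { name => 'A', pvalue => 0.5, children => [] },
        {
            name     => 'anc',
            pvalue   => 0.01,
            children => [
                { name => 'B', pvalue => 0,    children => [] },
                { name => 'C', pvalue => 0.25, children => [] },
            ],
        },
    ],
};

# Format one p-value with six decimals; '-' marks a node without one
# (the placeholder is an assumption, not part of any CAFE specification).
sub fmt_pvalue {
    my ($p) = @_;
    return defined $p ? sprintf('%.6f', $p) : '-';
}

# Walk the tree depth-first and return one "(p,p)" string per internal node,
# holding the p-values of that node's children.
sub pvalue_pairs {
    my ($node) = @_;
    my @kids = @{ $node->{children} };
    return () unless @kids;    # leaves contribute no pair
    my @pairs = ('(' . join(',', map { fmt_pvalue($_->{pvalue}) } @kids) . ')');
    push @pairs, pvalue_pairs($_) for @kids;
    return @pairs;
}

my $field = '(' . join(',', pvalue_pairs($tree)) . ')';
print "$field\n";
```

Note that the check on the p-value uses defined rather than truthiness, so a p-value of exactly 0 is still printed as a number.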

If you prefer to get the nodes that are significantly expanded or 
contracted instead of having to traverse the whole tree, you can use the 
specialized methods on the API instead.

Please, let me know if this solves your issue,

Cheers,

M;




On 06/06/13 13:55, Steve Moss wrote:
> Dear EnsEMBL developers,
>
> I'm trying to work out how best to retrieve a tree (Newick format?)
> showing the significant expansions and contractions, as you can see
> here, for example
> http://www.ensembl.org/Homo_sapiens/Gene/SpeciesTree?db=core;g=ENSG00000159917;r=19:44782947-44813601;t=ENST00000291182.
>
> I'm playing around with the API at the moment and pulling some data out,
> but it isn't that intuitive. The current CAFEGeneFamily and
> CAFEGeneFamilyAdaptor code doesn't seem to have any examples on this and
> I can't find anything in the EnsEMBL tutorial information.
>
> I've been through all my candidate genes, pulled the CAFE gene tree root
> IDs for those data, that have a significant CAFE gene gain/loss tree,
> but am struggling with where to go from there to build the final
> "product" i.e. a text representation of the above graphic?
>
> Ideally, I would like to be able to output this in a CAFE style output
> format (EnsEMBL Family ID, Gene Tree (with species name and gene count),
> Family P Value, Nodes P Values), e.g.:
>
> ENSFM00250000006933	(((((((((Homosapiens_7:6.4,Pantroglodytes_1:6.4)_1:2.4,Gorillagorilla_1:8.8)_1:6.9,Pongoabelii_1:15.7)_1:4.7,Nomascusleucogenys_1:20.4)_1:8.8,Macacamulatta_1:29.2)_1:13.4,Callithrixjacchus_1:42.6)_1:22.6,Tarsiussyrichta_0:65.2)_1:8.8,(Microcebusmurinus_1:57.9,Otolemurgarnettii_1:57.9)_1:16.1)_1:16,Tupaiabelangeri_1:90)_1	0.004000	((0.000000,0.538467),(0.513088,0.543748),(0.534524,0.597471),(0.524517,0.585395),(0.544478,0.583311),(0.568740,0.701965),(0.613489,0.272744),(0.541873,0.581644),(0.675047,0.563343),(0.591851,0.503703))
>
> Has anyone done this already?
>
> Kindest regards,
>
> Steve Moss
> http://about.me/gawbul
>
>

-- 

Miguel Pignatelli, PhD

Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
Room A3-33
Phone + 44 (0) 1223 494 598
Fax   + 44 (0) 1223 494 468



