[ensembl-dev] Fwd: Retrieving CAFE gene tree data...

Miguel Pignatelli mp at ebi.ac.uk
Fri Jun 7 10:29:07 BST 2013


Hi Steve,

On 07/06/13 10:17, Steve Moss wrote:
> Dear Miguel,
>
> That is fantastic, thank you so much for your help!
>
> I'm perhaps an exception to the rule. I had run a CAFE analysis on these
> data myself, before this was included in the Compara pipeline/API, and
> so wanted to check and see if my results tallied with the results that
> you get.
>

Yes, that makes sense.
It would be great if you could get back to us with the results of your
comparison. Any help in improving our analysis is always welcome.

> I have already written a parser for the CAFE data, so this code will
> help me output your data in a format that can run through my parser,
> allowing me to make the direct comparisons I described above.
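A minimal sketch of the direct comparison described here, assuming both CAFE results have already been reduced to per-species gene counts (hash refs of species name => count). compare_counts is a hypothetical helper, not part of the Ensembl API:

```perl
use strict;
use warnings;

# Hypothetical helper: compare per-species gene counts from your own CAFE run
# ($mine) against the Ensembl CAFE tree ($ensembl). Both are hash refs of
# species => count. Returns one message per species that differs or is missing.
sub compare_counts {
    my ($mine, $ensembl) = @_;
    my %all;
    $all{$_} = 1 for keys %$mine, keys %$ensembl;

    my @diffs;
    for my $species (sort keys %all) {
        my ($m, $e) = ($mine->{$species}, $ensembl->{$species});
        if (!defined $m || !defined $e) {
            push @diffs, "$species: only in " . (defined $m ? "mine" : "Ensembl");
        } elsif ($m != $e) {
            push @diffs, "$species: mine=$m, Ensembl=$e";
        }
    }
    return @diffs;
}
```

Species are reported in sorted order so that the output is stable between runs.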
>
> Many thanks for that code, it works beautifully and with a little
> modification to the per-internal-node section, it is just what I need!
>

Glad to hear that.

> Yes, I had seen the $cafe_tree->get_expansions and
> $cafe_tree->get_contractions methods, which look very useful.
>
> Once again, many thanks for your assistance!
>

You're very welcome,

Cheers,

M;

> Kindest regards,
>
> Steve Moss
> http://about.me/gawbul
>
>
> On 6 June 2013 16:38, Miguel Pignatelli <mp at ebi.ac.uk> wrote:
>
>     Hi Steve,
>
>     It is a bit frustrating to write a parser for the CAFE output format
>     only to realize that users want the raw textual output :-S
>
>     Btw, it is not easy for me to understand why you want the raw,
>     unparsed CAFE output. I would have thought you would need to parse it
>     anyway to make sense of all that per-node-pair information, etc.
>
>     Anyway, you can get something similar using the following code:
>
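>     use strict;
>     use warnings;
>     use Bio::EnsEMBL::Registry;
>
>     # Connect to the public Ensembl database server
>     Bio::EnsEMBL::Registry->load_registry_from_db(
>         -host => 'ensembldb.ensembl.org',
>         -user => 'anonymous',
>     );
>     my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'GeneMember');
>     my $gene_tree_adaptor   = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'GeneTree');
>     my $cafe_tree_adaptor   = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'CAFEGeneFamily');
>
>     # Must be a gene stable ID (here the human gene from the URL in the
>     # original question); an ENSFM family ID will not match a gene member
>     my $gene_stable_id = 'ENSG00000159917';
>     my $member = $gene_member_adaptor->fetch_by_source_stable_id(undef, $gene_stable_id);
>     my $gene_tree = $gene_tree_adaptor->fetch_default_for_Member($member);
>     my $cafe_tree = $cafe_tree_adaptor->fetch_by_GeneTree($gene_tree);
>
>     print $member->stable_id, "\t";
>     print $gene_tree->stable_id, "\t";
>     my $tree_fmt = '%{-s}%{x-}_%{N}:%{d}';
>     print $cafe_tree->newick_format('ryo', $tree_fmt), "\t";
>     print $cafe_tree->pvalue_avg, "\n";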
>
>     For the per-internal-node information you can use something like:
>
>     for my $node (@{$cafe_tree->get_all_nodes}) {
>        # Leaves are labelled by species, internal nodes by taxon ID
>        my $node_name = $node->is_leaf ? $node->genome_db->short_name : $node->taxon_id;
>        my $node_n_members = $node->n_members;
>        # Use defined() rather than ||, so that a p-value of exactly 0
>        # is not mistaken for a missing one
>        my $node_pvalue = defined $node->pvalue ? $node->pvalue : "birth";
>        my $dynamics = "[no change]";
>        if ($node->is_contraction) {
>          $dynamics = "[contraction]";
>        } elsif ($node->is_expansion) {
>          $dynamics = "[expansion]";
>        }
>        print "$node_name => $node_n_members ($node_pvalue) $dynamics\n";
>     }
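To check the per-node formatting offline (for example while modifying the loop), one option is a hypothetical MockNode class that answers the same method names the loop calls; it is a test stand-in, not an Ensembl class, and the species short name and taxon IDs in the usage are made up. This sketch uses defined() instead of ||, so that a p-value of exactly 0 is printed rather than replaced by "birth":

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CAFE tree node, exposing only the methods
# used by the per-node loop, so the formatting can be exercised without a database.
package MockNode;
sub new            { my ($class, %args) = @_; return bless { %args }, $class }
sub is_leaf        { $_[0]{is_leaf} }
sub genome_db      { $_[0] }    # the mock doubles as its own genome_db
sub short_name     { $_[0]{short_name} }
sub taxon_id       { $_[0]{taxon_id} }
sub n_members      { $_[0]{n_members} }
sub pvalue         { $_[0]{pvalue} }
sub is_expansion   { $_[0]{is_expansion} }
sub is_contraction { $_[0]{is_contraction} }

package main;

# Same formatting as the loop body, returned as a string instead of printed.
sub describe_node {
    my ($node) = @_;
    my $node_name = $node->is_leaf ? $node->genome_db->short_name : $node->taxon_id;
    my $node_pvalue = defined $node->pvalue ? $node->pvalue : "birth";
    my $dynamics = "[no change]";
    if ($node->is_contraction) {
        $dynamics = "[contraction]";
    } elsif ($node->is_expansion) {
        $dynamics = "[expansion]";
    }
    return sprintf "%s => %s (%s) %s", $node_name, $node->n_members, $node_pvalue, $dynamics;
}

print describe_node(MockNode->new(is_leaf => 1, short_name => 'Hsap',
                                  n_members => 7, pvalue => 0.001, is_expansion => 1)), "\n";
```

Returning the line from a function, rather than printing inside the loop, makes it easy to compare the output against a parser's expectations.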
>
>     If you prefer to get only the nodes that are significantly expanded or
>     contracted instead of traversing the whole tree, you can use the
>     specialized API methods ($cafe_tree->get_expansions and
>     $cafe_tree->get_contractions) instead.
>
>     Please, let me know if this solves your issue,
>
>     Cheers,
>
>     M;
>
>     On 06/06/13 13:55, Steve Moss wrote:
>
>         Dear EnsEMBL developers,
>
>         I'm trying to work out how best to retrieve a tree (Newick format?)
>         showing the significant expansions and contractions, as you can see
>         here, for example:
>         http://www.ensembl.org/Homo_sapiens/Gene/SpeciesTree?db=core;g=ENSG00000159917;r=19:44782947-44813601;t=ENST00000291182
>
>         I'm playing around with the API at the moment and pulling some data
>         out, but it isn't that intuitive. The current CAFEGeneFamily and
>         CAFEGeneFamilyAdaptor code doesn't seem to include any examples of
>         this, and I can't find anything in the EnsEMBL tutorial information.
>
>         I've been through all my candidate genes and pulled the CAFE gene
>         tree root IDs for those that have a significant CAFE gene gain/loss
>         tree, but I am struggling with where to go from there to build the
>         final "product", i.e. a text representation of the above graphic.
>
>         Ideally, I would like to output this in a CAFE-style format
>         (EnsEMBL Family ID, gene tree (with species name and gene count),
>         family p-value, node p-values), e.g.:
>
>         ENSFM00250000006933	(((((((((Homosapiens_7:6.4,Pantroglodytes_1:6.4)_1:2.4,Gorillagorilla_1:8.8)_1:6.9,Pongoabelii_1:15.7)_1:4.7,Nomascusleucogenys_1:20.4)_1:8.8,Macacamulatta_1:29.2)_1:13.4,Callithrixjacchus_1:42.6)_1:22.6,Tarsiussyrichta_0:65.2)_1:8.8,(Microcebusmurinus_1:57.9,Otolemurgarnettii_1:57.9)_1:16.1)_1:16,Tupaiabelangeri_1:90)_1	0.004000	((0.000000,0.538467),(0.513088,0.543748),(0.534524,0.597471),(0.524517,0.585395),(0.544478,0.583311),(0.568740,0.701965),(0.613489,0.272744),(0.541873,0.581644),(0.675047,0.563343),(0.591851,0.503703))
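A minimal sketch of a parser for lines in this format, assuming the four fields are tab-separated and that leaf labels look like Species_count:branch_length with letters-only species names. parse_cafe_line is a hypothetical helper; each (x,y) group of node p-values is simply kept as one pair, in the order it appears:

```perl
use strict;
use warnings;

# Hypothetical helper: split one CAFE-style summary line into
# family ID, annotated Newick tree, family p-value and node p-value pairs.
# Assumes tab-separated fields and letters-only species names.
sub parse_cafe_line {
    my ($line) = @_;
    chomp $line;
    my ($family_id, $tree, $family_pvalue, $node_field) = split /\t/, $line;
    die "expected 4 tab-separated fields\n" unless defined $node_field;

    # Leaf labels: species name, gene count, branch length.
    # Internal nodes (")_1:2.4") have no species name, so they do not match.
    my %leaf_counts;
    while ($tree =~ /([A-Za-z]+)_(\d+):[\d.]+/g) {
        $leaf_counts{$1} = $2;
    }

    # Each "(x,y)" group becomes one two-element array ref.
    my @node_pvalues;
    while ($node_field =~ /\(([\d.]+),([\d.]+)\)/g) {
        push @node_pvalues, [$1, $2];
    }

    return {
        family_id     => $family_id,
        tree          => $tree,
        family_pvalue => $family_pvalue,
        leaf_counts   => \%leaf_counts,
        node_pvalues  => \@node_pvalues,
    };
}
```

The leaf_counts hash is in the same species => count shape a per-species comparison would need.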
>
>         Has anyone done this already?
>
>         Kindest regards,
>
>         Steve Moss
>         http://about.me/gawbul
>
>         _______________________________________________
>         Dev mailing list Dev at ensembl.org
>         Posting guidelines and subscribe/unsubscribe info:
>         http://lists.ensembl.org/mailman/listinfo/dev
>         Ensembl Blog: http://www.ensembl.info/
>
>
>     --
>
>     Miguel Pignatelli, PhD
>
>     Ensembl Developer - Comparative Genomics
>     European Bioinformatics Institute (EMBL-EBI)
>     Wellcome Trust Genome Campus, Hinxton
>     Cambridge - CB10 1SD - UK
>     Room A3-33
>     Phone + 44 (0) 1223 494 598
>     Fax   + 44 (0) 1223 494 468
>
