[ensembl-dev] Fwd: Retrieving CAFE gene tree data...
Miguel Pignatelli
mp at ebi.ac.uk
Fri Jun 7 10:29:07 BST 2013
Hi Steve,
On 07/06/13 10:17, Steve Moss wrote:
> Dear Miguel,
>
> That is fantastic, thank you so much for your help!
>
> I'm perhaps an exception to the rule. I had run a CAFE analysis on these
> data myself, before this was included in the Compara pipeline/API, and
> so wanted to check and see if my results tallied with the results that
> you get.
>
Yes, that makes sense.
It would be great if you could get back to us with the results of your
comparison. Any help in improving our analysis is always welcome.
> I have already written a parser for the CAFE data, so this code will
> help me to output your data in a format that I can run through my parser,
> allowing me to make the direct comparisons that I discussed above.
>
> Many thanks for that code, it works beautifully and with a little
> modification to the per-internal-node section, it is just what I need!
>
Glad to hear that.
> Yes, I had seen the $cafe_tree->get_expansions and
> $cafe_tree->get_contractions methods, which look very useful.
>
> Once again, many thanks for your assistance!
>
You're very welcome,
Cheers,
M;
> Kindest regards,
>
> Steve Moss
> http://about.me/gawbul
>
>
> On 6 June 2013 16:38, Miguel Pignatelli <mp at ebi.ac.uk> wrote:
>
> Hi Steve,
>
> It is a bit frustrating to write a parser for the CAFE output format
> only to realize that users want the raw textual output :-S
>
> Btw, it is not easy for me to understand why you want the raw,
> unparsed CAFE output. I would say that you would need to parse it
> anyway to make sense of all that per-node-pair information, etc.
>
> Anyway, you can get something similar using the following code:
>
> my $gene_stable_id = 'ENSFM00250000006933';
> my $member = $gene_member_adaptor->fetch_by_source_stable_id(undef, $gene_stable_id);
> my $gene_tree = $gene_tree_adaptor->fetch_default_for_Member($member);
> my $cafe_tree = $cafe_tree_adaptor->fetch_by_GeneTree($gene_tree);
>
> print $member->stable_id, "\t";
> print $gene_tree->stable_id, "\t";
>
> my $tree_fmt = '%{-s}%{x-}_%{N}:%{d}';
> print $cafe_tree->newick_format('ryo', $tree_fmt), "\t";
> print $cafe_tree->pvalue_avg, "\n";
>
> For the per-internal-node information you can use something like:
>
> for my $node (@{$cafe_tree->get_all_nodes}) {
>     my $node_name = $node->is_leaf ? $node->genome_db->short_name : $node->taxon_id;
>     my $node_n_members = $node->n_members;
>     my $node_pvalue = $node->pvalue || "birth";
>     my $dynamics = "[no change]";
>     if ($node->is_contraction) {
>         $dynamics = "[contraction]";
>     } elsif ($node->is_expansion) {
>         $dynamics = "[expansion]";
>     }
>     print "$node_name => $node_n_members ($node_pvalue) $dynamics\n";
> }
>
> If you prefer to get the nodes that are significantly expanded or
> contracted instead of having to traverse the whole tree, you can use
> the specialized methods on the API instead.
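The specialized methods referred to here are presumably the $cafe_tree->get_expansions and $cafe_tree->get_contractions calls named elsewhere in this thread. A minimal sketch of how they might be used, assuming each returns an array reference of node objects that support the same accessors as the loop above. The helper name report_significant_nodes is hypothetical and not part of the API:

```perl
use strict;
use warnings;

# Hypothetical helper: builds one line per significantly expanded or
# contracted node. ASSUMPTION: get_expansions and get_contractions each
# return an array reference of nodes; this is not confirmed in the thread.
sub report_significant_nodes {
    my ($cafe_tree) = @_;
    my @lines;
    for my $pair ( [ expansion   => $cafe_tree->get_expansions ],
                   [ contraction => $cafe_tree->get_contractions ] ) {
        my ($label, $nodes) = @$pair;
        for my $node (@$nodes) {
            # Same naming rule as the per-node loop: species for leaves,
            # taxon ID for internal nodes.
            my $name = $node->is_leaf ? $node->genome_db->short_name
                                      : $node->taxon_id;
            push @lines, sprintf "%s => %d (%s) [%s]",
                $name, $node->n_members, $node->pvalue, $label;
        }
    }
    return @lines;
}

# Usage, with $cafe_tree fetched as in the earlier snippet:
# print "$_\n" for report_significant_nodes($cafe_tree);
```

Returning the lines rather than printing them keeps the helper easy to feed into an existing parser or to write to a file.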
>
> Please, let me know if this solves your issue,
>
> Cheers,
>
> M;
>
> On 06/06/13 13:55, Steve Moss wrote:
>
> Dear EnsEMBL developers,
>
> I'm trying to work out how best to retrieve a tree (Newick format?)
> showing the significant expansions and contractions, as you can see
> here, for example
> http://www.ensembl.org/Homo_sapiens/Gene/SpeciesTree?db=core;g=ENSG00000159917;r=19:44782947-44813601;t=ENST00000291182.
>
> I'm playing around with the API at the moment and pulling some data out,
> but it isn't that intuitive. The current CAFEGeneFamily and
> CAFEGeneFamilyAdaptor code doesn't seem to have any examples on this and
> I can't find anything in the EnsEMBL tutorial information.
>
> I've been through all my candidate genes and pulled the CAFE gene tree root
> IDs for those that have a significant CAFE gene gain/loss tree,
> but am struggling with where to go from there to build the final
> "product", i.e. a text representation of the above graphic.
>
> Ideally, I would like to be able to output this in a CAFE-style output
> format (EnsEMBL Family ID, Gene Tree (with species name and gene count),
> Family P Value, Nodes P Values), e.g.:
>
> ENSFM00250000006933	(((((((((Homosapiens_7:6.4,Pantroglodytes_1:6.4)_1:2.4,Gorillagorilla_1:8.8)_1:6.9,Pongoabelii_1:15.7)_1:4.7,Nomascusleucogenys_1:20.4)_1:8.8,Macacamulatta_1:29.2)_1:13.4,Callithrixjacchus_1:42.6)_1:22.6,Tarsiussyrichta_0:65.2)_1:8.8,(Microcebusmurinus_1:57.9,Otolemurgarnettii_1:57.9)_1:16.1)_1:16,Tupaiabelangeri_1:90)_1	0.004000	((0.000000,0.538467),(0.513088,0.543748),(0.534524,0.597471),(0.524517,0.585395),(0.544478,0.583311),(0.568740,0.701965),(0.613489,0.272744),(0.541873,0.581644),(0.675047,0.563343),(0.591851,0.503703))
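The record layout described above (family ID, annotated tree, family p-value, node p-values) can be split back into its parts with a few lines of Perl. This is only a sketch: it assumes the four fields are separated by whitespace and that node p-values come as one pair per internal node. The sub name parse_cafe_line and the shortened toy record are hypothetical, and this is not the parser mentioned elsewhere in this thread:

```perl
use strict;
use warnings;

# Parse one CAFE-style record into a hash reference.
# ASSUMPTION: four whitespace-separated fields in the order
# family ID, tree, family p-value, node p-values.
sub parse_cafe_line {
    my ($line) = @_;
    chomp $line;
    my ($family_id, $tree, $family_p, $node_p) = split /\s+/, $line, 4;
    die "malformed line: $line\n" unless defined $node_p;

    # Leaves look like Name_count:branch_length; internal nodes follow
    # a ')' and are skipped because no name precedes the underscore.
    my %counts;
    while ($tree =~ /([A-Za-z]+)_(\d+):/g) {
        $counts{$1} = $2;
    }

    # Node p-values are read as (first,second) pairs.
    my @pairs;
    while ($node_p =~ /\(([\d.eE+-]+),([\d.eE+-]+)\)/g) {
        push @pairs, [ $1, $2 ];
    }

    return {
        family_id     => $family_id,
        tree          => $tree,
        family_pvalue => $family_p,
        leaf_counts   => \%counts,
        node_pvalues  => \@pairs,
    };
}

# Toy record with illustrative values only (not real Ensembl data).
my $rec = parse_cafe_line(
    "FAM1\t((A_7:6.4,B_1:6.4)_1:2.4,C_1:8.8)_1\t0.004\t((0.0,0.5),(0.51,0.54))\n");
print "$rec->{family_id}: A=$rec->{leaf_counts}{A}, p=$rec->{family_pvalue}\n";
```

Keeping the leaf counts in a hash keyed by species name makes it straightforward to compare two runs family by family.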
>
> Has anyone done this already?
>
> Kindest regards,
>
> Steve Moss
> http://about.me/gawbul
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
--
Miguel Pignatelli, PhD
Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
Room A3-33
Phone + 44 (0) 1223 494 598
Fax + 44 (0) 1223 494 468