[ensembl-dev] Fwd: Retrieving CAFE gene tree data...

Steve Moss gawbul at gmail.com
Fri Jun 7 10:17:29 BST 2013


Dear Miguel,

That is fantastic, thank you so much for your help!

I'm perhaps an exception to the rule. I had run a CAFE analysis on these
data myself, before this was included in the Compara pipeline/API, and so
wanted to check whether my results tally with the ones you get.

I have already written a parser for the CAFE output, so this code will
help me write your data in a format that my parser can read, allowing me
to make the direct comparison described above.
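
A minimal sketch of that output step, in plain Perl with no Ensembl
dependency. It assembles one CAFE-style record from the four fields listed
in the original request (family ID, Newick tree, family p-value, per-node
p-value pairs). The tab separator, the six-decimal formatting, and the sub
name cafe_line are assumptions made for illustration, and the sample values
are made up:

```perl
use strict;
use warnings;

# Build one CAFE-style record. The tab separator and the six-decimal
# p-value formatting are assumptions, not something the thread confirms.
sub cafe_line {
    my ($family_id, $newick, $family_pvalue, $node_pvalues) = @_;

    # Each element of $node_pvalues is a two-element array reference
    # holding one pair of per-node p-values.
    my @pairs = map { sprintf '(%.6f,%.6f)', @{$_} } @{$node_pvalues};

    return join "\t",
        $family_id,
        $newick,
        sprintf('%.6f', $family_pvalue),
        '(' . join(',', @pairs) . ')';
}

# Made-up toy values, for illustration only.
my $line = cafe_line(
    'FAMILY_1',
    '(A_2:1.0,B_1:1.0)_1',
    0.004,
    [ [ 0.5, 0.25 ] ],
);
print "$line\n";
```

The Newick string and the family p-value would come from
$cafe_tree->newick_format and $cafe_tree->pvalue_avg as in the quoted code
below; only the joining and number formatting are shown here.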

Many thanks for that code, it works beautifully and with a little
modification to the per-internal-node section, it is just what I need!

Yes, I had seen the $cafe_tree->get_expansions and $cafe_tree->get_contractions
methods, which look very useful.
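
A rough sketch of how those two methods might be used to list only the
significant nodes. It assumes that get_expansions and get_contractions each
return an array reference of node objects offering the same accessors used
in the quoted loop below (taxon_id, n_members, pvalue); that return type is
an assumption, not something confirmed in this thread. StubNode, StubTree
and significant_changes are hypothetical names that exist only so the
snippet runs without the Ensembl API, and the values are made up:

```perl
use strict;
use warnings;

# Hypothetical stand-ins so the sketch runs without the Ensembl API.
# With a real CAFE tree, pass the object fetched from the adaptor instead.
package StubNode {
    sub new       { my ($class, %args) = @_; return bless {%args}, $class }
    sub taxon_id  { $_[0]{taxon_id} }
    sub n_members { $_[0]{n_members} }
    sub pvalue    { $_[0]{pvalue} }
}
package StubTree {
    sub new              { my ($class, %args) = @_; return bless {%args}, $class }
    sub get_expansions   { $_[0]{expansions} }
    sub get_contractions { $_[0]{contractions} }
}

package main;

# Assumption: both methods return an array reference of node objects.
sub significant_changes {
    my ($cafe_tree) = @_;
    my @rows;
    for my $node (@{ $cafe_tree->get_expansions }) {
        push @rows, join "\t", 'expansion',
            $node->taxon_id, $node->n_members, $node->pvalue;
    }
    for my $node (@{ $cafe_tree->get_contractions }) {
        push @rows, join "\t", 'contraction',
            $node->taxon_id, $node->n_members, $node->pvalue;
    }
    return @rows;
}

# Made-up identifiers and values, for illustration only.
my $tree = StubTree->new(
    expansions   => [ StubNode->new(taxon_id => 1001, n_members => 7, pvalue => 0.01) ],
    contractions => [ StubNode->new(taxon_id => 1002, n_members => 0, pvalue => 0.02) ],
);
print "$_\n" for significant_changes($tree);
```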

Once again, many thanks for your assistance!

Kindest regards,

Steve Moss
http://about.me/gawbul


On 6 June 2013 16:38, Miguel Pignatelli <mp at ebi.ac.uk> wrote:

> Hi Steve,
>
> It is a bit frustrating writing a parser for the CAFE output format just
> to realize that the users want the raw textual output :-S
>
> Btw, it is not easy for me to understand why you want the raw un-parsed
> CAFE output. I would say that you would need to parse it anyway to make
> sense of all that per-node-pair information, etc...
>
> Anyway, you can get something similar using the following code:
>
> my $gene_stable_id = 'ENSFM00250000006933';
> my $member = $gene_member_adaptor->fetch_by_source_stable_id(undef, $gene_stable_id);
> my $gene_tree = $gene_tree_adaptor->fetch_default_for_Member($member);
> my $cafe_tree = $cafe_tree_adaptor->fetch_by_GeneTree($gene_tree);
>
> print $member->stable_id, "\t";
> print $gene_tree->stable_id, "\t";
>
> my $tree_fmt = '%{-s}%{x-}_%{N}:%{d}';
> print $cafe_tree->newick_format('ryo', $tree_fmt), "\t";
> print $cafe_tree->pvalue_avg, "\n";
>
> For the per-internal-node information you can use something like:
>
> for my $node (@{$cafe_tree->get_all_nodes}) {
>   my $node_name = $node->is_leaf ? $node->genome_db->short_name
>                                  : $node->taxon_id;
>   my $node_n_members = $node->n_members;
>   my $node_pvalue = $node->pvalue || "birth";
>   my $dynamics = "[no change]";
>   if ($node->is_contraction) {
>     $dynamics = "[contraction]";
>   } elsif ($node->is_expansion) {
>     $dynamics = "[expansion]";
>   }
>   print "$node_name => $node_n_members ($node_pvalue) $dynamics\n";
> }
>
> If you prefer to get the nodes that are significantly expanded or
> contracted instead of having to traverse the whole tree, you can use the
> specialized methods on the API instead.
>
> Please, let me know if this solves your issue,
>
> Cheers,
>
> M;
>
> On 06/06/13 13:55, Steve Moss wrote:
>
>> Dear EnsEMBL developers,
>>
>> I'm trying to work out how best to retrieve a tree (Newick format?)
>> showing the significant expansions and contractions, as you can see
>> here, for example:
>> http://www.ensembl.org/Homo_sapiens/Gene/SpeciesTree?db=core;g=ENSG00000159917;r=19:44782947-44813601;t=ENST00000291182
>>
>> I'm playing around with the API at the moment and pulling some data out,
>> but it isn't that intuitive. The current CAFEGeneFamily and
>> CAFEGeneFamilyAdaptor code doesn't seem to have any examples on this and
>> I can't find anything in the EnsEMBL tutorial information.
>>
>> I've been through all my candidate genes and pulled the CAFE gene tree
>> root IDs for those that have a significant CAFE gene gain/loss tree,
>> but am struggling with where to go from there to build the final
>> "product", i.e. a text representation of the above graphic.
>>
>> Ideally, I would like to be able to output this in a CAFE style output
>> format (EnsEMBL Family ID, Gene Tree (with species name and gene count),
>> Family P Value, Nodes P Values), e.g.:
>>
>> ENSFM00250000006933(((((((((Homosapiens_7:6.4,
>> Pantroglodytes_1:6.4)_1:2.4,Gorillagorilla_1:8.8)_1:6.9,
>> Pongoabelii_1:15.7)_1:4.7,Nomascusleucogenys_1:20.4)_1:
>> 8.8,Macacamulatta_1:29.2)_1:13.4,Callithrixjacchus_1:42.6)
>> _1:22.6,Tarsiussyrichta_0:65.2)_1:8.8,(Microcebusmurinus_1:
>> 57.9,Otolemurgarnettii_1:57.9)_1:16.1)_1:16,Tupaiabelangeri_
>> 1:90)_10.004000((0.000000,0.538467),(0.513088,0.543748),(
>> 0.534524,0.597471),(0.524517,0.585395),(0.544478,0.583311),
>> (0.568740,0.701965),(0.613489,0.272744),(0.541873,0.581644),
>> (0.675047,0.563343),(0.591851,0.503703))
>>
>> Has anyone done this already?
>>
>> Kindest regards,
>>
>> Steve Moss
>> http://about.me/gawbul
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
> --
>
> Miguel Pignatelli, PhD
>
> Ensembl Developer - Comparative Genomics
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge - CB10 1SD - UK
> Room A3-33
> Phone + 44 (0) 1223 494 598
> Fax   + 44 (0) 1223 494 468