[ensembl-dev] documentation missing

Miguel Pignatelli mp at ebi.ac.uk
Wed Feb 13 10:04:50 GMT 2013


Hi Joe,

On 12/02/13 17:18, Joe Carl wrote:
> I could use a little help with this project.  I'm new to perl and new to
> the api for compara.  I'm better at python :-)
>

I hope you enjoy the experience :-)

> I'll outline the project for which I'm trying to write a script.  I'm using
> PROCR as my example gene.  If you can identify the API functions I will
> need to use, then I can figure out the syntax to write the script.
> Knowing the right API functions is half the battle :)
>
> On the page
> http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000101000;r=20:33759876-33765165;t=ENST00000216968
> you can select the Gene tree (text) from the pulldown menu to the left.
>
> I want to use the API to fetch this Newick table.   (What functions do I
> need to use to fetch this data?)

Since you already know how to get adaptors...

my $member = $member_Adaptor->fetch_by_source_stable_id(undef, 'ENSG00000101000');

my $geneTree = $geneTree_Adaptor->fetch_default_for_Member($member);

my $newick_tree = $geneTree->root->newick_format('full_web');

Several Newick output formats are available. I used 'full_web' because
it is the format shown on the page you pointed to (take a look at the
documentation for Bio::EnsEMBL::Compara::NestedSet for more on this).
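
If you only need the leaf labels out of a Newick string (for example, to
look the genes up afterwards), a small pure-Perl sketch along these lines
may be enough. It is not part of the Ensembl API: the sub name
newick_leaf_labels and the example tree are made up for illustration, the
exact label layout produced by 'full_web' is not assumed, and quoted Newick
labels are not handled.

```perl
use strict;
use warnings;

# Return the leaf labels of a Newick string, in order of appearance.
# A leaf label is taken to be the text that directly follows "(" or ","
# and runs up to the next ":", ",", ")", ";" or whitespace.
# Internal node labels follow ")" and are therefore skipped.
# Assumption: labels are unquoted and the tree has more than one leaf.
sub newick_leaf_labels {
    my ($newick) = @_;
    my @labels;
    while ($newick =~ /(?<=[(,])\s*([^():,;\s]+)/g) {
        push @labels, $1;
    }
    return @labels;
}

# Hypothetical tree, only here to exercise the sub.
my $example = '((geneA_human:0.1,geneB_pig:0.2)node1:0.05,geneC_mouse:0.3);';
my @leaves  = newick_leaf_labels($example);
print join("\n", @leaves), "\n";
```

In a real script you would pass $newick_tree from above instead of
$example, after checking what the labels look like in the format you chose.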


>  From the Newick table I can get the species gene names. With the gene
> name I can get the cDNA  -- Using the web page I do this by navigating
> to the gene of interest and then export the cDNA  getting both the cDNA
> and the genomic DNA.  (what functions do I need to fetch this data?)

for my $member (@{$geneTree->get_all_Members()}) {
     my $cdna = $member->sequence_cds();
     [...]
}

>
> Using the cDNA for each gene, and the Newick I can then run PAML to get
> the DN/DS for each branch.  Am I able to get the DN/DS through an API?

We compute dn and ds using PAML only for closely related pairs of 
species, so you may need to run it on your own for more distant pairs.

To retrieve the dn and ds values you may try something like this...

for my $member (@{$geneTree->get_all_Members()}) {
     my $seq = $member->sequence_cds();

     for my $homology (@{$homology_Adaptor->fetch_all_by_Member($member)}) {
         my $dn = $homology->dn();
         my $ds = $homology->ds();
         my $taxon = $homology->subtype();
         [...]
     }
}

The "subtype" method gives you the "LCA" of the homology relationship.
You may also want to filter by type of relationship (the "description" 
method).
Please take a look at 
http://www.ensembl.org/info/docs/compara/homology_method.html if you 
haven't done so.
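
Once you have collected the values, the filtering and the dn/ds ratio are
plain Perl. The sketch below is only an illustration under stated
assumptions: the hashes stand in for what you would pull out of each
homology with the dn, ds, subtype and description methods, the numbers and
taxon names are invented placeholders, the sub name dnds_ratio is
hypothetical, and it assumes dn and ds may be missing for pairs where we
did not run PAML.

```perl
use strict;
use warnings;

# Return dn/ds, or undef when the ratio cannot be computed
# (missing values, or ds equal to zero).
sub dnds_ratio {
    my ($dn, $ds) = @_;
    return unless defined $dn && defined $ds;
    return if $ds == 0;
    return $dn / $ds;
}

# Placeholder records; in a real script these values would come from
# $homology->description(), ->subtype(), ->dn() and ->ds().
my @records = (
    { description => 'ortholog_one2one',  subtype => 'TaxonA', dn => 0.02,  ds => 0.10  },
    { description => 'ortholog_one2many', subtype => 'TaxonB', dn => 0.05,  ds => 0.20  },
    { description => 'ortholog_one2one',  subtype => 'TaxonC', dn => undef, ds => undef },
);

# Keep only one type of relationship, then report the ratio per record.
my @kept = grep { $_->{description} eq 'ortholog_one2one' } @records;
for my $r (@kept) {
    my $ratio = dnds_ratio($r->{dn}, $r->{ds});
    printf "%s\t%s\n", $r->{subtype},
        defined $ratio ? sprintf('%.3f', $ratio) : 'NA';
}
```

Returning undef instead of dying keeps the loop going over pairs that have
no dn/ds, so you can decide later which ones to recompute yourself.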


> I intend to generate the LCA for each node so I can calculate DN/DS on
> each branch length from LCA of human and Pig to each species.

If I understand this correctly, you can achieve this with the methods I 
posted above.

> Is there
> an API call to get this LCA sequence already?  Is there an API call
> to get the DN/DS from any particular node to the next?  i.e. LCA
> (pig-human) to some intermediate LCA node?  (did that question make
> sense? :))

I don't know if I understand this correctly. Maybe you can figure out 
the answer from the methods I posted above. Don't hesitate to get back 
to us if something is still not clear.

Cheers,

M;



>
> I have already successfully used Bio::EnsEMBL::Registry->get_adaptor,
> and with that I have fetched genes of interest for various species (both
> nt and protein), so I'm not a total noob :)
>
> Thanks for the help.
>
> Joe
>
>
> On Tue, Feb 12, 2013 at 11:43 AM, Miguel Pignatelli <mp at ebi.ac.uk> wrote:
>
>     Hi Joe,
>
>
>     On 12/02/13 16:02, Joe Carl wrote:
>
>         I'm reading the compara database schema webpage
>         (http://www.ensembl.org/info/docs/api/compara/compara_schema.html),
>         and noticed the three most interesting functions (to me) are
>         missing.
>
>         where is the documentation for the following
>
>            * species_tree_node
>              <http://www.ensembl.org/info/docs/api/compara/compara_schema.html#species_tree_node>
>            * species_tree_node_tag
>              <http://www.ensembl.org/info/docs/api/compara/compara_schema.html#species_tree_node_tag>
>            * species_tree_root
>              <http://www.ensembl.org/info/docs/api/compara/compara_schema.html#species_tree_root>
>
>
>     Thanks for reporting this. I will make sure that the documentation
>     for those tables is properly displayed in the next release.
>
>     Those tables contain the species tree used in the gene gain/loss
>     analysis (http://www.ensembl.org/Help/View?id=379), i.e., they
>     contain the full Ensembl species tree but in an ultrametric, binary
>     format.
>
>     If I understand your problem correctly, you need to look at the
>     gene_tree_node, gene_tree_root and homology (and perhaps
>     homology_member) tables. Or better still, use the API to fetch the data.
>
>     Let us know if you need any help with this.
>
>     Cheers,
>
>     M;
>
>
>         My goal is the following:
>
>         1) Grab the phylogenetic trees already generated by Ensembl (for
>         a specific Gene ID)
>                a) Newicks
>                b) Sequences by species
>         2) Grab the DN/DS for various branch points from Pig to Human
>
>         If I need to, I will grab the gene ID for each species, determine
>         the proper nucleotide sequence and grab those, then use the
>         Newick tree to calculate Dn/DS with PAML and get the predicted
>         LCA sequences of the descendants.
>
>         Is there a better way to do this than I have outlined?
>
>         Joe
>
>
>         _________________________________________________
>         Dev mailing list Dev at ensembl.org
>         Posting guidelines and subscribe/unsubscribe info:
>         http://lists.ensembl.org/mailman/listinfo/dev
>         Ensembl Blog: http://www.ensembl.info/
>
>
>     --
>
>     Miguel Pignatelli, PhD
>
>     Ensembl Developer - Comparative Genomics
>     European Bioinformatics Institute (EMBL-EBI)
>     Wellcome Trust Genome Campus, Hinxton
>     Cambridge - CB10 1SD - UK
>     Room A3-33
>     Phone + 44 (0) 1223 494 598
>     Fax + 44 (0) 1223 494 468
>
>

-- 

Miguel Pignatelli, PhD

Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
Room A3-33
Phone + 44 (0) 1223 494 598
Fax   + 44 (0) 1223 494 468



