[ensembl-dev] TreeBEST-compatible species tree

Greg Slodkowicz gslodko at mrc-lmb.cam.ac.uk
Fri Dec 6 17:45:58 GMT 2019


Hi Matthieu
Thanks very much for the quick reply. I can see that a number of parameters can be passed to the script you mentioned but, as far as I can tell, they are not documented:
       'url=s'          => \$url,
       'mlss_id=s'      => \$mlss_id,
       'method=s'       => \$method,
       'ss_name=s'      => \$ss_name,
       'label=s'        => \$label,
       'stn_root_id=i'  => \$stn_root_id,
       'with_distances' => \$with_distances,
       'ascii_scale=f'  => \$ascii_scale,
       'reg_conf=s'     => \$reg_conf,
       'compara_db=s'   => \$compara_db

I couldn’t find a combination of url, reg_conf and compara_db that would work, so I just hardcoded a DBA, which seems to work. Then, by experimenting with combinations of the species tree root and species set ID listed here: http://ensembl.org/info/docs/api/compara/compara_schema.html#species_tree_root , I think I got the tree I wanted. I was also able to modify the ‘roll your own’ parameters without any problems.
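
In case it is useful to anyone who already has a Newick string labelled with node IDs but without the star suffix: the flagging step Matthieu describes can presumably also be done as post-processing rather than through the format string. Below is a minimal Python sketch, not part of ensembl-compara. The function name flag_fully_sequenced and the node IDs in the example are made up, and it assumes unquoted labels with no whitespace around them.

```python
import re

# A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
# Internal-node labels follow ")" and are therefore never matched.
# "*" is excluded from the label so already-flagged leaves are skipped.
_LEAF_LABEL = re.compile(r"(?<=[(,])([^():,;*\s]+)(?=[:,);])")


def flag_fully_sequenced(newick: str) -> str:
    """Append "*" to every leaf label of a Newick string (hypothetical helper)."""
    return _LEAF_LABEL.sub(r"\1*", newick)


if __name__ == "__main__":
    # Made-up node IDs, for illustration only.
    tree = "((101:0.1,102:0.2)201:0.3,103:0.5)202;"
    print(flag_fully_sequenced(tree))
```

Labels that already end in a star are left alone, so running it twice on the same tree should be harmless.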

Thanks again,
Greg
-- 
Dr Greg Slodkowicz
Structural Studies (Babu Group)
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge, CB2 0QH, UK

> On 5 Dec 2019, at 10:22, Matthieu Muffato <muffato at ebi.ac.uk> wrote:
> 
> Hi Greg
> 
> When we run our pipelines, e.g. TreeBEST, we use internal node identifiers instead of species names. They are shorter and less ambiguous to parse. It's all part of the pipeline, but you can adapt the ensembl-compara/scripts/examples/species_getSpeciesTree.pl script to get it. Replace the string %{n} with %{o}%{-E"*"} .
> 
> "o" tells the formatter to use node IDs, and the star character must be added for TreeBEST not to penalise the species. Initially this was meant to cater for the "low-coverage" mammal assemblies, but in our tests a few years ago it didn't seem useful any more, so we flag everything as "fully sequenced" by adding the star character.
> 
> This script is not compatible with the e78 API / database schema, but I think you can use the latest species tree and prune it. For protein trees we use the NCBI taxonomy, and I don't think it's changed much (I remember the relationship between birds, turtles and other reptiles changed at some point, but can't remember when).
> 
> Hope this helps,
> Matthieu
> 
> On 03/12/2019 11:05, Greg Slodkowicz wrote:
>> Dear developers,
>> I was wondering if there is a way to access the species tree that is used for running TreeBEST? I have the ‘normal’ Newick species tree from the Compara GitHub repository but it seems like TreeBEST is quite picky about the labelling of the tree nodes.
>> 
>> What I would like to do is re-run gene-species tree reconciliation for a few gene trees of interest, get bootstrap replicates for each tree and then run some downstream analysis on them to get a measure of the uncertainty introduced by the differences in the tree. I would ideally like the species tree from an archival (release 78) version of Ensembl (though I imagine it hasn’t changed that much).
>> 
>> Many thanks,
>> Greg
> 
