[ensembl-dev] TreeBEST-compatible species tree
Matthieu Muffato
muffato at ebi.ac.uk
Thu Dec 5 10:22:12 GMT 2019
Hi Greg
When we run our pipelines, e.g. TreeBEST, we use internal node
identifiers instead of species names. They are shorter and less ambiguous to
parse. It's all part of the pipeline, but you can adapt the
ensembl-compara/scripts/examples/species_getSpeciesTree.pl script to get
it. Replace the string %{n} with %{o}%{-E"*"} .
"o" tells the formatter to use node IDs, and the star character must be
added so that TreeBEST does not penalise the species. Originally this was
meant to cater for the "low-coverage" mammalian assemblies, but in our tests
a few years ago it no longer seemed useful, so we flag every species as
"fully sequenced" by adding the star character.
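If you would rather relabel an existing Newick file than adapt the Perl script, the Python sketch below mimics the idea described above: every node name is replaced by a numeric identifier, and leaf labels get the trailing star. It is not the Ensembl formatter itself. The name-to-ID mapping is a hypothetical input you would have to supply (for instance taken from the Compara species tree), the sketch assumes unquoted labels without whitespace, and it assumes the star only goes on leaves, i.e. the species.

```python
import re
from typing import Dict

# A label directly after "(" or "," is a leaf; one directly after ")" is an
# internal node. Assumes unquoted labels without whitespace.
_LABEL = re.compile(r"(?<=[(,)])([^(),:;\s]+)")


def to_treebest_labels(newick: str, node_ids: Dict[str, int]) -> str:
    """Replace each node name by its ID and append '*' to leaf labels.

    Branch lengths (":0.1") are left untouched; a name missing from
    node_ids raises KeyError.
    """
    def relabel(match: "re.Match[str]") -> str:
        new_label = str(node_ids[match.group(1)])
        is_leaf = newick[match.start() - 1] != ")"
        return new_label + "*" if is_leaf else new_label

    return _LABEL.sub(relabel, newick)


# Hypothetical IDs for illustration only; they are not real Ensembl node IDs.
ids = {"human": 1, "chimp": 2, "Hominini": 3, "mouse": 4, "Euarchontoglires": 5}
print(to_treebest_labels("((human,chimp)Hominini,mouse)Euarchontoglires;", ids))
# prints ((1*,2*)3,4*)5;
```

A regular expression is enough here because only labels change; the tree topology and any branch lengths pass through unmodified.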
This script is not compatible with the e78 API / database schema, but I
think you can use the latest species tree and prune it. For
protein trees we use the NCBI taxonomy, and I don't think it has changed
much (I remember that the relationship between birds, turtles and other
reptiles changed at some point, but I can't remember when).
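Pruning the latest species tree down to the species that were in release 78 can be done with any Newick library; as a dependency-free illustration, here is a minimal Python sketch. The helper names are hypothetical and not part of the Compara API. It assumes unquoted labels without whitespace, discards branch lengths, and collapses any internal node left with a single child after pruning. The set of species to keep is your own input.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def parse_newick(newick: str) -> Node:
    """Parse a simple Newick string; branch lengths are read and discarded."""
    text = "".join(newick.split())  # assumes labels contain no whitespace
    pos = 0

    def read_token() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            pos += 1  # consume ")"
        node.label = read_token()
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":" and skip the branch length
            read_token()
        return node

    return parse_node()


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Drop leaves not in `keep`; collapse nodes left with one child."""
    if not node.children:
        return node if node.label in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # a unary node adds nothing to the topology
    node.children = kept
    return node


def to_newick(node: Node) -> str:
    """Serialise a tree back to Newick (without the final ';')."""
    if not node.children:
        return node.label
    inner = ",".join(to_newick(child) for child in node.children)
    return f"({inner}){node.label}"


tree = parse_newick("((human,chimp)Hominini,(mouse,rat)Murinae)Euarchontoglires;")
pruned = prune(tree, {"human", "chimp", "mouse"})
print(to_newick(pruned) + ";")
# prints ((human,chimp)Hominini,mouse)Euarchontoglires;
```

Pruning is done on the leaf names, so it should run before relabelling the tree with node IDs.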
Hope this helps,
Matthieu
On 03/12/2019 11:05, Greg Slodkowicz wrote:
> Dear developers,
> I was wondering if there is a way to access the species tree that is
> used for running TreeBEST? I have the ’normal’ Newick species tree
> from the Compara GitHub repository but it seems like TreeBEST is quite
> picky about the labelling of the tree nodes.
>
> What I would like to do is re-run gene-species tree reconciliation for
> a few gene trees of interest, get bootstrap replicates for that tree
> and then run some downstream analysis on them to get a measure of
> uncertainty introduced by the differences in the tree. I would ideally
> like the species tree from an archival (release 78) version of Ensembl
> (though I imagine it hasn’t changed that much).
>
> Many thanks,
> Greg