[ensembl-dev] trees

Joseph Steinberger joseph.steinberger at weizmann.ac.il
Wed Sep 11 16:18:43 BST 2019


Dear Ensembl Development community,

I previously asked the Ensembl Helpdesk the following question:
Hello Ensembl help desk -

Thank you for your essential resource.

I would like to study vertebrate paralogues arising from the two original whole-genome duplications in the ancestral vertebrate, as well as related research questions.
However, although Ensembl has supertrees in which each paralogue is found in a separate genetree, it is impossible to know how the genetrees are related: the supertree is not a real tree.

What is the Ensembl recommendation for turning supertrees into real trees? Has Ensembl already done this, or does it know of and recommend any outside efforts?
What do you think about taking only the genes from the ten species with the best-quality genomes in the supertree, and then running the protein tree algorithm on this ten percent of the total sequences? That way the number might not exceed 400 sequences, and the computation would be practical to accomplish. Alternatively, could I make a consensus sequence for each genetree, and then run the protein tree algorithm on the consensus sequences?
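
A minimal sketch of the first idea (keeping only sequences from a chosen set of species and checking the count against a size limit) could look like the following. It assumes a hypothetical FASTA header convention of "gene_id|species_name"; the function names, the example records, and the species chosen are illustrative only and are not part of any Ensembl tool.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_by_species(records, keep_species, max_sequences):
    """Keep records whose header ends in '|<species>' for a wanted species.

    The 'gene_id|species_name' header layout is an assumption made for
    this sketch. Returns the kept records and whether they fit the limit.
    """
    kept = [(h, s) for h, s in records if h.split("|")[-1] in keep_species]
    return kept, len(kept) <= max_sequences


# Made-up records, purely for illustration.
example = """>geneA|homo_sapiens
MKT
LLV
>geneB|danio_rerio
MKS
>geneC|mus_musculus
MRT
"""

kept, fits = subset_by_species(
    parse_fasta(example),
    keep_species={"homo_sapiens", "mus_musculus"},
    max_sequences=400,  # the figure mentioned in the question
)
for header, seq in kept:
    print(f">{header}\n{seq}")
print("within limit:", fits)
```

The size limit is passed in as a parameter so that it can be changed without touching the filtering logic.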

and got the following answer:
I have discussed this with our developers and they advise the following.

Some of our "supertrees" are indeed flat.

There are many ways of building supertrees. Both ways that you suggest are
possible, but we cannot advise on what would be better. There are other
methods based on sequence concatenation, matrices, etc. We cannot give a
recommendation on the best approach.

Please note that our tree-size limit is 1,500 genes, not 400.

Finally, the most important limitation for us is how much computation we can
afford during our production time in the Ensembl release cycle.

We use TreeBeST, which is not multi-threaded, but there are alternatives such
as RAxML and ExaML that can run on dozens or hundreds of CPUs. MAFFT has options
to deal with large multiple alignments, and there are other tools such as
PASTA. You may be able to build the tree and the alignment without using
supertree methods.
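
As a rough illustration of the align-then-build route mentioned above, the sketch below only assembles command lines for MAFFT and RAxML; it does not run them. The file names, thread count, and run name are hypothetical, the executable name "raxmlHPC-PTHREADS" depends on how RAxML was installed, and the options shown are one plausible choice rather than a recommendation from Ensembl.

```python
import shlex


def mafft_command(fasta_in, threads=4):
    """Command line for a MAFFT alignment; MAFFT writes the result to stdout."""
    return ["mafft", "--auto", "--thread", str(threads), fasta_in]


def raxml_command(alignment, run_name, threads=4, seed=12345):
    """Command line for a multi-threaded RAxML protein tree search.

    The binary name and the PROTGAMMAAUTO model are assumptions made
    for this sketch; adjust them to the local installation and data.
    """
    return [
        "raxmlHPC-PTHREADS",
        "-T", str(threads),      # number of threads
        "-s", alignment,         # input alignment
        "-n", run_name,          # suffix for the output files
        "-m", "PROTGAMMAAUTO",   # protein model, chosen automatically
        "-p", str(seed),         # random seed for the starting tree
    ]


# Hypothetical file names, printed as shell-ready commands.
print(shlex.join(mafft_command("supertree.fa", threads=8)) + " > supertree.aln.fa")
print(shlex.join(raxml_command("supertree.aln.fa", "supertree", threads=8)))
```

Printing the commands first makes it easy to inspect them before submitting them to a cluster or running them by hand.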

What is good about the response, and what would you further suggest?

Sincerely,
Joseph (Yossi) Steinberger
