[ensembl-dev] Question on compara gene trees

Javier Herrero (TGAC) Javier.Herrero at tgac.ac.uk
Fri Nov 15 22:44:05 GMT 2013


Dear Sunita

You can check the activity on the list here: http://lists.ensembl.org/pipermail/dev/2013-November/thread.html#9446. You will see that your last two emails have been received correctly.

I will try to answer some of your questions, but please have a look at this page: http://www.ensembl.org/info/genome/compara/homology_method.html where you will find a few details about the methodology used to build the phylogenetic trees.

The trees are typically rooted using outgroups. This is done internally by TreeBeST, the software developed by Heng Li (TreeFam) and currently used in Ensembl. The branch lengths represent an estimate of the number of mutations based on the back-translated alignment, using the HKY model in PhyML. Therefore, the trees are phylograms. As far as I remember, the bootstrap support comes from 100 resampling replicates (i.e. Felsenstein 1985).
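To make the bootstrap point concrete, here is a minimal Python sketch of the resampling step only. It is an illustration under assumptions, not the TreeBeST or PhyML code: the function name, the toy alignment and the fixed seed are all made up for the example, and the tree-building step that would follow each replicate is left out.

```python
import random


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Resample alignment columns with replacement (Felsenstein-style bootstrap).

    `alignment` is a list of equal-length aligned sequences (hypothetical input).
    Returns `n_replicates` pseudo-alignments with the same dimensions.
    """
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    replicates = []
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns, with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        # Apply the same column sample to every sequence so columns stay intact.
        replicates.append(["".join(row[c] for c in cols) for row in alignment])
    return replicates


# Toy data, invented for illustration only.
toy_alignment = ["ATGCCGTA", "ATGACGTA", "ATG-CGTT"]
replicates = bootstrap_replicates(toy_alignment)
print(len(replicates), "replicates of", len(replicates[0][0]), "columns each")
```

In a real analysis a tree would be built from each replicate, and the support for a branch would be the fraction of replicate trees that recover it.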

The alignments are available in any of the other files in the same FTP directory. The file you have downloaded is smaller because it only lists the trees.
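If it helps, the files can be walked through record by record with a few lines of Python. This is only a sketch under assumptions about the layout: it assumes each record ends with a line containing only "//" and that lines starting with "#" are comments or headers. Please check both against the files you downloaded. The function name is made up for the example.

```python
import gzip


def iter_emf_records(path):
    """Yield each record of a gzipped EMF file as a list of lines.

    Assumptions (not verified against release 20): a line containing only
    '//' closes a record, and lines starting with '#' can be skipped.
    """
    record = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line == "//":
                if record:
                    yield record
                record = []
            elif line and not line.startswith("#"):
                record.append(line)
    # Keep a trailing record that is not closed by '//'.
    if record:
        yield record
```

For example, sum(1 for _ in iter_emf_records("Compara.gene_trees.20.emf.gz")) would count the records in the alignment file, provided the assumptions above hold.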

As a general rule, the orthologs in Ensembl do not have a confidence value as of now. There is a low-confidence set of orthologs called “possible orthologs”, which represents the closest homolog when no other ortholog is found. Please refer to the aforementioned URL for more details on this.

Kind regards

Javier

On 15 Nov 2013, at 20:26, Kumari, Sunita <kumari at cshl.edu> wrote:

Hi Ensembl team,

I would really appreciate it if someone could answer my questions quickly.

I have not received any response so far. I am not even sure whether you are getting my emails.

Thanks much.

Sunita




========================

From: Kumari, Sunita
Sent: Thursday, November 14, 2013 3:47 PM
To: dev at ensembl.org
Subject: quick questions for gene trees

Hi Ensembl compara team,

I am using this Ensembl FTP site to get alignment files and gene
trees in Newick format:

ftp://ftp.ensemblgenomes.org/pub/plants/release-20/emf/ensembl-compara/homologies/

I am using the Compara.gene_trees.20.emf.gz and Compara.newinck_trees.20.emf.gz files.

I have a couple of questions and would appreciate any information you can provide.

1. metadata information on gene trees:

a) Are the trees outgroup OR midpoint rooted?

b) Is the branch length unit replacements per position, arbitrary
units, or million years?

c) Is the tree style cladogram, phylogram, or phenogram?

d) Is the bootstrap type Felsenstein 1985, aLRT SH-like branch support, or
Bayesian posterior probability?


2. For alignments (Compara.gene_trees.20.emf.gz):

Where can I get the alignment ID, i.e. the 'source DB alignment ID'?
e.g. What is the unique identifier for the alignment at the source
database?


3. InParanoid7 provides scoring values to orthologs. e.g.
http://inparanoid.sbc.su.se/cgi-bin/e.cgi?species1=93&species2=98&clusters_per_page=50&.submit=Submit+Query&clusterlowerlimit=1

Does the Compara pipeline also provide a score for orthologs?
If not, is there any plan to provide this value in the next release?

Looking forward to your reply.

Thanks.

Sunita
________________________________________

Sunita Kumari, PhD
Bioinformatics Scientist,
Ware Lab,
Cold Spring Harbor Labs,
Cold Spring Harbor, NY - 11724

________________________________________
From: Kumari, Sunita
Sent: Tuesday, November 12, 2013 3:37 PM
To: dev at ensembl.org
Subject: Question on compara gene trees

Dear Ensembl compara team,


I have a couple of questions on metadata for gene trees. I am using this Ensembl FTP site to get alignment files and gene trees in Newick format:
ftp://ftp.ensemblgenomes.org/pub/plants/release-20/emf/ensembl-compara/homologies/

Q1. For each tree, can we get the following information? Please confirm the answer given below each item.

a) If the tree is Outgroup_OR_Midpoint rooted;
-----Probably Outgroup

b) branch_length unit is "Replacements per position" OR "Arbitrary units" OR "Million years";
---Probably arbitrary

c) tree style is "Cladogram" OR "Phylogram" OR "Phenogram";
-- Phylogram

d) bootstrap_type is "Felsenstein 1985" OR "aLRT SH-like branch support" OR "Bayesian posterior probability"

Please provide the correct bootstrap type.


Q2. Is it possible to get a conservation score in the next Compara release for Ensembl plant genomes?
What is the likely timeline for scoring to become available?


Thanks.

Sunita

Sunita Kumari, PhD
Bioinformatics Scientist,
Ware Lab,
Cold Spring Harbor Labs,
Cold Spring Harbor, NY - 11724

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

--
Javier Herrero, PhD
Comparative Genomics Project Leader
TGAC, Norwich Research Park
Norwich, NR4 7UH, UK
