[ensembl-dev] quick questions for gene trees

Kumari, Sunita kumari at cshl.edu
Thu Nov 14 20:47:24 GMT 2013


Hi Ensembl compara team, 

I am using this Ensembl Genomes FTP site to get alignment files and gene
trees in Newick format:

ftp://ftp.ensemblgenomes.org/pub/plants/release-20/emf/ensembl-compara/homologies/

I am using the Compara.gene_trees.20.emf.gz and Compara.newick_trees.20.emf.gz files.

I have a couple of questions and would appreciate any information you can provide.

1. Metadata on gene trees:

a) Are the trees outgroup OR midpoint rooted?

b) Is the branch length unit replacements per position, arbitrary
units, or millions of years?

c) Is the tree style cladogram, phylogram, or phenogram?

d) Is the branch support type Felsenstein (1985) bootstrap, aLRT SH-like
branch support, or Bayesian posterior probability?


2. For alignments (Compara.gene_trees.20.emf.gz):

Where can I get the alignment ID, i.e. the 'source DB alignment ID'?
In other words, what is the unique identifier for the alignment at the
source database?


3. InParanoid7 provides scores for orthologs, e.g.
http://inparanoid.sbc.su.se/cgi-bin/e.cgi?species1=93&species2=98&clusters_per_page=50&.submit=Submit+Query&clusterlowerlimit=1

Does the Compara pipeline also provide scores for orthologs?
If not, is there any plan to provide them in the next release?

Looking forward to your reply.

Thanks.

Sunita
________________________________________

Sunita Kumari, PhD
Bioinformatics Scientist,
Ware Lab,
Cold Spring Harbor Labs,
Cold Spring Harbor, NY -11724


