[ensembl-dev] How to get compara trees for one-to-one orthologs only

Javier Herrero jherrero at ebi.ac.uk
Mon Jan 21 16:12:36 GMT 2013


Hi Sébastien

This is a common question, but the answer is complex and usually differs 
depending on what you want to achieve.

It seems you are interested in a specific clade. Probably the best way 
to get the sub-families you are after is to walk each gene tree from the 
root to the leaves until you find a node for that clade. At that point, 
count the number of genes in each species: the sub-tree is 1-to-1 if 
every species has exactly one gene.
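
To illustrate the walk described above, here is a minimal sketch in 
plain Python. It does not use the Ensembl API: the Node class, the 
function names, the species and the gene identifiers are all made up 
for this example, and it assumes each internal node carries the name of 
the taxonomic level it was annotated with.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical gene-tree node (NOT an Ensembl API class)."""
    taxon: str                      # taxonomic level annotated on this node
    species: Optional[str] = None   # set on leaves only
    gene: Optional[str] = None      # set on leaves only
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> Iterator["Node"]:
        """Yield every leaf below (or at) this node."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def leaf(species: str, gene: str) -> Node:
    """Convenience constructor for a leaf (one gene in one species)."""
    return Node(taxon=species, species=species, gene=gene)


def clade_subtrees(node: Node, clade: str) -> Iterator[Node]:
    """Walk from the root towards the leaves and yield the first node
    annotated with `clade` on each path; do not descend any further."""
    if node.taxon == clade:
        yield node
        return
    for child in node.children:
        yield from clade_subtrees(child, clade)


def genes_per_species(subtree: Node) -> Counter:
    """Count how many genes each species has in this sub-tree."""
    return Counter(l.species for l in subtree.leaves())


def is_one_to_one(subtree: Node, species: Iterable[str]) -> bool:
    """True if every listed species has exactly one gene in the sub-tree."""
    counts = genes_per_species(subtree)
    return all(counts.get(s, 0) == 1 for s in species)


# Toy tree with made-up identifiers: two mammalian sub-families plus a fish gene.
root = Node("Vertebrata", children=[
    Node("Mammalia", children=[
        leaf("human", "gene1_human"),
        leaf("mouse", "gene1_mouse"),
    ]),
    Node("Mammalia", children=[
        leaf("human", "gene2a_human"),
        leaf("human", "gene2b_human"),   # extra human copy
        leaf("mouse", "gene2_mouse"),
    ]),
    leaf("zebrafish", "gene_fish"),
])

for sub in clade_subtrees(root, "Mammalia"):
    print(dict(genes_per_species(sub)), is_one_to_one(sub, ["human", "mouse"]))
```

Stopping at the first node annotated with the clade is a design choice 
of this sketch. If that node is a duplication, the sub-tree will contain 
both paralogues and will not pass the 1-to-1 test.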

Now, the big question is whether you really want to be strict about the 
1-to-1 rule. Say you are looking at mammalian species. We have genomes 
of varying quality in that group. So if you only want 1-to-1 
orthologues, you may miss a gene sub-family only because there is a gap 
in the dolphin assembly where that gene should be. Similarly, you may 
miss another gene simply because of an assembly error that has created 
an artificial duplication of that gene (sometimes in an unplaced 
scaffold). So, as you consider more and more species and as the quality 
of these genomes drops, you will end up with a very short list of genes.

My advice would be to use a limited list of species and be a bit lenient 
about the one-to-one rule, unless this is a hard requirement of your 
analysis.
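
One possible way to be lenient, sketched in plain Python (again not the 
Ensembl API; the function name, the max_exceptions parameter and its 
default are assumptions made for this example), is to tolerate a small 
number of species that are missing the gene or have an extra copy:

```python
from typing import Iterable, Mapping


def is_nearly_one_to_one(counts: Mapping[str, int],
                         species: Iterable[str],
                         max_exceptions: int = 1) -> bool:
    """Accept a sub-family if at most `max_exceptions` of the listed
    species deviate from exactly one gene (missing or duplicated)."""
    exceptions = sum(1 for s in species if counts.get(s, 0) != 1)
    return exceptions <= max_exceptions


# Gene counts per species for one sub-family (made-up numbers).
counts = {"human": 1, "mouse": 1, "dolphin": 0}   # gap in the dolphin assembly
species = ["human", "mouse", "dolphin"]

print(is_nearly_one_to_one(counts, species))                    # lenient
print(is_nearly_one_to_one(counts, species, max_exceptions=0))  # strict
```

The right threshold depends on how many species you include and on 
what the downstream analysis can tolerate.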

You also need to decide what you want to do if you have two closely 
related 1-to-1 sets, like FRY and FRYL in mammals for instance. 
Depending on your use of the 1-to-1 set, you might be happy to keep 
them both or you might prefer to discard them both. Alternatively, you 
might want to look at the actual similarity between the two genes and 
decide case by case.

I hope this helps (but I fear I might have confused you)

Javier

On 21/01/13 15:54, Moretti Sébastien wrote:
> Hi
> I wonder if, from the ensembl api, I can get only one-to-one ortholog 
> trees, and how.
>
> Usually I get trees with a taxonomic filter, and would like to get 
> only one-to-one ortholog trees from there.
>
> Thanks
>

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK




