[ensembl-dev] How to get compara trees for one-to-one orthologs only
Javier Herrero
jherrero at ebi.ac.uk
Mon Jan 21 16:12:36 GMT 2013
Hi Sébastien
This is a common question, but the answer is complex and usually differs
depending on what you want to achieve.
It seems you are interested in a specific clade. Probably the best way
to get the sub-families you are after is to walk each gene tree from the
root to the leaves until you find a node for that clade. At that point,
count the number of genes in each species.
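
The walk described above can be sketched roughly as follows. This is only an illustration in Python on a toy data structure: the `Node` class, its fields, and the example tree are assumptions made up for this sketch and are not part of the Ensembl API, where the same idea would be applied to the gene tree nodes returned by Compara.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical gene-tree node (not an Ensembl API class)."""
    taxon: str                     # taxonomic level of an internal node, species name of a leaf
    gene: Optional[str] = None     # gene identifier, set on leaves only
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def clade_subtrees(node: Node, clade: str) -> Iterator[Node]:
    """Walk from the root towards the leaves and yield the first internal
    node labelled `clade` on each path, without descending below it."""
    if node.taxon == clade and not node.is_leaf():
        yield node
        return
    for child in node.children:
        yield from clade_subtrees(child, clade)


def genes_per_species(node: Node) -> Counter:
    """Count the genes (leaves) under `node`, grouped by species."""
    counts: Counter = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        if current.is_leaf():
            counts[current.taxon] += 1
        else:
            stack.extend(current.children)
    return counts


def leaf(species: str, gene: str) -> Node:
    return Node(taxon=species, gene=gene)


# Invented toy tree: two mammalian sub-families plus one fish gene.
sub_a = Node("Mammalia", children=[
    leaf("Homo sapiens", "geneA_hs"),
    leaf("Mus musculus", "geneA_mm"),
])
sub_b = Node("Mammalia", children=[
    leaf("Homo sapiens", "geneB_hs"),
    # Species-specific duplication in mouse.
    Node("Mus musculus", children=[
        leaf("Mus musculus", "geneB_mm1"),
        leaf("Mus musculus", "geneB_mm2"),
    ]),
])
root = Node("Vertebrata", children=[sub_a, sub_b, leaf("Danio rerio", "gene_dr")])

for subtree in clade_subtrees(root, "Mammalia"):
    print(dict(genes_per_species(subtree)))
```

Note that matching on the exact clade label is a simplification: a sub-family that only survives in part of the clade would be rooted at a lower taxonomic level and would need extra handling.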
Now, the big question is whether you really want to be strict about the
1-to-1 rule. Say you are looking at mammalian species. We have different
genomes of different quality in that group. So if you only want 1-to-1
orthologues, you may miss a gene sub-family only because there is a gap
in the dolphin assembly where that gene should be. Similarly, you may
miss another gene simply because of an assembly error that has created an
artificial duplication of that gene (sometimes in an unplaced scaffold).
So, as you consider more and more species and as the quality of these
genomes drops, you will end up with a very short list of genes.
My advice would be to use a limited list of species and be a bit lenient
about the one-to-one rule, unless this is a hard requirement of your
analysis.
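
As a rough illustration of what "a bit lenient" could mean in practice, here is a minimal Python sketch that checks per-species gene counts against a list of species of interest. The function name, the `max_missing` and `max_duplicated` parameters, and the example counts are all assumptions for this sketch; suitable tolerances depend on your analysis.

```python
from typing import Iterable, Mapping


def is_one_to_one(counts: Mapping[str, int],
                  species: Iterable[str],
                  max_missing: int = 0,
                  max_duplicated: int = 0) -> bool:
    """Return True if the sub-family is 1-to-1 across `species`,
    tolerating a limited number of absent or duplicated species.

    With both tolerances left at 0 this is the strict 1-to-1 rule.
    """
    species = list(species)
    missing = sum(1 for s in species if counts.get(s, 0) == 0)
    duplicated = sum(1 for s in species if counts.get(s, 0) > 1)
    return missing <= max_missing and duplicated <= max_duplicated


species_of_interest = ["Homo sapiens", "Mus musculus", "Tursiops truncatus"]

# Invented counts: the dolphin gene falls in an assembly gap.
counts_gap = {"Homo sapiens": 1, "Mus musculus": 1}
# Invented counts: an artificial duplication in the dolphin assembly.
counts_dup = {"Homo sapiens": 1, "Mus musculus": 1, "Tursiops truncatus": 2}

print(is_one_to_one(counts_gap, species_of_interest))                 # strict
print(is_one_to_one(counts_gap, species_of_interest, max_missing=1))  # lenient
print(is_one_to_one(counts_dup, species_of_interest, max_duplicated=1))
```

Restricting `species_of_interest` to a short list of good-quality genomes is the other lever: species left out of the list are simply not checked.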
You also need to decide what you want to do if you have two sets of
closely related 1-to-1 orthologues, like FRY and FRYL in mammals for instance.
Depending on your use of the 1-to-1 set, you might be happy considering
them both or you might prefer to disregard them instead. Yet again, you
might want to look at the actual similarity between both genes and
decide one way or the other in each case.
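
If you go down the similarity route, one simple measure is the percent identity between two aligned sequences. The Python sketch below assumes you already have a pairwise alignment as two equal-length strings with "-" for gaps; the function name and the toy sequences are made up, and it deliberately does not pick a threshold, since which way to decide depends on your use of the set.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the columns where neither
    aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments, not real FRY/FRYL sequences.
print(percent_identity("MKT-LV", "MKTALI"))
```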
I hope this helps (but I fear I might have confused you)
Javier
On 21/01/13 15:54, Moretti Sébastien wrote:
> Hi
> I wonder if, from the ensembl api, I can get only one-to-one ortholog
> trees, and how.
>
> Usually I get trees with a taxonomic filter, and would like to get
> only one-to-one ortholog trees from there.
>
> Thanks
>
--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK