[ensembl-dev] How to get compara trees for one-to-one orthologs only

Moretti Sébastien sebastien.moretti at unil.ch
Mon Jan 21 16:29:22 GMT 2013


Hi

We have a hard constraint: we only want 1-to-1 ortholog trees.
This will certainly reduce the number of trees we end up with.
We are interested in fishes, so the trees are not that large.

I thought of getting all trees with a fish root and keeping those that
contain only speciation nodes. But some trees with a fish root will
consist of a duplication joining several interesting 1-to-1 sub-trees.

Do you think something smarter and faster could be done, for example
using an attribute of the tree object?
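
To make the filtering idea concrete, here is a minimal sketch in Python
on a toy tree, not on the Ensembl API. The Node class, its fields, the
FISH set and the function names are all hypothetical and only stand in
for whatever the real tree objects provide. It walks down from the root
and yields each maximal sub-tree that sits inside the clade, contains
only speciation nodes and has at most one gene per species, so a
duplication at the fish root gives several sub-trees instead of none.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Toy gene-tree node (hypothetical, not an Ensembl Compara object)."""
    taxon: str                        # clade name, or species name for a leaf
    is_duplication: bool = False      # False means speciation (or a leaf)
    gene: Optional[str] = None        # set on leaves only
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> Iterator[Node]:
    """Yield all leaves below (or equal to) node."""
    if node.is_leaf():
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def speciation_only(node: Node) -> bool:
    """True if no internal node at or below node is a duplication."""
    if node.is_leaf():
        return True
    return (not node.is_duplication
            and all(speciation_only(c) for c in node.children))


def one_to_one_subtrees(node: Node, clade_taxa: Set[str]) -> Iterator[Node]:
    """Yield maximal speciation-only sub-trees rooted inside the clade."""
    if (node.taxon in clade_taxa and not node.is_leaf()
            and speciation_only(node)):
        species = [leaf.taxon for leaf in leaves(node)]
        if len(species) == len(set(species)):   # one gene per species
            yield node
            return
    # Otherwise (outside the clade, or a duplication): look further down.
    for child in node.children:
        yield from one_to_one_subtrees(child, clade_taxa)


# Assumed taxon names for the clade of interest (illustration only).
FISH = {"Clupeocephala", "Danio rerio", "Takifugu rubripes"}

# A fish-rooted tree whose root is a duplication of two 1-to-1 sub-trees.
root = Node("Clupeocephala", is_duplication=True, children=[
    Node("Clupeocephala", children=[
        Node("Danio rerio", gene="geneA_dr"),
        Node("Takifugu rubripes", gene="geneA_tr"),
    ]),
    Node("Clupeocephala", children=[
        Node("Danio rerio", gene="geneB_dr"),
        Node("Takifugu rubripes", gene="geneB_tr"),
    ]),
])

for subtree in one_to_one_subtrees(root, FISH):
    print(sorted(leaf.gene for leaf in leaves(subtree)))
```

A strict version like this drops any sub-tree where a species is
duplicated; requiring every species of the clade to be present would be
an extra check on the species list.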

Regards
Sébastien

> Hi Sébastien
>
> This is a common question, but the answer is complex and usually differs
> depending on what you want to achieve.
>
> It seems you are interested in a specific clade. Probably the best way
> to get the sub-families you are after is to walk each gene tree from the
> root to the leaves until you find a node for that clade. At that point,
> count the number of genes in each species.
>
> Now, the big question is whether you really want to be strict about the
> 1-to-1 rule. Say you are looking at mammalian species. We have different
> genomes of different quality in that group. So if you only want 1-to-1
> orthologues, you may miss a gene sub-family only because there is a gap
> in the dolphin assembly where that gene should be. Similarly, you may
> miss another gene simply because an assembly error has created an
> artificial duplication of that gene (sometimes in an unplaced scaffold).
> So, as you consider more and more species and as the quality of these
> genomes drops, you will end up with a very short list of genes.
>
> My advice would be to use a limited list of species and be a bit lenient
> about the one-to-one rule, unless this is a hard requirement of your
> analysis.
>
> You also need to decide what you want to do if you have two sets of
> closely related 1-to-1 orthologues, like FRY and FRYL in mammals.
> Depending on your use of the 1-to-1 set, you might be happy considering
> them both or you might prefer to disregard them instead. Yet again, you
> might want to look at the actual similarity between both genes and
> decide one way or the other in each case.
>
> I hope this helps (but I fear I might have confused you)
>
> Javier
>
> On 21/01/13 15:54, Moretti Sébastien wrote:
>> Hi
>> I wonder if, from the ensembl api, I can get only one-to-one ortholog
>> trees, and how.
>>
>> Usually I get trees with a taxonomic filter, and would like to get
>> only one-to-one ortholog trees from there.
>>
>> Thanks

-- 
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4079
http://selectome.unil.ch/ http://bgee.unil.ch/



