[ensembl-dev] How to get compara trees for one-to-one orthologs only

Moretti Sébastien sebastien.moretti at unil.ch
Tue Jan 22 16:19:22 GMT 2013


Hi Javier,

I want subtrees with fish sequences only.
I extract all subtrees rooted at Clupeocephala, or at one of its descendants,
and this root must be tagged as speciation.
Then I keep only the trees that contain nothing but speciation nodes (D=N).
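
The filter described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not Ensembl API code: the Node
class, the function names and the taxon names are hypothetical stand-ins, and
it assumes every internal node carries a taxon name and a
speciation/duplication tag. It also assumes that only the largest qualifying
subtrees are wanted, so nested subtrees of a kept tree are not reported again.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a gene-tree node (not an Ensembl class)."""
    taxon: str                        # taxon annotated on the node (species for a leaf)
    node_type: Optional[str] = None   # "speciation" or "duplication"; None for a leaf
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf(species: str) -> Node:
    return Node(taxon=species)


def internal_nodes(node: Node) -> Iterator[Node]:
    """Yield the node itself and every internal node below it."""
    if not node.is_leaf():
        yield node
        for child in node.children:
            yield from internal_nodes(child)


def speciation_only(node: Node) -> bool:
    """True if the root and all internal nodes below it are speciations (D=N)."""
    return all(n.node_type == "speciation" for n in internal_nodes(node))


def one_to_one_subtrees(node: Node, clade_taxa: Set[str]) -> Iterator[Node]:
    """Walk from the root towards the leaves and yield the largest subtrees
    whose root lies in the clade and which contain only speciation nodes."""
    if node.is_leaf():
        return
    if node.taxon in clade_taxa and speciation_only(node):
        yield node
        return  # assumption: do not report nested subtrees of a kept tree
    for child in node.children:
        yield from one_to_one_subtrees(child, clade_taxa)


def leaf_taxa(node: Node) -> List[str]:
    if node.is_leaf():
        return [node.taxon]
    return [t for child in node.children for t in leaf_taxa(child)]


# Illustrative clade definition: the taxon itself plus its internal descendants.
FISH = {"Clupeocephala", "Acanthomorphata"}

# Toy tree: a duplication at the fish root hides two speciation-only subtrees.
tree = Node("Euteleostomi", "speciation", [
    leaf("Homo sapiens"),
    Node("Clupeocephala", "duplication", [
        Node("Clupeocephala", "speciation", [
            leaf("Danio rerio"),
            Node("Acanthomorphata", "speciation",
                 [leaf("Oryzias latipes"), leaf("Takifugu rubripes")]),
        ]),
        Node("Acanthomorphata", "speciation",
             [leaf("Oryzias latipes"), leaf("Takifugu rubripes")]),
    ]),
])

results = [sorted(leaf_taxa(n)) for n in one_to_one_subtrees(tree, FISH)]
for species in results:
    print(species)
```

In the toy tree the fish root is a duplication, so it is skipped and the two
speciation-only subtrees below it are returned separately.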

It seems to do the job.
I will check tomorrow on a larger set (my script is still running).

> Hi Sébastien
>
> Do you mean whole trees with fish sequences only or would you be happy
> with subtrees?
>
> There are a couple of issues with looking for trees that contain only
> speciation events:
> (1) you may not have such a gene in one of the species
> (2) it might be that the topology we have inferred is inexact.
>
> Unfortunately, telling (2) from cases where there has been a duplication
> followed by extensive gene losses is quite a challenge in the fish lineage.
>
> Javier
>
> On 21/01/13 16:29, Moretti Sébastien wrote:
>> Hi
>>
>> We have a hard constraint: we only want 1-to-1 ortholog trees.
>> This will certainly decrease the number of trees at the end.
>> We are interested in fishes, so the trees are not that large.
>>
>> I thought of getting all trees with a fish root and filtering for those
>> that contain only speciation nodes. But some trees with a fish root will
>> be a duplication of several interesting 1-to-1 sub-trees.
>>
>> Do you think that something smarter and faster could be done, such as
>> with an attribute of the tree object?
>>
>> Regards
>> Sébastien
>>
>>> Hi Sébastien
>>>
>>> This is a common question, but the answer is complex and usually differs
>>> depending on what you want to achieve.
>>>
>>> It seems you are interested in a specific clade. Probably the best way
>>> to get the sub-families you are after is to walk each gene tree from the
>>> root to the leaves until you find a node for that clade. At that point,
>>> count the number of genes in each species.
>>>
>>> Now, the big question is whether you really want to be strict about the
>>> 1-to-1 rule. Say you are looking at mammalian species. We have different
>>> genomes of different quality in that group. So if you only want 1-to-1
>>> orthologues, you may miss a gene sub-family only because there is a gap
>>> in the dolphin assembly where that gene should be. Similarly, you may
>>> miss another gene simply because of an assembly error that has created an
>>> artificial duplication of that gene (sometimes in an unplaced scaffold).
>>> So, as you consider more and more species and as the quality of these
>>> genomes drops, you will end up with a very short list of genes.
>>>
>>> My advice would be to use a limited list of species and be a bit lenient
>>> about the one-to-one rule, unless this is a hard requirement of your
>>> analysis.
>>>
>>> You also need to decide what you want to do if you have two sets of
>>> closely related 1-to-1, like FRY and FRYL in mammals for instance.
>>> Depending on your use of the 1-to-1 set, you might be happy considering
>>> them both or you might prefer to disregard them instead. Yet again, you
>>> might want to look at the actual similarity between both genes and
>>> decide one way or the other in each case.
>>>
>>> I hope this helps (but I fear I might have confused you)
>>>
>>> Javier
>>>
>>> On 21/01/13 15:54, Moretti Sébastien wrote:
>>>> Hi
>>>> I wonder if, from the Ensembl API, I can get only one-to-one ortholog
>>>> trees, and how.
>>>>
>>>> Usually I get trees with a taxonomic filter, and would like to get
>>>> only one-to-one ortholog trees from there.
>>>>
>>>> Thanks

-- 
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4079
http://selectome.unil.ch/ http://bgee.unil.ch/



