[ensembl-dev] How to get compara trees for one-to-one orthologs only

Javier Herrero jherrero at ebi.ac.uk
Tue Jan 22 16:02:23 GMT 2013


Hi Sébastien

Do you mean whole trees with fish sequences only or would you be happy 
with subtrees?

There are a couple of issues with looking for trees that contain only
speciation events:
(1) the gene may be missing in one of the species
(2) the topology we have inferred may be incorrect.

Unfortunately, telling (2) from cases where there has been a duplication 
followed by extensive gene losses is quite a challenge in the fish lineage.
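
If subtrees are acceptable, the walk described in the earlier message quoted
below could look roughly like the following. This is only a sketch in plain
Python, not Ensembl API code: the Node class, its node_type and species
fields, the species names and the min_species cutoff are all assumptions made
for illustration, and you would have to map them onto whatever your gene tree
objects actually provide.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical gene tree node (illustration only, not an Ensembl class)."""
    node_type: str = "speciation"      # "speciation", "duplication" or "leaf"
    species: Optional[str] = None      # set on leaves only
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf (gene) below this node."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def has_duplication(node: Node) -> bool:
    """True if this node or any node below it is annotated as a duplication."""
    return node.node_type == "duplication" or any(
        has_duplication(child) for child in node.children
    )


def one_to_one_subtrees(node: Node, clade: Set[str],
                        min_species: int = 2) -> List[Node]:
    """Walk from the root towards the leaves and collect the largest subtrees
    that lie entirely inside the clade, have exactly one gene per species and
    contain no duplication node."""
    counts = Counter(leaf.species for leaf in leaves(node))
    inside_clade = set(counts) <= clade
    single_copy = all(n == 1 for n in counts.values())
    if (inside_clade and single_copy and len(counts) >= min_species
            and not has_duplication(node)):
        return [node]
    # Otherwise keep descending: a duplication at a fish root may still
    # sit on top of several usable 1-to-1 subtrees.
    found: List[Node] = []
    for child in node.children:
        found.extend(one_to_one_subtrees(child, clade, min_species))
    return found
```

Calling one_to_one_subtrees(root, fish_species) on each tree returns the
subtrees that pass the strict filter. It cannot tell a real duplication from
a wrongly inferred one, so issue (2) above still applies to the result.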

Javier

On 21/01/13 16:29, Moretti Sébastien wrote:
> Hi
>
> We have hard constraints: we only want 1-to-1 ortholog trees.
> This will certainly decrease the number of trees at the end.
> We are interested in fish, so the trees are not that large.
>
> I thought of getting all trees with a fish root and filtering for those
> that contain only speciation nodes. But some trees with a fish root will
> be a duplication of several interesting 1-to-1 sub-trees.
>
> Do you think that something smarter and faster could be done, such as 
> with an attribute of the tree object?
>
> Regards
> Sébastien
>
>> Hi Sébastien
>>
>> This is a common question, but the answer is complex and usually differs
>> depending on what you want to achieve.
>>
>> It seems you are interested in a specific clade. Probably the best way
>> to get the sub-families you are after is to walk each gene tree from the
>> root to the leaves until you find a node for that clade. At that point,
>> count the number of genes in each species.
>>
>> Now, the big question is whether you really want to be strict about the
>> 1-to-1 rule. Say you are looking at mammalian species. We have genomes
>> of different quality in that group. So if you only want 1-to-1
>> orthologues, you may miss a gene sub-family only because there is a gap
>> in the dolphin assembly where that gene should be. Similarly, you may
>> miss another gene simply because of an assembly error that has created
>> an artificial duplication of that gene (sometimes in an unplaced
>> scaffold). So, as you consider more and more species and as the quality
>> of these genomes drops, you will end up with a very short list of genes.
>>
>> My advice would be to use a limited list of species and be a bit lenient
>> about the one-to-one rule, unless this is a hard requirement of your
>> analysis.
>>
>> You also need to decide what you want to do if you have two sets of
>> closely related 1-to-1 orthologues, like FRY and FRYL in mammals.
>> Depending on your use of the 1-to-1 set, you might be happy considering
>> them both or you might prefer to disregard them instead. Yet again, you
>> might want to look at the actual similarity between both genes and
>> decide one way or the other in each case.
>>
>> I hope this helps (but I fear I might have confused you)
>>
>> Javier
>>
>> On 21/01/13 15:54, Moretti Sébastien wrote:
>>> Hi
>>> I wonder if, from the ensembl api, I can get only one-to-one ortholog
>>> trees, and how.
>>>
>>> Usually I get trees with a taxonomic filter, and would like to get
>>> only one-to-one ortholog trees from there.
>>>
>>> Thanks
>

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK




