[ensembl-dev] MCoffee score ?

Sébastien MORETTI sebastien.moretti at unil.ch
Fri Feb 18 08:47:13 GMT 2011


Dear Javier,
thanks for all these details.
I will re-test with other families that were aligned with mcoffee rather than
with mafft only.
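
To pick such families, a lookup on the aln_method tag should be enough. Below
is a minimal Python sketch of what I have in mind. It uses an in-memory sqlite3
database as a stand-in for the MySQL server, the table layout is taken from
your query output below, and node 1001 with its value is made up for
illustration only.

```python
import sqlite3


def aln_method(conn, root_id):
    """Return the aln_method tag value for a tree root, or None if absent."""
    row = conn.execute(
        "SELECT value FROM protein_tree_tag "
        "WHERE node_id = ? AND tag = 'aln_method'",
        (root_id,),
    ).fetchone()
    return row[0] if row else None


# Stand-in database: sqlite3 instead of the real MySQL compara server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_tree_tag (node_id INTEGER, tag TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO protein_tree_tag VALUES (?, ?, ?)",
    [
        (688, "aln_method", "mafft"),      # row shown in the quoted output
        (1001, "aln_method", "cmcoffee"),  # hypothetical row, illustration only
    ],
)

# Keep only the roots whose alignment came from one of the M-Coffee modes.
scored = [n for n in (688, 1001)
          if aln_method(conn, n) in ("cmcoffee", "fmcoffee")]
print(scored)  # [1001]
```

Against the real database the same SELECT should work unchanged, apart from
the placeholder syntax of whichever MySQL driver is used.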

I am not sure that scores are impossible to get with mafft: what if you run
mafft as the single method within mcoffee?


With your 200-gene threshold, the more species you add, the more often mafft
alone will be run.
Do you plan to refine families at some point, for example by clustering genes
with more stringent criteria to get smaller families?
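
To check that I read the selection scheme you describe below correctly, here
is a small Python sketch of it. It is only my interpretation: I assume two
cmcoffee attempts followed by a single fmcoffee attempt, and run_mcoffee is a
hypothetical callback, not something from the compara pipeline.

```python
from typing import Callable

LARGE_FAMILY_THRESHOLD = 200  # genes; the value quoted in Javier's message


def choose_alignment_method(n_genes: int,
                            run_mcoffee: Callable[[str], bool]) -> str:
    """Return the aln_method that would end up being recorded.

    run_mcoffee(mode) is a hypothetical callback returning True when
    M-Coffee succeeds in the given mode ("cmcoffee" or "fmcoffee").
    """
    # Families above the threshold skip M-Coffee entirely.
    if n_genes > LARGE_FAMILY_THRESHOLD:
        return "mafft"
    # Assumption: "fails twice in a row" means two cmcoffee attempts.
    for _ in range(2):
        if run_mcoffee("cmcoffee"):
            return "cmcoffee"
    # Assumption: one attempt in fmcoffee mode before giving up.
    if run_mcoffee("fmcoffee"):
        return "fmcoffee"
    # Last resort: plain mafft, which produces no scores.
    return "mafft"
```

If this is right, every family that grows past 200 genes after new species are
added ends up with aln_method = mafft and therefore without scores.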

Sébastien

> Dear Sébastien
>
> Your query is correct.
>
> We use MCoffee by default to generate the alignments. For large alignments or
> when MCoffee fails to run, we use mafft instead. Mafft does not generate scores.
>
> You can tell which alignment method has been used by looking at the
> protein_tree_tag table for the root of the tree:
>
> [ensembl_compara_61]>  select * from protein_tree_tag where node_id = 688 and
> tag = "aln_method";
> +---------+------------+-------+
> | node_id | tag        | value |
> +---------+------------+-------+
> |     688 | aln_method | mafft |
> +---------+------------+-------+
>
> The three different alignment methods you will find are:
>
> - cmcoffee: M-Coffee consistency alignment using mafftgins_msa, muscle_msa,
> kalign_msa and t_coffee_msa
>
> - fmcoffee: M-Coffee consistency alignment using mafft_msa, muscle_msa,
> clustalw_msa, kalign_msa
>
> - mafft: plain mafft
>
> We try cmcoffee first unless the family contains more than 200 genes. In that
> case, we go with mafft directly. If MCoffee fails twice in a row, we switch to
> fmcoffee mode. If this fails again, we fall back on mafft.
>
> I hope this helps.
>
> Javier
>
> On Tuesday 15 Feb 2011 14:00:07 Sébastien Moretti wrote:
>> Hi
>>
>> I thought it should be easy to retrieve MCoffee scores from the
>> protein_tree_member_score table but it is not.
>>
>> I cannot find how to link protein_tree_member and
>> protein_tree_member_score tables.
>> The first table is the one with gene tree family identifiers and protein
>> identifiers.
>> The second is the one with MCoffee score.
>>
>> Similar field names do not seem to be related between both tables.
>>
>> For example
>>       SELECT s.cigar_line
>>       FROM protein_tree_member m, protein_tree_member_score s
>>       WHERE m.root_id=688 AND s.member_id=m.member_id;
>> returns nothing.
>>
>>
>> Have I misunderstood something ?
>> Should I join these tables with a third one ?
>>
>>
>> Regards
>>
>>> Could be useful to get this information in the Gene Tree (alignment)
>>> pages.
>>>
>>> Will wait for a support in the API but will also try to get it from the
>>> database directly.
>>>
>>> Thanks
>>>
>>> Sébastien
>>>
>>>> Hi Sébastien
>>>>
>>>> I am afraid there is no support in the compara API to retrieve the
>>>> MCoffee
>>>> scores at the moment.
>>>>
>>>> We will add this to our todo list.
>>>>
>>>> Javier
>>>>
>>>> On Monday 24 Jan 2011 15:33:05 Sébastien Moretti wrote:
>>>>> Hi
>>>>>
>>>>> which method of the compara API should I use to retrieve MCoffee scores
>>>>> for a gene tree family alignment ?
>>>>>
>>>>> MCoffee scores seem to be stored in the protein_tree_member_score MySQL
>>>>> table now. But I cannot find the Perl method to access it.
>>>>>
>>>>> Regards

-- 
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4079
http://bioinfo.unil.ch/



