[ensembl-dev] MCoffee score ?

Javier Herrero jherrero at ebi.ac.uk
Fri Feb 18 09:31:51 GMT 2011


Hi Sébastien

By plain mafft, I meant we run mafft directly, not through mcoffee.

We are always working on improving the pipelines. We are currently considering 
several options to better handle large families. Hopefully we will avoid 
getting smaller families as the number of taxa increases.

Javier

On Friday 18 Feb 2011 08:47:13 Sébastien MORETTI wrote:
> Dear Javier,
> thanks for all these details.
> I will re-test that with other families using mcoffee and not mafft only.
> 
> Are you sure that you cannot get scores if you run mafft as the single
> method in mcoffee ?
> 
> 
> With your 200-gene threshold, the more species you add, the more often
> mafft alone will be run.
> Do you plan to refine families at some point ?
> To cluster genes with more stringent criteria to get smaller families ?
> 
> Sébastien
> 
> > Dear Sébastien
> > 
> > Your query is correct.
> > 
> > We use MCoffee by default to generate the alignments. For large
> > alignments or when MCoffee fails to run, we use mafft instead. Mafft
> > does not generate scores.
> > 
> > You can tell which alignment method has been used by looking at the
> > protein_tree_tag table for the root of the tree:
> > 
> > [ensembl_compara_61]> select * from protein_tree_tag where node_id = 688
> > and tag = "aln_method";
> > +---------+------------+-------+
> > | node_id | tag        | value |
> > +---------+------------+-------+
> > |     688 | aln_method | mafft |
> > +---------+------------+-------+
> > 
> > The three different alignment methods you will find are:
> > 
> > - cmcoffee: M-Coffee consistency alignment using mafftgins_msa,
> > muscle_msa, kalign_msa and t_coffee_msa
> > 
> > - fmcoffee: M-Coffee consistency alignment using mafft_msa, muscle_msa,
> > clustalw_msa, kalign_msa
> > 
> > - mafft: plain mafft
> > 
> > We try cmcoffee first unless the family contains more than 200 genes. In
> > that case, we go with mafft directly. If MCoffee fails twice in a row,
> > we switch to fmcoffee mode. If this fails again, we fall back on mafft.
> > 
> > I hope this helps.
> > 
> > Javier
> > 
> > On Tuesday 15 Feb 2011 14:00:07 Sébastien Moretti wrote:
> >> Hi
> >> 
> >> I thought it should be easy to retrieve MCoffee scores from the
> >> protein_tree_member_score table but it is not.
> >> 
> >> I cannot find how to link protein_tree_member and
> >> protein_tree_member_score tables.
> >> The first table is the one with gene tree family identifiers and protein
> >> identifiers.
> >> The second is the one with MCoffee score.
> >> 
> >> Similar field names do not seem to be related between both tables.
> >> 
> >> For example
> >> 
> >>       SELECT s.cigar_line
> >>       FROM protein_tree_member m, protein_tree_member_score s
> >>       WHERE m.root_id=688 AND s.member_id=m.member_id;
> >> 
> >> returns nothing.
> >> 
> >> 
> >> Have I misunderstood something ?
> >> Should I join these tables with a third one ?
> >> 
> >> 
> >> Regards
> >> 
> >>> Could be useful to get this information in the Gene Tree (alignment)
> >>> pages.
> >>> 
> >>> I will wait for support in the API but will also try to get it from the
> >>> database directly.
> >>> 
> >>> Thanks
> >>> 
> >>> Sébastien
> >>> 
> >>>> Hi Sébastien
> >>>> 
> >>>> I am afraid there is no support in the compara API to retrieve the
> >>>> MCoffee
> >>>> scores at the moment.
> >>>> 
> >>>> We will add this to our todo list.
> >>>> 
> >>>> Javier
> >>>> 
> >>>> On Monday 24 Jan 2011 15:33:05 Sébastien Moretti wrote:
> >>>>> Hi
> >>>>> 
> >>>>> which method of the compara API should I use to retrieve MCoffee
> >>>>> scores for a gene tree family alignment ?
> >>>>> 
> >>>>> MCoffee scores seem to be stored in the protein_tree_member_score
> >>>>> MySQL table now. But I cannot find the Perl method to access it.
> >>>>> 
> >>>>> Regards
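
As a rough illustration of the selection order described in the quoted
message above (cmcoffee first unless the family has more than 200 genes,
fmcoffee after two consecutive failures, mafft as the last resort), here is
a minimal Python sketch. It is not the pipeline code: the function name, the
`run` mapping and its callables are hypothetical stand-ins for the real
alignment jobs, and reading "fails twice in a row" as two cmcoffee attempts
is an assumption.

```python
from typing import Callable, Dict

# From the thread: families with more than 200 genes go straight to mafft.
LARGE_FAMILY_THRESHOLD = 200
# Assumption: "fails twice in a row" means two cmcoffee attempts.
CMCOFFEE_ATTEMPTS = 2


def choose_alignment_method(n_genes: int,
                            run: Dict[str, Callable[[], bool]]) -> str:
    """Return the aln_method value that would end up recorded.

    `run` maps a method name to a hypothetical callable that returns
    True when that aligner succeeds.
    """
    # Large families skip MCoffee entirely.
    if n_genes > LARGE_FAMILY_THRESHOLD:
        return "mafft"

    # Try the cmcoffee mode first, up to two times.
    for _ in range(CMCOFFEE_ATTEMPTS):
        if run["cmcoffee"]():
            return "cmcoffee"

    # Then try the fmcoffee mode once.
    if run["fmcoffee"]():
        return "fmcoffee"

    # Last resort: plain mafft, which produces no scores.
    return "mafft"
```

The returned string corresponds to the value stored under the aln_method
tag, so a family that ends up on mafft will have no MCoffee scores.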

-- 
Javier Herrero, PhD
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK



