[ensembl-dev] Erroneous duplications in gene trees

Matthieu Muffato muffato at ebi.ac.uk
Tue Mar 10 16:00:17 GMT 2015


Dear Julien

We only flag as "dubious" the duplications with a score of 0 because they
have a very distinctive pattern: a duplication event has to be called
because of the taxon annotation of the subtrees, but the intersection of
the two subtrees' species sets is empty. Duplications with a non-0 score
have at least 1 species shared by both subtrees.

I nevertheless agree with you that the threshold is arbitrary and that 
scores close to 0 are also dubious. I would recommend a threshold of 
25-30%: it gave sensible results in our tests.
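A minimal Python sketch of how a user could apply a stricter cut-off to scores they have already retrieved. The function name `classify_duplication` is hypothetical. Scores are taken as fractions in [0, 1] (a displayed 25% is 0.25). Treating scores at or below the threshold as dubious is also an assumption, chosen so that `threshold=0.0` reproduces the current "only score 0 is dubious" rule.

```python
def classify_duplication(score, threshold=0.25):
    """Label a duplication node "dubious" or "real" from its score.

    Assumptions (illustrative only): score is a fraction in [0, 1], and
    scores at or below the threshold are dubious. threshold=0.0 mimics
    the current rule; 0.25 is the lower end of the suggested 25-30% range.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "dubious" if score <= threshold else "real"


# A 1% duplication passes the current rule but not the stricter one.
print(classify_duplication(0.01, threshold=0.0))  # real
print(classify_duplication(0.01))                 # dubious
```

Keeping the threshold as a parameter makes it easy to compare the current behaviour with the 25-30% range on the same set of trees.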

Yep, the cave fish gene is misplaced, and it should also be merged with 
the other gene at the same locus: 
http://www.ensembl.org/Astyanax_mexicanus/Location/View?db=core;r=KB882106.1:4252611-4268612
We flag some gene-split events, but these are not yet used to improve the
gene annotation.

Regards,
Matthieu

On 10/03/15 13:10, Julien Roux wrote:
> Dear Ensembl team,
> I wanted to report a strange behavior/feature of the Compara Gene Trees.
> I often see in gene trees that some genes do not branch where they are
> supposed to, which is inevitable I guess. However in some cases, this
> leads to inference of false duplications. I find it very useful that
> Ensembl provides a confidence score for duplications, and labels as
> "dubious" the duplications with a score of 0.
> But I find it surprising that a duplication with a score of 1% is labeled
> as a "real" duplication. I see this happen quite often, and it seems to me
> that the threshold used to call a dubious duplication should be increased.
>
> See for example the ENSAMXG00000008930 gene, which clusters outside of
> the fish clade, leading to a false duplication at the base of the
> vertebrate lineage (besides, this gene seems to be only a fragment of
> the gene model, which should be labeled as a gene_split event):
> http://www.ensembl.org/Astyanax_mexicanus/Gene/Compara_Tree?db=core;g=ENSAMXG00000008930;r=KB882106.1:4258864-4262355;t=ENSAMXT00000009178;collapse=2625743,2625667,2625666,2625573
>
> Maybe there is some justification for this choice; please let me know
> what you think.
> Best regards
> Julien

-- 
Matthieu Muffato, Ph.D.
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468