[ensembl-dev] Erroneous duplications in gene trees

Julien Roux julien.roux at unil.ch
Tue Mar 10 13:10:56 GMT 2015


Dear Ensembl team,
I would like to report a strange behavior/feature of the Compara Gene Trees.
I often see in gene trees that some genes do not branch where they are
supposed to, which I guess is inevitable. However, in some cases this
leads to the inference of false duplications. I find it very useful that
Ensembl provides a confidence score for duplications, and labels as
"dubious" those duplications with a score of 0.
But I find it surprising that a duplication with a score of 1% is labeled
as a "real" duplication. I see this happen quite often, and it seems to me
that the threshold used to call a duplication dubious should be increased.
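
A minimal sketch of the labeling rule described above, written in Python
only to make the suggestion concrete. The function name, the 0-to-1 score
scale, and the 0.1 cutoff are illustrative assumptions; this is not
Ensembl's actual code, and 0.1 is not a recommended value.

```python
def classify_duplication(score, dubious_threshold=0.0):
    """Label a duplication node from its confidence score.

    Assumption: scores lie in [0.0, 1.0], so a score of 1% is 0.01.
    A node is called "dubious" when its score is at or below the threshold.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return "dubious" if score <= dubious_threshold else "duplication"


# Hypothetical scores, not taken from any real gene tree.
scores = [0.0, 0.01, 0.25, 0.9]

# Rule as described in this message: only a score of exactly 0 is dubious.
current = [classify_duplication(s) for s in scores]

# Stricter rule with an arbitrary, illustrative cutoff of 0.1.
stricter = [classify_duplication(s, dubious_threshold=0.1) for s in scores]
```

With the default cutoff, the node scoring 0.01 is kept as a duplication;
with the illustrative 0.1 cutoff it would be labeled dubious instead.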

See, for example, the ENSAMXG00000008930 gene, which clusters outside of
the fish clade, leading to a false duplication at the base of the
vertebrate lineage (besides, this gene seems to be only a fragment of
the gene model, and should be labeled as a gene_split event):
http://www.ensembl.org/Astyanax_mexicanus/Gene/Compara_Tree?db=core;g=ENSAMXG00000008930;r=KB882106.1:4258864-4262355;t=ENSAMXT00000009178;collapse=2625743,2625667,2625666,2625573

Maybe there is some justification for this choice; please let me know
what you think.
Best regards,
Julien

-- 
Julien Roux
Marie-Curie postdoctoral fellow
Department of Ecology and Evolution, University of Lausanne, Switzerland
http://www.unil.ch/dee/home/menuinst/people/post-docs--associates/dr-julien-roux.html
Tel: +41 78 700 2931
