[ensembl-dev] Erroneous duplications in gene trees
Michael Paulini
mh6 at sanger.ac.uk
Tue Mar 10 16:09:27 GMT 2015
Hi Matthieu,
you will be happy to know that we used the gene-split event annotations
from the Compara gene trees provided by Ensembl and Ensembl Genomes
on C. elegans, B. malayi and O. volvulus to improve our gene models.
Michael
On 10/03/15 16:00, Matthieu Muffato wrote:
> Dear Julien
>
> We only flag as "dubious" the duplications with a score of 0 because
> it has a very distinctive pattern: a duplication event must be called
> because of the taxon annotation of the subtrees, but the intersection
> of both trees' species is empty. Duplications with a non-0 score have
> at least 1 species shared by both subtrees.
>
> I nevertheless agree with you that the threshold is arbitrary and that
> scores close to 0 are also dubious. I would recommend a threshold of
> 25-30%: it gave sensible results in our tests.
>
> Yep, the cave fish gene is misplaced, and it should also be merged
> with the other gene at the same locus:
> http://www.ensembl.org/Astyanax_mexicanus/Location/View?db=core;r=KB882106.1:4252611-4268612
> We flag some gene-split events, but these are not yet used to improve
> the gene annotation.
>
> Regards,
> Matthieu
>
> On 10/03/15 13:10, Julien Roux wrote:
>> Dear Ensembl team,
>> I wanted to report a strange behavior/feature of the Compara Gene Trees.
>> I often see in gene trees that some genes do not branch where they are
>> supposed to, which is inevitable I guess. However in some cases, this
>> leads to inference of false duplications. I find it very useful that
>> Ensembl provides a confidence score for duplications, and labels
>> duplications with a score of 0 as "dubious".
>> But I find it surprising that a duplication with a score of 1% is labeled
>> as a "real" duplication. I see this happen quite often, and it seems to me
>> that the threshold used to call a dubious duplication should be
>> increased.
>>
>> See for example the ENSAMXG00000008930 gene, which clusters outside of
>> the fish clade, leading to a false duplication at the base of the
>> vertebrate lineage (besides, this gene seems to be only a fragment of
>> the gene model, which should be labeled as a gene_split event):
>> http://www.ensembl.org/Astyanax_mexicanus/Gene/Compara_Tree?db=core;g=ENSAMXG00000008930;r=KB882106.1:4258864-4262355;t=ENSAMXT00000009178;collapse=2625743,2625667,2625666,2625573
>>
>>
>> Maybe there is some justification for this choice, please let me know
>> what you think.
>> Best regards
>> Julien
>
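
The scoring rule discussed in the thread above can be sketched in a few lines of Python. This is only an illustration, not Compara's actual code: the thread states that a score of 0 means the two subtrees share no species and that a non-0 score means they share at least one, and the sketch assumes the score is the fraction of species found in both subtrees out of all species found in either. The function names, the example species and the 0.25 cut-off (taken from the 25-30% suggested above) are hypothetical.

```python
def duplication_confidence(left_species, right_species):
    """Fraction of species present in both subtrees of a duplication node.

    Assumption: score = |left & right| / |left | right|. The thread only
    confirms the two end cases (0 = no shared species, >0 = at least one).
    """
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("both subtrees are empty")
    return len(left & right) / len(union)


def classify_duplication(score, threshold=0.0):
    """Label a duplication as 'dubious' when its score is at or below threshold.

    threshold=0.0 mimics the behaviour described in the thread (only
    score-0 duplications are flagged); a higher value such as 0.25
    applies the stricter cut-off suggested there.
    """
    return "dubious" if score <= threshold else "supported"


if __name__ == "__main__":
    # Hypothetical subtrees: only one species is shared out of three.
    left = {"Homo_sapiens", "Mus_musculus"}
    right = {"Homo_sapiens", "Danio_rerio"}
    score = duplication_confidence(left, right)
    print(score, classify_duplication(score), classify_duplication(score, 0.5))
```

With the default threshold, a duplication scoring 1% is still reported as supported, which is the behaviour Julien questions; raising the threshold moves such low-scoring nodes into the dubious class.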