[ensembl-dev] Erroneous duplications in gene trees

Matthieu Muffato muffato at ebi.ac.uk
Tue Mar 10 16:31:36 GMT 2015


I'm now happy, thank you :)

On 10/03/15 16:09, Michael Paulini wrote:
> Hi Matthieu,
>
> You will be happy to know that we used the gene-split event annotations
> from the Compara gene trees provided by Ensembl and Ensembl Genomes
> for C. elegans, B. malayi and O. volvulus to improve our gene models.
>
> Michael
>
> On 10/03/15 16:00, Matthieu Muffato wrote:
>> Dear Julien
>>
>> We only flag as "dubious" the duplications with a score of 0 because
>> they have a very distinctive pattern: a duplication event must be called
>> because of the taxon annotation of the subtrees, but the intersection
>> of the two subtrees' species sets is empty. Duplications with a non-0
>> score have at least 1 species shared by both subtrees.
>>
>> I nevertheless agree with you that the threshold is arbitrary and that
>> scores close to 0 are also dubious. I would recommend a threshold of
>> 25-30%: it gave sensible results in our tests.
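
A minimal sketch of the scoring logic discussed above, in Python. It assumes the confidence score is the number of species shared by the two subtrees divided by the number of species found in either one; the thread itself only states that a score of 0 means no shared species. The function names, the three labels and the 0.25 cut-off (the lower end of the suggested 25-30% range) are illustrative and are not Ensembl code.

```python
def duplication_confidence(left_species, right_species):
    """Fraction of species shared by the two subtrees of a duplication node.

    Assumption: score = |intersection| / |union| of the two species sets.
    """
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("both subtrees are empty")
    return len(left & right) / len(union)


def classify_duplication(score, threshold=0.25):
    """Label a duplication node from its confidence score.

    The labels and the default threshold are hypothetical; the thread only
    says that score 0 is flagged "dubious" and that 25-30% gave sensible
    results in tests.
    """
    if score == 0:
        return "dubious"          # no species shared by the two subtrees
    if score < threshold:
        return "low-confidence"   # some overlap, but below the cut-off
    return "supported"
```

With this sketch, a node whose subtrees share 1 species out of 100 gets a score of 0.01 and is labelled "low-confidence" rather than being treated like a well-supported duplication.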
>>
>> Yep, the cave fish gene is misplaced, and it should also be merged
>> with the other gene at the same locus:
>> http://www.ensembl.org/Astyanax_mexicanus/Location/View?db=core;r=KB882106.1:4252611-4268612
>>
>> We flag some gene-split events, but these are not yet used to improve
>> the gene annotation.
>>
>> Regards,
>> Matthieu
>>
>> On 10/03/15 13:10, Julien Roux wrote:
>>> Dear Ensembl team,
>>> I wanted to report a strange behavior/feature of the Compara Gene Trees.
>>> I often see in gene trees that some genes do not branch where they are
>>> supposed to, which is inevitable I guess. However in some cases, this
>>> leads to inference of false duplications. I find it very useful that
>>> Ensembl provides a confidence score for duplications, and labels
>>> those with a score of 0 as "dubious".
>>> But I find it surprising that a duplication with a score of 1% is labeled
>>> as a "real" duplication. I see this happen quite often, and it seems to me
>>> that the threshold used to call a dubious duplication should be
>>> increased.
>>>
>>> See for example the ENSAMXG00000008930 gene, which clusters outside of
>>> the fish clade, leading to a false duplication at the base of the
>>> vertebrate lineage (besides, this gene seems to be only a fragment of
>>> the gene model, which should be labeled as a gene_split event):
>>> http://www.ensembl.org/Astyanax_mexicanus/Gene/Compara_Tree?db=core;g=ENSAMXG00000008930;r=KB882106.1:4258864-4262355;t=ENSAMXT00000009178;collapse=2625743,2625667,2625666,2625573
>>>
>>>
>>> Maybe there is some justification for this choice, please let me know
>>> what you think.
>>> Best regards
>>> Julien
>>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-- 
Matthieu Muffato, Ph.D.
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468



