[ensembl-dev] Ensembl-Condel Scoring

Graham Ritchie grsr at ebi.ac.uk
Thu Jun 2 17:13:30 BST 2011


Hi Paige,

I have now heard back from the developer of Condel (Abel Gonzalez Perez), and he confirms that the odd prediction you found is a shortcoming of the way the consensus score is currently calculated for mutations with low SIFT and PolyPhen scores. Please see his email below for details (forwarded with his permission). We will work with Abel to try to address this issue in future Ensembl releases.

Thanks for bringing this to our attention.

Best regards,

Graham 

Ensembl variation


> Hi Graham,
> 
> I checked and you're both absolutely right. This case would be a condel false positive. The reason is in the way we compute the denominator of the weighted average score. If the scores of the methods are low, the probability of the SNP being a false negative of each method is very low, and so is their sum (the denominator of the weighted average score). As a result, the condel scores of low-scored SNPs are artificially high. This effect is mitigated when the scores of more methods are taken into account, because their scores tend to be contradictory, and therefore the result is more balanced. The bottom line is that we need to rethink the way we obtain the denominator of the weighted average score when only two methods are used.
> 
> One possible solution is the following: use a simple counter as the denominator in this case, instead of the probabilities that are used to weight the scores. This way, the weighted sum would simply be divided by the number of methods that contribute to the integrated score, that is, 1 or 2.
> 
> I have just implemented a new version of condel_SP that computes the denominator in this way, and tested it on HumVar. The area under the ROC curve is very similar to that produced by the original condel_SP, but the new version produces more accurate results for SNPs with low SIFT and PolyPhen scores. I attach the modified was_SP.pl script.
> 
> So, to summarize: we have discovered that our calculation of the weighted average score, which was designed to be used with five methods, tends to overestimate the deleteriousness of SNPs with low SIFT and PolyPhen scores when it is used with only those two methods, because it produces erroneously small denominators. To solve this, we propose this new version of the weighted average score, which corrects the problem for low-scored SNPs, possibly at the expense of misclassifying SNPs whose scores are close to the cutoffs of the individual methods.
> 
> Best regards,
> 
> Abel
> 
> -- 
> Abel González Pérez, PhD
> Bioinformatician
> http://abeldavidgp.synthasite.com/
> http://publicationslist.org/abeldavidgp
> Research Unit on Biomedical Informatics - GRIB
> http://bg.imim.es/
> Parc de Recerca Biomèdica de Barcelona (PRBB)
> c/ Dr. Aiguader, 88
> E-08003 Barcelona
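
For readers following the thread, the two denominators Abel describes can be contrasted in a minimal Python sketch. This is not the attached was_SP.pl script and not Condel's actual code: the function name, the `denominator` argument, and the example scores and weights are all hypothetical, and the scores are assumed to be scaled so that higher means more deleterious.

```python
def weighted_score(scores, weights, denominator="weights"):
    """Combine per-method scores into a single value (illustrative only).

    scores      -- per-method scores, assumed scaled to [0, 1]
    weights     -- per-method probabilities used to weight each score
    denominator -- "weights": divide by the sum of the weights
                   "count":   divide by the number of contributing methods
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("need one weight per score, and at least one score")

    # Weighted sum of the scores; this part is the same in both variants.
    numerator = sum(s * w for s, w in zip(scores, weights))

    if denominator == "weights":
        total = sum(weights)
        if total == 0:
            return None  # score cannot be computed
        return numerator / total
    if denominator == "count":
        return numerator / len(scores)
    raise ValueError("denominator must be 'weights' or 'count'")


# Hypothetical low scores and small weights for two methods.
scores = [0.1, 0.2]
weights = [0.02, 0.03]

by_weights = weighted_score(scores, weights, "weights")
by_count = weighted_score(scores, weights, "count")
print(by_weights, by_count)
```

Dividing by the sum of the weights cancels out how small the weights are, and that sum can approach zero. Dividing by the count of methods keeps the small weights in effect, so with weights no larger than 1 the count-based result is never higher than the weight-normalised one.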
