[ensembl-dev] Fwd: Discrepancies between VEP and Condel, PPH2 and SIFT

Thu Feb 2 13:00:15 GMT 2012

Sorry, forgot to CC the list in on my reply.

Begin forwarded message:

> From: Graham Ritchie <grsr at ebi.ac.uk>
> Date: 2 February 2012 12:55:08 GMT
> To: A. P. Levine <a.levine at ucl.ac.uk>
> Subject: Re: [ensembl-dev] Discrepancies between VEP and Condel, PPH2 and SIFT
> 
> Hi Adam,
> 
> We are aware that the predictions produced by Ensembl frequently differ from those of the web versions of these tools. You can find out how we run each tool on the variation documentation page here:
> 
> http://www.ensembl.org/info/docs/variation/index.html#nsSNP
> 
> Here is a copy and paste from an earlier message to this list explaining some of the reasons why these differences may occur:
> 
> When pre-computing the SIFT and PolyPhen scores, we download the software and run the tools locally using our own copies of the UniProt, Pfam, PDB, DSSP (etc.) databases, which we update when we start the pipeline. It is therefore unlikely that we run the tools on exactly the same source data as the web applications (though we follow the authors' instructions as closely as possible to try to ensure that results are reproducible). We also always supply the Ensembl translation for each transcript as the reference protein; we do not try to find the closest UniProt protein, and the web versions may sometimes use a different reference transcript. The web applications may also use newer versions of the software than we ran.
> 
> All of these factors mean we are unlikely to produce exactly the same scores as the web applications, but if you are finding significant differences then this is something we should look into. In particular, if you find lots of cases where the qualitative predictions ('benign', 'tolerated', 'probably damaging' etc.) differ between Ensembl and the web applications, we will certainly investigate.
> 
> Hope that makes things clearer.
> 
> Cheers,
> 
> Graham
> 
> Ensembl Variation
> 
> 
> On 2 Feb 2012, at 12:14, A. P. Levine wrote:
> 
>> I have noticed some discrepancies between the PolyPhen, SIFT and Condel scores reported by the VEP and those reported by the three programs when run independently.
>> 
>> The variant I have been looking at is "chr7: 140501302 T/C" (build 37).
>> 
>> Using the VEP (either online or with the Perl script):
>> PolyPhen    benign    0.382
>> Condel    neutral    0.418
>> SIFT    tolerated    0.09
>> 
>> Using the Condel server (http://bg.upf.edu/condel/analysis):
>> PPH2    0.6
>> Condel    0.673    deleterious
>> SIFT    0.09
>> 
>> Using SIFT (http://sift.jcvi.org/www/SIFT_chr_coords_submit.html):
>> SNP Type    Nonsynonymous
>> Prediction    DAMAGING
>> SIFT Score    0.04
>> Median Information Content    2.93
>> Gene ID
>> 
>> And finally, using PolyPhen-2 with both HumDiv and HumVar classifier models (http://genetics.bwh.harvard.edu/pph2/bgi.shtml):
>> HumVar    possibly damaging    pph2_prob    0.791
>> HumDiv    possibly damaging    pph2_prob    0.942
>> 
>> What do you think might be happening here? Which versions of the various programs are being used by the VEP?
>> 
>> Thank you,
>> 
>> Adam
>> 
>> Adam P. Levine
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
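
For anyone wanting to check how often the qualitative calls differ, as opposed to the raw scores, here is a minimal Python sketch. It is not part of the VEP or of any of the tools discussed. The values are copied from the message above for chr7:140501302 T/C; the unlabelled PPH2 and SIFT scores returned by the Condel server are left out because no qualitative call was given for them. The COARSE mapping, which groups each tool's labels into "benign-like" and "damaging-like", and the function names are assumptions made purely for illustration.

```python
# Predictions reported in the thread for chr7:140501302 T/C (build 37).
# Each row: (tool, source, qualitative label, score)
REPORTED = [
    ("PolyPhen", "VEP", "benign", 0.382),
    ("PolyPhen", "PolyPhen-2 web (HumVar)", "possibly damaging", 0.791),
    ("PolyPhen", "PolyPhen-2 web (HumDiv)", "possibly damaging", 0.942),
    ("SIFT", "VEP", "tolerated", 0.09),
    ("SIFT", "SIFT web", "DAMAGING", 0.04),
    ("Condel", "VEP", "neutral", 0.418),
    ("Condel", "Condel server", "deleterious", 0.673),
]

# Assumption: a two-way grouping of the labels so that calls from
# different tools and front ends can be compared at all.
COARSE = {
    "benign": "benign-like",
    "neutral": "benign-like",
    "tolerated": "benign-like",
    "possibly damaging": "damaging-like",
    "probably damaging": "damaging-like",
    "damaging": "damaging-like",
    "deleterious": "damaging-like",
}


def coarse_class(label):
    """Map a tool-specific label (any case) to its coarse class."""
    return COARSE[label.strip().lower()]


def qualitative_disagreements(rows, reference="VEP"):
    """List (tool, source, reference_label, other_label) for every row
    whose coarse class differs from the reference source's call."""
    ref = {tool: label for tool, source, label, _ in rows if source == reference}
    found = []
    for tool, source, label, _ in rows:
        if source == reference or tool not in ref:
            continue
        if coarse_class(label) != coarse_class(ref[tool]):
            found.append((tool, source, ref[tool], label))
    return found


if __name__ == "__main__":
    for tool, source, vep_label, other in qualitative_disagreements(REPORTED):
        print(f"{tool}: VEP says {vep_label!r}, {source} says {other!r}")
```

Run over a larger set of variants, the same comparison would give a rough count of the qualitative disagreements that Graham asks to hear about.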