[ensembl-dev] PolyPhen2 prediction discrepancies between VEP 88 and dbNSFP v3.0

William McLaren wm2 at ebi.ac.uk
Wed Jun 28 15:14:50 BST 2017


Hi Brad,

Did you download the database as listed in the plugin docs or did you generate it yourself?

The database listed in the plugin docs was generated from release 85 of Ensembl (GRCh38), which used PolyPhen 2.2.2 (release 405, see http://jul2016.archive.ensembl.org/info/genome/variation/predicted_data.html); it has not been updated since then. If there is enough demand, we can look into regenerating this database for each Ensembl release.

The date the database was generated is unlikely to affect your analysis, because the database is indexed on a digest of the reference protein sequence: if the protein sequence VEP is analysing exists in the database, you will get scores for it. When new proteins are added to the Ensembl gene set, they are added to the database without altering existing entries, so the scores for existing sequences do not change.
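
As a rough illustration of that lookup, here is a minimal Python sketch of a store keyed on a digest of the protein sequence. It is not the plugin's code. The choice of MD5, the class and function names, and the sequences and scores are all assumptions made up for the example; the message above says only that the key is "a digest".

```python
import hashlib


def sequence_digest(protein_sequence: str) -> str:
    """Return a hex digest of a protein sequence (MD5 is an assumption here)."""
    return hashlib.md5(protein_sequence.encode("ascii")).hexdigest()


class PredictionStore:
    """Toy stand-in for a prediction database keyed on sequence digests."""

    def __init__(self) -> None:
        self._scores_by_digest = {}

    def add_protein(self, protein_sequence, scores):
        # Additive insert: an existing entry is never overwritten.
        self._scores_by_digest.setdefault(sequence_digest(protein_sequence), scores)

    def lookup(self, protein_sequence):
        # Returns None when the sequence is not in the store.
        return self._scores_by_digest.get(sequence_digest(protein_sequence))


# Placeholder sequences and scores, invented for illustration only.
store = PredictionStore()
store.add_protein("MKTAYIAKQR", {(3, "A"): 0.91})
before = store.lookup("MKTAYIAKQR")

# A later release adds a new protein; the earlier entry is untouched.
store.add_protein("MSTNPKPQRK", {(5, "G"): 0.12})
after = store.lookup("MKTAYIAKQR")

print(before == after)           # compares scores before and after the new insert
print(store.lookup("MAAAAAAA"))  # looks up a sequence absent from the store
```

Because the key depends only on the sequence itself, a database built from an older release still returns scores for any protein whose sequence has not changed.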

Hope that’s clearer

Will McLaren
Ensembl Variation


On 28 June 2017 at 14:13:51, Crone, Bradley (bradley-crone at uiowa.edu) wrote:

Hello,



I'm resending this question, since I haven't seen an answer to it on the dev list.


Regarding the PolyPhen2 discrepancies between VEP 88 and dbNSFP v3.0, I have some further questions.
What version of PolyPhen is VEP 88 utilizing for GRCh37? I thought I saw somewhere PolyPhen 2.2.2 was used, which should match dbNSFP v3.0.
Additionally, I am running VEP with the PolyPhen-SIFT plugin. What effect would this have on discrepancies?

Thank you,
Brad

From: Crone, Bradley
Sent: Wednesday, June 14, 2017 3:29:29 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] PolyPhen2 prediction discrepancies between VEP 88 and dbNSFP v3.0
 
After thinking about this more, I do have more questions about these discrepancies.

What version of PolyPhen is VEP 88 utilizing for GRCh37? I thought I saw somewhere PolyPhen 2.2.2 was used, which should match dbNSFP v3.0.

Additionally, I am running VEP with the PolyPhen-SIFT plugin. What effect would this have on discrepancies?



Brad

From: Dev <dev-bounces at ensembl.org> on behalf of Crone, Bradley <bradley-crone at uiowa.edu>
Sent: Tuesday, June 13, 2017 8:08:01 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] PolyPhen2 prediction discrepancies between VEP 88 and dbNSFP v3.0
 
Yes, I did inadvertently include those two examples in my list - predictions do not match (P vs. D), but scores do match across all three.



Thanks for the information, I'll look for the update to GRCh37 in July.



Brad

From: Dev <dev-bounces at ensembl.org> on behalf of Sarah Hunt <seh at ebi.ac.uk>
Sent: Tuesday, June 13, 2017 6:21:26 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] PolyPhen2 prediction discrepancies between VEP 88 and dbNSFP v3.0
 


Hi Brad,



We do find differences between PolyPhen analyses, depending on the code version and the protein databases used. Our GRCh37 database will be updated to the latest PolyPhen version in July, so do expect some changes. A number of genes return unknown classifications in our GRCh37 databases but have calls in our GRCh38 databases, which have already been updated to the newer version, so we hope the update improves GRCh37 coverage. An example from your list:



http://grch37.ensembl.org/Homo_sapiens/Variation/Mappings?db=core;r=1:103427257-103428257;v=rs754273408;vdb=variation;vf=119985449
http://www.ensembl.org/Homo_sapiens/Variation/Mappings?db=core;r=1:102961701-102962701;v=rs754273408;vdb=variation;vf=119958041



Thanks for the examples, but I find them a little confusing. Don't two of them (10-73377145 and 1-103462662) show agreement across all three versions?



Best wishes,



Sarah



On 12/06/2017 17:05, Crone, Bradley wrote:
Hello,



I'm working with PolyPhen2 scores and predictions in VEP 88 and comparing these back to scores reported in dbNSFP v3.0.

I find a large number of discrepancies between HumDiv scores from VEP and dbNSFP. I've looked at a small subset of 10 mismatches.

Comparing the scores and predictions directly against PolyPhen2's website, every PolyPhen2 score matches a dbNSFP score rather than the VEP score:



CHROM  POS        REF  ALT  GENE     VEP-FEATURE     VEP-IMPACT  VEP_CSQ           VEP_POLYPHEN_SCORE  VEP_POLYPHEN_PRED  DBNSFP_POLYPHEN2_HDIV_SCORE          DBNSFP_POLYPHEN2_HDIV_PRED  pph2_prob  pph2_FPR  pph2_TPR
2      73679572   C    A    ALMS1    NM_015120.4     MODERATE    missense_variant  0.952               P                  0.987,0.972,0.026                    D,D,B                       0.026      0.188     0.949
1      216011417  A    T    USH2A    NM_206933.2     MODERATE    missense_variant  0.155               B                  0.933                                P                           0.933      0.0573    0.804
12     48398104   T    C    COL2A1   NM_001844.4     HIGH        start_lost        0                   U                  0.219,0.14                           B                           0.14       0.136     0.923
1      103427757  C    G    COL11A1  NM_080629.2     MODERATE    missense_variant  0                   U                  0.999                                D                           0.999      0.00574   0.136
12     48377197   G    T    COL2A1   NM_001844.4     MODERATE    missense_variant  0.784               P                  0.001,0.0                            B                           0          1         1
6      70981396   C    A    COL9A1   NM_001851.4     MODERATE    missense_variant  0.555               P                  0.31,0.0                             B                           0.31       0.112     0.904
10     73377145   G    A    CDH23    NM_001171930.1  MODERATE    missense_variant  1                   P                  1.0,1.0,0.998                        D                           1          0.00026   0.00018
1      216496954  T    C    USH2A    NM_206933.2     MODERATE    missense_variant  1                   P                  0.971,0.413                          D,B                         0.413      0.103     0.893
17     18064707   C    A    MYO15A   NM_016239.3     MODERATE    missense_variant  0                   U                  0.941,0.761,0.165,0.523,0.981,0.953  P,P,B,P,D,P                 0.523      0.0959    0.882
1      103462662  C    T    COL11A1  NM_001190709.1  MODERATE    missense_variant  1                   P                  1.0                                  D                           1          0.00026   0.00018
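
One way to see which rows actually disagree on score is to test each VEP score and each website pph2_prob against the comma-separated dbNSFP values. The sketch below does this in Python. It assumes that the comma-separated dbNSFP entries are alternative scores for the same variant and that a match against any one of them counts as agreement; the helper name `matches_any` is made up for the example. The values are copied from the table.

```python
import math

# (variant, VEP score, dbNSFP HDIV scores, PolyPhen2 website pph2_prob),
# copied from the table above.
ROWS = [
    ("2-73679572", "0.952", "0.987,0.972,0.026", "0.026"),
    ("1-216011417", "0.155", "0.933", "0.933"),
    ("12-48398104", "0", "0.219,0.14", "0.14"),
    ("1-103427757", "0", "0.999", "0.999"),
    ("12-48377197", "0.784", "0.001,0.0", "0"),
    ("6-70981396", "0.555", "0.31,0.0", "0.31"),
    ("10-73377145", "1", "1.0,1.0,0.998", "1"),
    ("1-216496954", "1", "0.971,0.413", "0.413"),
    ("17-18064707", "0", "0.941,0.761,0.165,0.523,0.981,0.953", "0.523"),
    ("1-103462662", "1", "1.0", "1"),
]


def matches_any(score, candidates, tol=1e-9):
    """True if `score` equals any comma-separated value in `candidates`."""
    value = float(score)
    return any(
        math.isclose(value, float(c), abs_tol=tol) for c in candidates.split(",")
    )


# Rows where the VEP score appears among the dbNSFP scores.
vep_agrees = [v for v, vep, dbnsfp, _ in ROWS if matches_any(vep, dbnsfp)]
# Rows where the website score appears among the dbNSFP scores.
website_agrees = [v for v, _, dbnsfp, pph2 in ROWS if matches_any(pph2, dbnsfp)]

print("VEP score found among dbNSFP scores:", vep_agrees)
print("Website score found among dbNSFP scores:",
      len(website_agrees), "of", len(ROWS))
```

On these ten rows, the VEP score appears among the dbNSFP values only for 10-73377145 and 1-103462662, while the website score appears among the dbNSFP values for all ten.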

Any idea what causes these discrepancies?



Thanks,

Brad





_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
