[ensembl-dev] Change in SIFT and PolyPhen score

Wallace Ko myko at l3-bioinfo.com
Thu Jan 24 10:21:55 GMT 2019


Hi Sarah,

I don't mean to put any pressure on you, but I wonder whether you are
looking into the cause of the inconsistency.

One more observation: the REST API also returns a score that is the same
as the VEP cache result for the aforementioned variant:

curl -sH 'Accept: application/json' \
  'http://grch37.rest.ensembl.org/vep/human/hgvs/NM_152486.2:c.83G%3EA?refseq=1&hgvs=1' \
  | python -m json.tool | grep polyphen_score

My guess is that the REST server uses the VEP cache for result lookup.
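
In case it helps anyone reproduce the check per transcript rather than with
grep, here is a minimal Python sketch of the same lookup. It assumes the
response is a JSON list whose entries carry a "transcript_consequences" list
with "transcript_id" and "polyphen_score" fields. The helper names
(build_url, fetch_vep, polyphen_scores) are my own, and the sample payload is
a hand-trimmed illustration using the 0.913 value reported in this thread,
not a captured server response.

```python
import json
import urllib.parse
import urllib.request

REST_BASE = "http://grch37.rest.ensembl.org"  # GRCh37 REST server used above


def build_url(hgvs):
    """Build the same VEP HGVS query URL as the curl command above."""
    # Keep ':' literal and percent-encode '>' as %3E, as in the curl call.
    encoded = urllib.parse.quote(hgvs, safe=":")
    return f"{REST_BASE}/vep/human/hgvs/{encoded}?refseq=1&hgvs=1"


def fetch_vep(hgvs):
    """Query the REST server (needs network access) and return parsed JSON."""
    req = urllib.request.Request(build_url(hgvs),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def polyphen_scores(vep_response):
    """Return (transcript_id, polyphen_score) pairs that have a score."""
    pairs = []
    for result in vep_response:
        for tc in result.get("transcript_consequences", []):
            if "polyphen_score" in tc:
                pairs.append((tc.get("transcript_id"), tc["polyphen_score"]))
    return pairs


# Hand-trimmed illustrative payload (assumption, not real server output).
# "PLACEHOLDER_TRANSCRIPT" stands in for an entry without a PolyPhen score.
sample = json.loads("""
[{"input": "NM_152486.2:c.83G>A",
  "transcript_consequences": [
    {"transcript_id": "NM_152486.2", "polyphen_score": 0.913},
    {"transcript_id": "PLACEHOLDER_TRANSCRIPT"}
  ]}]
""")

print(polyphen_scores(sample))
```

To run it against the live server, replace `sample` with
`fetch_vep("NM_152486.2:c.83G>A")`.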

Thanks,
Wallace


On Tue, Jan 15, 2019 at 8:13 PM Wallace Ko <myko at l3-bioinfo.com> wrote:

> Hi Sarah,
>
> I have further tested different versions of VEP (GRCh37, RefSeq) and found
> something interesting for the variant NM_152486.2:c.83G>A (chr1_865545_G/A).
> The table below lists the PolyPhen2 scores for the variant from three
> versions of VEP, obtained once from the cache and once from the Ensembl
> database:
>
> Version   Cache   Database
> 92        -       0.778
> 94        -       0.778
> 95        0.913   0.778
>
> First, why are the scores from the cache and the database different?
>
> Assuming the scores from the cache are correct, the score changed from none
> in v94 to 0.913 in v95. However, according to the release history
> <https://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#history>,
> the RefSeq transcripts were updated in v94. Does this update affect the
> GRCh37 cache? If so, why did the score change in v95 rather than v94?
> Otherwise, what caused the change (e.g. was the PolyPhen pipeline rerun)?
> If the change in the cache score is expected, it would be great to have a
> way (e.g. a release log message) for us to know that scores may change in
> a new release.
>
> Thanks
>
> Regards,
> Wallace
>
>
> On Fri, Jan 4, 2019 at 7:37 PM Sarah Hunt <seh at ebi.ac.uk> wrote:
>
>>
>> Hi Wallace,
>>
>> Both NCBI and Ensembl stopped creating new gene annotations on GRCh37 a
>> long time ago - late 2013 and early 2014 respectively.
>>
>> NCBI released a mapping of the then current set of GRCh38-derived RefSeq
>> transcripts to GRCh37 in January 2017. (See:
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GRCh37.p13_interim_annotation/README
>> for details).  We added these to our GRCh37 database in Ensembl 90, in the
>> summer of 2017, with SIFT and PolyPhen2 results.
>>
>> We are not expecting any new transcript data for GRCh37, so there is
>> nothing new to analyse. If we switch to new versions of SIFT or PolyPhen2
>> or find ways to reduce the failure rate, we may update the GRCh37 results.
>> We don't currently plan a version change and it would be announced if we
>> did this as results would change.
>>
>> Best wishes,
>>
>> Sarah
>> On 04/01/2019 02:16, Wallace Ko wrote:
>>
>> Hi Sarah,
>>
>> Thanks for your explanation.
>> Does this mean that SIFT and PolyPhen2 scores for RefSeq transcripts on
>> GRCh37 are now also calculated (instead of being mapped from Ensembl
>> transcripts), but at a longer update interval?
>>
>> Best,
>> Wallace
>>
>>
>> On Thu, Jan 3, 2019 at 1:09 AM Sarah Hunt <seh at ebi.ac.uk> wrote:
>>
>>>
>>> Hi Wallace,
>>>
>>> Apologies for the partial response.
>>>
>>> Prior to release 90, we did not calculate SIFT and PolyPhen2 scores for
>>> RefSeq transcripts, though they were available for those with translations
>>> identical to an Ensembl transcript. Although the GRCh37 transcript sets
>>> have been frozen for some time, we update our GRCh37 variation data roughly
>>> annually, and results from the analysis of new Ensembl GRCh38 translations
>>> would be made available then.
>>>
>>> The difference in SIFT results - the change from no data to a prediction
>>> -  will be due to results for a transcript matching NM_000540.2 becoming
>>> available in release 88.
>>>
>>> We re-ran our PolyPhen pipeline for e!90 when we noticed slightly lower
>>> missingness rates after a change of hardware. We suspected changes in the
>>> cluster setup led to fewer high-memory alignment failures, but were not in
>>> a position to test this.
>>>
>>> We now routinely calculate SIFT and PolyPhen2 scores for RefSeq
>>> transcripts for our GRCh38 releases, which happen roughly quarterly. Due to
>>> the scheduling of when RefSeq transcript data becomes available in our
>>> release process, these scores are available in VEP the release after the
>>> transcripts are available.
>>>
>>> Best wishes,
>>>
>>> Sarah
>>>
>>> On 28/12/2018 11:12, Wallace Ko wrote:
>>>
>>> Hello,
>>>
>>> For the variant chr19:g.39075695C>T, the SIFT and PolyPhen scores have
>>> changed in recent versions of VEP.
>>>
>>> VEP  HGVSc                   SIFT            PolyPhen
>>> 87   NM_000540.2:c.14759C>T  -               unknown(0)
>>> 88   NM_000540.2:c.14759C>T  deleterious(0)  unknown(0)
>>> 89   NM_000540.2:c.14759C>T  deleterious(0)  unknown(0)
>>> 90   NM_000540.2:c.14759C>T  deleterious(0)  probably_damaging(0.998)
>>>
>>> The above table shows that for NM_000540.2:c.14759C>T, SIFT changed
>>> from '-' to 'deleterious(0)' in VEP 88, and PolyPhen changed from
>>> 'unknown(0)' to 'probably_damaging(0.998)' in VEP 90.
>>>
>>> The software versions of SIFT (sift5.2.2) and PolyPhen (2.2.2) did not
>>> change across these versions of VEP, so I wonder what caused the change
>>> in the prediction scores.
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Wallace Ko
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>