[ensembl-dev] ClinVar clinical significance in VEP

Will McLaren will.mclaren at globalgenecorp.com
Fri Jun 21 11:07:28 BST 2019


Hi list,

VEP reports clinical significance states for known variants in the CLIN_SIG
field. There is potential for these to be mis-assigned to input variants
due to the way the data are mapped and stored in Ensembl's database. This
is best illustrated by example:

rs2228671 (http://www.ensembl.org/Homo_sapiens/Variation/Explore?v=rs2228671)
is an SNV listed with four alleles (C/A/G/T). It has five ClinVar annotations
mapped to it
(http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?v=rs2228671),
with varying significance states. If your input to VEP matches any of the
three ALT alleles (A, G, or T) at this position, then the returned
CLIN_SIG field is a list of all of those states, since the ClinVar entries
are assigned at the variant level (rsID), rather than more precisely at the
allele level. Post-filtering your VEP results for pathogenic variants will
then match, regardless of whether your input ALT was the pathogenic (A) or
the benign (T) allele at this position.

A good solution is to use VEP's custom annotation function along with the
VCF files made available by ClinVar (
https://www.ncbi.nlm.nih.gov/variation/docs/ClinVar_vcf_files/), something
like:

vep [options] -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG

which will give correct allele-specific clinical significance states in the
clinvar_CLNSIG field of the VEP output.
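
As a rough sketch of the post-filtering step, assuming tab-delimited output
restricted with -fields CLIN_SIG,clinvar_CLNSIG (so the allele-specific field
is column 2), something like the following keeps only rows whose
clinvar_CLNSIG value contains a Pathogenic or Likely_pathogenic term. The
function name filter_allele_pathogenic and the set of term separators are my
own assumptions, not part of VEP; adjust the column index to your output.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): keep rows whose allele-specific
# ClinVar column holds a Pathogenic or Likely_pathogenic term.
# $1 = 1-based index of the clinvar_CLNSIG column (assumed default: 2).
filter_allele_pathogenic() {
  awk -F'\t' -v col="${1:-2}" '
    {
      # Assumption: terms within the field are separated by / , | or &
      n = split($col, terms, "[/,|&]")
      for (i = 1; i <= n; i++) {
        if (terms[i] == "Pathogenic" || terms[i] == "Likely_pathogenic") {
          print
          next
        }
      }
    }'
}

# Sample rows taken from the two reproduce commands in this post
# (columns: CLIN_SIG, clinvar_CLNSIG; tab-separated)
printf 'benign,pathogenic\tPathogenic\nbenign,pathogenic\tBenign/Likely_benign\n' |
  filter_allele_pathogenic 2
```

Matching whole terms rather than the substring "pathogenic" avoids picking
up values such as Conflicting_interpretations_of_pathogenicity.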

I haven't reported this as a bug for VEP as really it's a bug in the way
Ensembl stores the data.

Cheers

Will

## commands to reproduce:

# the A allele is pathogenic
$ vep -id "19 11100236 test1 C A" -cache -o stdout -no_head -pick -tab \
    -check_ex -fields CLIN_SIG,clinvar_CLNSIG \
    -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG
benign,pathogenic       Pathogenic

# the T allele is benign
$ vep -id "19 11100236 test1 C T" -cache -o stdout -no_head -pick -tab \
    -check_ex -fields CLIN_SIG,clinvar_CLNSIG \
    -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG
benign,pathogenic       Benign/Likely_benign
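
To spot affected rows in existing results, a minimal sketch along these lines
could flag cases where the variant-level CLIN_SIG reports a pathogenic state
but the allele-specific clinvar_CLNSIG does not. It assumes the two-column tab
layout shown above (CLIN_SIG first, clinvar_CLNSIG second); flag_discordant
and the DISCORDANT label are hypothetical names, not VEP features.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): flag rows where column 1 (CLIN_SIG)
# lists a pathogenic state but column 2 (clinvar_CLNSIG) does not.
flag_discordant() {
  awk -F'\t' '
    # Return 1 if any term in the field is pathogenic or likely_pathogenic,
    # compared case-insensitively. Assumed term separators: / , | &
    function has_pathogenic(field,    n, i, terms) {
      n = split(tolower(field), terms, "[/,|&]")
      for (i = 1; i <= n; i++)
        if (terms[i] == "pathogenic" || terms[i] == "likely_pathogenic")
          return 1
      return 0
    }
    has_pathogenic($1) && !has_pathogenic($2) { print "DISCORDANT\t" $0 }'
}

# Sample rows taken from the two commands above (tab-separated)
printf 'benign,pathogenic\tPathogenic\nbenign,pathogenic\tBenign/Likely_benign\n' |
  flag_discordant
```

On the two sample rows this flags only the C>T input, where CLIN_SIG includes
pathogenic while the allele itself is Benign/Likely_benign.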

-- 
*William McLaren*
Senior Bioinformatics Scientist
Global Gene Corp
will.mclaren at globalgenecorp.com
www.globalgenecorp.com
The BIC, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR
