[ensembl-dev] ClinVar clinical significance in VEP
Will McLaren
will.mclaren at globalgenecorp.com
Fri Jun 21 11:07:28 BST 2019
Hi list,
VEP reports clinical significance states for known variants in the CLIN_SIG
field. There is potential for these to be mis-assigned to input variants
due to the way the data are mapped and stored in Ensembl's database. This
is best illustrated by example:
rs2228671 (
http://www.ensembl.org/Homo_sapiens/Variation/Explore?v=rs2228671) is
a SNV listed with four alleles (C/A/G/T). It has five ClinVar annotations
mapped to it (
http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?v=rs2228671),
with varying significance states. If your input to VEP matches any of the
three ALT alleles (A, G, or T) at this position, then the returned
CLIN_SIG field is a list of all of those states, since the ClinVar entries
are assigned at the variant level (rsID), rather than more precisely at the
allele level. Post-filtering your VEP results for pathogenic variants will
then match, regardless of whether your input ALT was the pathogenic (A) or
the benign (T) allele at this position.
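To make the difference concrete, here is a minimal Python sketch of the two lookup strategies. The dictionaries are toy data copied by hand from the two reproduction runs at the end of this message; they are assumptions for illustration and do not reflect how Ensembl or ClinVar actually store anything. The function names are hypothetical.

```python
# Toy data, hand-copied from the reproduction runs in this message.
# Illustrative only: these are not Ensembl or ClinVar schemas.
VARIANT_LEVEL = {
    "rs2228671": ["benign", "pathogenic"],
}
ALLELE_LEVEL = {
    ("rs2228671", "A"): ["Pathogenic"],
    ("rs2228671", "T"): ["Benign/Likely_benign"],
}


def clin_sig_by_rsid(rsid, alt):
    """Variant-level lookup: the ALT allele plays no part in the key."""
    return VARIANT_LEVEL.get(rsid, [])


def clin_sig_by_allele(rsid, alt):
    """Allele-level lookup: the ALT allele is part of the key."""
    return ALLELE_LEVEL.get((rsid, alt), [])


for alt in ("A", "T"):
    print(alt, clin_sig_by_rsid("rs2228671", alt), clin_sig_by_allele("rs2228671", alt))
# A ['benign', 'pathogenic'] ['Pathogenic']
# T ['benign', 'pathogenic'] ['Benign/Likely_benign']
```

Both alleles get the same answer from the variant-level lookup, which is the behaviour described above for CLIN_SIG.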
A good solution is to use VEP's custom annotation function along with the
VCF files made available by ClinVar (
https://www.ncbi.nlm.nih.gov/variation/docs/ClinVar_vcf_files/), something
like:
vep [options] -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG
which will give correct allele-specific clinical significance states in the
clinvar_CLNSIG field of the VEP output.
I haven't reported this as a bug for VEP as really it's a bug in the way
Ensembl stores the data.
Cheers
Will
## commands to reproduce:
# the A allele is pathogenic
$ vep -id "19 11100236 test1 C A" -cache -o stdout -no_head -pick -tab \
    -check_ex -fields CLIN_SIG,clinvar_CLNSIG \
    -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG
benign,pathogenic Pathogenic
# the T allele is benign
$ vep -id "19 11100236 test1 C T" -cache -o stdout -no_head -pick -tab \
    -check_ex -fields CLIN_SIG,clinvar_CLNSIG \
    -custom clinvar_20190609.vcf.gz,clinvar,vcf,exact,,CLNSIG
benign,pathogenic Benign/Likely_benign
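For post-filtering, something along the following lines could be used on the tab output above. This is only a sketch under a few assumptions: the columns are tab-separated and arrive in the order given to -fields (there is no header line here), significance terms are split on "/", ",", "|" and "&", and only the terms "pathogenic" and "likely_pathogenic" count as a hit. The function names and the term list are my own choices, not anything VEP provides.

```python
import re

# Assumed set of terms that count as a pathogenic hit (an illustrative choice).
PATHOGENIC_TERMS = {"pathogenic", "likely_pathogenic"}


def is_pathogenic(value):
    """True if any whole term in the field is pathogenic or likely pathogenic."""
    tokens = re.split(r"[/,|&]", value)
    return any(t.strip("_ ").lower() in PATHOGENIC_TERMS for t in tokens)


def pathogenic_rows(lines, fields=("CLIN_SIG", "clinvar_CLNSIG"),
                    column="clinvar_CLNSIG"):
    """Keep rows whose chosen column carries a pathogenic term.

    `fields` must list the columns in the order passed to -fields.
    """
    idx = fields.index(column)
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header lines
        cells = line.split("\t")
        if idx < len(cells) and is_pathogenic(cells[idx]):
            kept.append(cells)
    return kept


# The two result lines from the runs above, assuming a tab between columns.
sample = [
    "benign,pathogenic\tPathogenic",
    "benign,pathogenic\tBenign/Likely_benign",
]
print(len(pathogenic_rows(sample, column="CLIN_SIG")))  # 2
print(len(pathogenic_rows(sample)))                     # 1
```

Filtering on CLIN_SIG keeps both inputs, while filtering on clinvar_CLNSIG keeps only the C>A input. Matching whole terms rather than substrings avoids counting values such as Conflicting_interpretations_of_pathogenicity as pathogenic.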
--
*William McLaren*
Senior Bioinformatics Scientist
Global Gene Corp
will.mclaren at globalgenecorp.com
www.globalgenecorp.com
The BIC, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR