[ensembl-dev] Excluding somatic variants from ExAC/ESP allele counts

Cyriac Kandoth kandothc at mskcc.org
Tue Oct 25 23:49:19 BST 2016


Hi Devs,

In Dec 2014, a string of papers showed a prevalence of subclonal somatic
mutations in the blood cells of older healthy individuals. You can find a
preliminary subset of such mutations here:
http://www.nature.com/nm/journal/v20/n12/fig_tab/nm.3733_T1.html

Many of these recurrent somatic mutations are reported in the official ExAC
VCF with FILTER=PASS, incorrectly indicating that they are recurrent
germline variants. Some of these have fairly high allele counts, and are
known recurrent hotspots in blood cancers. Such variants are also seen in
cohorts like ESP6500, but at a much lower frequency than in TCGA normals.
Here are two examples:
https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs77375493 - tagged as
Pathogenic in ClinVar
https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs147001633 - *not* tagged
as Pathogenic in ClinVar

The ExAC authors are aware of this, but there is no elegant way to identify
and tag such variants uniformly across their cohort. I believe they are
working on a related publication, but that will be a while. For now, they
have made available a subset VCF that excludes TCGA samples:
ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/subsets/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz

Excluding the TCGA samples removes nearly all known somatic variant sites,
which makes this VCF a great false-positive filter in our (MSKCC) somatic
variant calling pipelines.
Instead of using ExAC AFs in VEP's cache, I use this nonTCGA VCF with VEP's
ExAC plugin, after a few modifications documented here:
https://gist.github.com/ckandoth/f265ea7c59a880e28b1e533a6e935697

Would you consider reporting ExAC allele counts from this nonTCGA VCF as
the default?

Thanks much!

~Cyriac