[ensembl-dev] Dataset of common SNPs/indels & CNVs/SVs

Anja Thormann anja at ebi.ac.uk
Tue Jan 2 15:34:18 GMT 2018


Hi Jessie,

For short variants including SNVs and indels:
To filter for common variants, you can use ftp://ftp.ensembl.org/pub/release-91/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz. For each variant we report the allele frequency in each of the super populations (EAS, EUR, AFR, AMR, SAS) studied in the 1000 Genomes phase 3 project.

Example rows from the file:
19  10368804  rs373966690 TAAGTAA T . . dbSNP_150;TSA=deletion;E_Freq;E_1000G;MA=-;MAF=0.00439297;MAC=22;EAS_AF=0.0149;EUR_AF=0.002;AMR_AF=0.0014;SAS_AF=0.0031;AFR_AF=0.0008
19  10368969  rs12720251  C G,T . . dbSNP_150;TSA=SNV;E_Freq;E_1000G;MA=T;MAF=0.00479233;MAC=24;AA=C;EAS_AF=0,0;EUR_AF=0,0;AMR_AF=0,0.0014;SAS_AF=0,0;AFR_AF=0,0.0174

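You can extract the frequencies from AFR_AF, AMR_AF, EUR_AF, EAS_AF and SAS_AF, which are listed in the INFO column and report the frequency of each variant (ALT) allele. Where a variant has more than one ALT allele, the values are comma-separated in the same order as the ALT alleles. In the second row the ALT alleles are G and T, so AFR_AF=0,0.0174 means a frequency of 0 for G and 0.0174 for T; the variant allele actually observed here is T.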
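
As a rough illustration of the extraction described above, here is a minimal Python sketch that pairs each ALT allele with its super-population frequencies. The function names (parse_info, allele_frequencies) are made up for this example and are not part of any Ensembl tool. It splits columns on any whitespace so that it works on the rows as pasted in this email; real VCF files are tab-delimited.

```python
POPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def parse_info(info):
    """Turn 'A=1;B;C=2,3' into {'A': '1', 'B': True, 'C': '2,3'}."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags such as E_Freq have no value
    return fields


def allele_frequencies(vcf_line):
    """Map each ALT allele to a {population: frequency} dict."""
    columns = vcf_line.split()  # CHROM POS ID REF ALT QUAL FILTER INFO
    alts = columns[4].split(",")
    info = parse_info(columns[7])
    result = {alt: {} for alt in alts}
    for pop in POPULATIONS:
        raw = info.get(pop + "_AF")
        if raw is None or raw is True:
            continue  # frequency not reported for this population
        # Values are listed in the same order as the ALT alleles.
        for alt, value in zip(alts, raw.split(",")):
            result[alt][pop] = float(value)
    return result


row = (
    "19 10368969 rs12720251 C G,T . . "
    "dbSNP_150;TSA=SNV;E_Freq;E_1000G;MA=T;MAF=0.00479233;MAC=24;AA=C;"
    "EAS_AF=0,0;EUR_AF=0,0;AMR_AF=0,0.0014;SAS_AF=0,0;AFR_AF=0,0.0174"
)
freqs = allele_frequencies(row)
print(freqs["T"]["AFR"])  # 0.0174
print(freqs["G"]["AFR"])  # 0.0
```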

For structural variants including CNVs:
We don't include frequencies in our structural variation data dumps. However, we compute structural variation allele frequencies based on samples from the 1000 Genomes project and display them on our website:

For example:
http://www.ensembl.org/Homo_sapiens/StructuralVariation/Evidence?db=core;r=12:131494150-131500971;sv=esv3631253;svf=118155531;vdb=variation

You can access the frequencies with our Perl API. Alternatively, you can use the VCF file from the 1000 Genomes website, which contains the allele frequencies you are looking for:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/


For example:
tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/ALL.wgs.mergedSV.v8.20130502.svs.genotypes.vcf.gz 12:131979195-131979196

12  131979195 DUP_gs_CNV_12_131979195_131985016 T <CN0>,<CN2> . PASS  AC=1,8;AF=0.00019968,0.00159744;AFR_AF=0,0.0015;AMR_AF=0.0014,0.0029;AN=5008;CS=DUP_gs;EAS_AF=0,0.002;END=131985016;EUR_AF=0,0.001;NS=2504;SAS_AF=0,0.001;SITEPOST=0.8758;SVTYPE=CNV

AFR_AF, AMR_AF, EAS_AF, EUR_AF and SAS_AF report the allele frequencies for the variant alleles, which are <CN0> and <CN2> in this case, listed in that order.
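
To show how these per-allele values could be used for filtering, here is a small Python sketch that keeps the ALT alleles whose frequency reaches a threshold in at least one super population. The 0.01 default cut-off and the function names (common_alleles, filter_common) are assumptions made for this example; choose whatever definition of "common" suits your analysis. The same logic applies to the short-variant file, since both use the same *_AF keys.

```python
POPULATION_KEYS = ("AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF")


def common_alleles(vcf_line, threshold=0.01):
    """Return ALT alleles whose frequency reaches `threshold` in any population."""
    columns = vcf_line.rstrip("\n").split()  # real VCF files are tab-delimited
    alts = columns[4].split(",")
    # Keep only key=value INFO entries; flags without a value are ignored.
    info = dict(item.split("=", 1) for item in columns[7].split(";") if "=" in item)
    common = []
    for index, alt in enumerate(alts):
        freqs = []
        for key in POPULATION_KEYS:
            values = info.get(key, "").split(",")
            # Skip populations with a missing or placeholder value.
            if index < len(values) and values[index] not in ("", "."):
                freqs.append(float(values[index]))
        if freqs and max(freqs) >= threshold:
            common.append(alt)
    return common


def filter_common(lines, threshold=0.01):
    """Yield data lines that have at least one common ALT allele."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        if common_alleles(line, threshold):
            yield line


sv_row = (
    "12 131979195 DUP_gs_CNV_12_131979195_131985016 T <CN0>,<CN2> . PASS "
    "AC=1,8;AF=0.00019968,0.00159744;AFR_AF=0,0.0015;AMR_AF=0.0014,0.0029;"
    "AN=5008;CS=DUP_gs;EAS_AF=0,0.002;END=131985016;EUR_AF=0,0.001;NS=2504;"
    "SAS_AF=0,0.001;SITEPOST=0.8758;SVTYPE=CNV"
)
print(common_alleles(sv_row))                   # []
print(common_alleles(sv_row, threshold=0.002))  # ['<CN2>']
```

With the default cut-off neither allele of this CNV counts as common; lowering it to 0.002 keeps <CN2> because of its AMR and EAS frequencies. filter_common can be fed the lines of a decompressed VCF, or the output of the tabix command above.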

I hope that answers your questions. Please let us know if you have further questions.

Best,
Anja


> On 22 Dec 2017, at 10:54, Jessie Poquérusse <jessie.poquerusse at gmail.com> wrote:
> 
> Hello, 
> 
> I'm having a hard time getting my hands on the best, most recent VCF datasets, mapped to GRCh38/hg38, of SNP/indel and CNV/SV variations, which I could then filter according to commonality in the general population to obtain a list of common-only variants. My question is thus two-fold:
> 1) What is the best source for such variations, and 
> 2) Are there any instructions on how the variant frequency is encoded (does it correspond to E_freq, as per ftp://ftp.ensembl.org/pub/release-91/variation/vcf/homo_sapiens/README), and how to filter for this?
> 
> I realize I've asked a version of this question a few days ago, but now would love more details.
> 
> Thank-you & happy holidays!
> 
> Best,
> Jessie
