[ensembl-dev] 1000 genomes frequencies from Ensembl SQL?

Anja Thormann anja at ebi.ac.uk
Mon Jul 11 13:19:49 BST 2016


Hi Mark,

We store 1000 Genomes phase 3 genotypes in VCF files rather than in the database (location: ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/), so your query on the allele table will not return them. When the data are requested, we compute frequencies on the fly. The best way to get the data is through our API: you just need to set a flag telling the API to also look up data in VCF files. Here are some code examples:
http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#population_genotypes
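
To make "compute frequencies on the fly" concrete, here is a minimal Python sketch of the arithmetic, assuming plain VCF GT strings such as "0|1" or "1/1". It counts each called allele and divides by the number of called alleles, skipping missing calls ("."). The helper name `allele_frequencies` is hypothetical; this is not the Ensembl API's own implementation.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele_index: frequency} for a list of VCF GT strings.

    Hypothetical helper: phased ("|") and unphased ("/") calls are treated
    alike, missing alleles (".") are skipped, and each frequency is the
    allele count divided by the number of called alleles.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[int(allele)] += 1
    called = sum(counts.values())
    if called == 0:
        return {}  # no usable calls in this set of samples
    return {allele: n / called for allele, n in sorted(counts.items())}


print(allele_frequencies(["0|0", "0|1", "1|1", "./."]))  # {0: 0.5, 1: 0.5}
```

Feeding in only the genotypes of one population's samples (for example the TSI samples) gives that population's frequencies; which samples belong to which population has to come from separate sample metadata.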

If you need to retrieve frequencies for all 1000 Genomes phase 3 variants, we would advise using vcftools and working directly with the provided VCF files.
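
As an illustration of what working directly with the VCF files involves, here is a pure-Python sketch (a stand-in, not vcftools) that streams a VCF, keeps only the sample columns belonging to a chosen population, and reports the combined ALT allele frequency per record. It assumes a standard `#CHROM` header line and a `GT` key in the FORMAT column. The function names are hypothetical, and the set of sample IDs for a population would have to come from 1000 Genomes sample metadata not shown here.

```python
import gzip


def read_vcf_lines(path):
    """Yield text lines from a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def population_alt_frequencies(lines, samples):
    """Yield (variant_id, alt_frequency) using only the given sample IDs.

    Multi-allelic sites report the combined frequency of all ALT alleles;
    records with no called alleles in the chosen samples are skipped.
    """
    columns = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; keep those in the population.
            columns = [i for i, name in enumerate(fields[9:], start=9)
                       if name in samples]
            continue
        if columns is None:
            raise ValueError("no #CHROM header line before the first record")
        gt_index = fields[8].split(":").index("GT")
        called = alt = 0
        for i in columns:
            gt = fields[i].split(":")[gt_index]
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                called += 1
                if allele != "0":
                    alt += 1
        if called:
            yield fields[2], alt / called
```

Usage would look like `population_alt_frequencies(read_vcf_lines(path), tsi_samples)`, where `path` and `tsi_samples` are placeholders for a downloaded VCF and the set of sample IDs in the population of interest.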

We also provide data dumps of allele frequencies for 1000 Genomes phase 3 variants in the continental super-populations (AFR, AMR, EAS, EUR, SAS):
ftp://ftp.ensembl.org/pub/release-84/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz
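
In a VCF dump like this one, per-population frequencies would normally sit in each record's INFO column. Below is a minimal Python sketch for pulling them out. The INFO key names `AFR`, `AMR`, `EAS`, `EUR` and `SAS` are an assumption, so check the `##INFO` header lines of the file for the actual field names before relying on it; the function names are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO column ("KEY=VALUE;FLAG;...") into a dict."""
    parsed = {}
    if info == ".":
        return parsed  # empty INFO column
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True  # flags carry no value
    return parsed


def super_population_frequencies(info, keys=("AFR", "AMR", "EAS", "EUR", "SAS")):
    """Return {key: [frequency per ALT allele]} for the requested INFO keys.

    The default key names are assumptions, not confirmed field names.
    A "." value (missing) becomes None so positions still line up with ALT.
    """
    parsed = parse_info(info)
    return {
        key: [None if v == "." else float(v) for v in parsed[key].split(",")]
        for key in keys
        if isinstance(parsed.get(key), str)
    }


# Made-up INFO string, for illustration only.
print(super_population_frequencies("AFR=0.01;EUR=0.05,0.002;DB"))
# {'AFR': [0.01], 'EUR': [0.05, 0.002]}
```

Returning a list per key keeps multi-allelic records intact, since VCF allows one comma-separated value per ALT allele.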

HTH,
Anja



> On 8 Jul 2016, at 22:18, Mark Miller <Mark.Miller at instem.com> wrote:
> 
> Also posted to Biostars: https://www.biostars.org/p/200847/
>  
> 
> How would I write an SQL query to retrieve all of the frequencies, from all providers, such as those shown on a page like this one?
> http://useast.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=16:89919209-89920209;v=rs1805007;vdb=variation;vf=1232953
> 
> Connection info: http://useast.ensembl.org/info/data/mysql.html
> I'm especially interested in the 1000 genomes frequencies.
> 
> I tried this (among many other queries), but it doesn't seem to include the 1000 Genomes Phase 3 data (for example, the frequency for '1000GENOMES:phase_3:TSI', population ID 373537, should be 0.023):
> 
> SELECT * FROM allele
> left join population
> on allele.population_id = population.population_id
> left join variation
> on allele.variation_id = variation.variation_id
> where variation.name = 'rs1805007'
> Do I need to learn about subsnps? 
> Is the data masked for privacy? There sure are a lot of NULLs.
> Do I just need to keep staring at the Ensembl ERD? Like maybe I need to look in some of the sample or individual tables? http://useast.ensembl.org/info/docs/api/variation/variation_schema.html
>  
>  
>  
> Mark Miller
> Instem
> Head of Bioinformatics & SRS Product Manager
> W +1 610 941 0990 x131
>  
> NEW MOBILE NUMBER: +1 215 421 5294
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>