[ensembl-dev] Frequencies of SNPS in populations
Laurent Gil
lgil at ebi.ac.uk
Fri Jan 18 14:44:52 GMT 2019
Dear Duarte,
You can download the 1000 Genomes VCFs and their index files here
(they are quite big!):
ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr...
<ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/>
Then you need to edit the following file in your Ensembl Variation API
(ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json):
https://github.com/Ensembl/ensembl-variation/blob/release/95/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json#L20-L22
And replace the highlighted lines with:
"type": "local",
"strict_name_match": 1,
"filename_template": "<path_to_the_directory_where_you_downloaded_the_vcf_files>/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz",
Best regards,
Laurent
Ensembl Variation
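
Before pointing the API at a local directory, it can help to confirm that every per-chromosome file named by the filename_template is actually present. The script below is only a minimal sketch under stated assumptions: expand_template and missing_chromosomes are hypothetical helpers (not part of the Ensembl API), the index files are assumed to be tabix ".tbi" files sitting next to each VCF, the chromosome list 1-22 plus X is a guess at what you downloaded, and the directory is read from a made-up VCF_DIR environment variable.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): substitute a chromosome
# name for the ###CHR### placeholder in a filename_template string.
sub expand_template {
    my ($template, $chr) = @_;
    (my $path = $template) =~ s/###CHR###/$chr/g;
    return $path;
}

# Return the chromosomes whose VCF or index file is missing on disk.
# Assumption: each index is a tabix file named "<vcf>.tbi".
sub missing_chromosomes {
    my ($template, @chromosomes) = @_;
    my @missing;
    for my $chr (@chromosomes) {
        my $vcf = expand_template($template, $chr);
        push @missing, $chr unless -e $vcf && -e "$vcf.tbi";
    }
    return @missing;
}

# Assumed download directory; replace with your own path.
my $dir      = $ENV{VCF_DIR} // '/data/1000genomes';
my $template = "$dir/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz";

# Assumed chromosome list; adjust to the files you actually downloaded.
my @missing = missing_chromosomes($template, 1 .. 22, 'X');
if (@missing) {
    print "Missing VCF or index for: @missing\n";
}
else {
    print "All expected files found\n";
}
```

If the script reports missing chromosomes, check the path you put in filename_template before suspecting the API configuration.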
On 18/01/2019 14:32, Duarte Molha wrote:
> Just another question
>
> I can do what you say by querying the Ensembl database remotely. But
> we have installed it locally as well, and since my queries would be
> extensive I would much prefer to do this locally too.
>
> Where and how do I download the VCFs and install them on my own server
> so that this can also be done locally?
>
> Many thanks
> Duarte
>
> On Thu, 17 Jan 2019 at 11:28, Laurent Gil <lgil at ebi.ac.uk> wrote:
>
> Dear Duarte,
>
> The 1000 Genomes Phase 3 data are stored in VCF files and not in
> a database (they were too big to store in our databases), which is
> why you didn't see them in your results.
> However, you can access them with the Ensembl Variation API. For
> that, you need to add the following line to your script to force the
> API to look into the Ensembl Variation VCF files:
>
> $variation_adaptor->db->use_vcf(1);
>
>
> Here is a suggested version of your script with the change:
>
> my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
> $variation_adaptor->db->use_vcf(1);
>
> my $variation = $variation_adaptor->fetch_by_name($id);
>
> foreach my $vf (@{$variation->get_all_VariationFeatures()}) {
>
> ...
>
> }
>
> Note that I also replaced the VariationFeatureAdaptor call
> "$vf_adaptor->fetch_all_by_Variation($var)" to avoid
> instantiating an extra adaptor.
>
> There are some further descriptions in our Ensembl Variation API
> tutorial:
> https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
>
>
> Best regards,
>
> Laurent
> Ensembl Variation
>
> On 17/01/2019 09:54, Duarte Molha wrote:
>> Dear Developers
>>
>> I created a simple script to provide me with polymorphic
>> frequencies in the different populations in the database. However
>> after running it on my set it seems some variations do not show
>> results
>>
>>
>> Take for example the INDEL rs141080692.
>> When I run it through my script this is the information I get:
>>
>> rs141080692  GT  1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  GT  1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>> rs141080692  GT  1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  GT  GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>> rs141080692  -   GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>> rs141080692  GT  GMI:NA10851                                     -  deletion  9  123543905  123543907
>> rs141080692  -   GMI:NA10851                                     -  deletion  9  123543905  123543907
>> rs141080692  GT  SSMP:SSM                                        -  deletion  9  123543905  123543907
>> rs141080692  -   SSMP:SSM                                        -  deletion  9  123543905  123543907
>>
>> However, looking at the same database on your website:
>>
>> http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>>
>> You can see that there is information about its frequency in a
>> whole bunch of populations
>>
>> How do I go about fetching these?
>>
>> My script is pretty basic
>>
>> First I fetch all populations, or only the ones I am interested in, with:
>>
>> foreach my $pop (@{$population_adaptor->fetch_all()}){
>>     my $name = $pop->name();
>>     if (defined $name){
>>         if (defined $population){
>>             if ($name =~ /\Q$population/){
>>                 print STDERR "Selected Populations: $name \n";
>>                 push @selected_populations, $name;
>>             }
>>         }else{
>>             print STDERR "Selected Populations: $name \n";
>>             push @selected_populations, $name;
>>         }
>>     }
>> }
>>
>> I then use the variation adaptor to get the variation object
>>
>> my $variation = $variation_adaptor->fetch_by_name($id);
>>
>> Then I cycle through each variation feature with
>>
>> foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
>>     my @alleles = @{$vf->get_all_Alleles};
>>
>>     ALLELE_CYCLE: foreach my $a (@alleles){
>>         my $astr = $a->allele();
>>         my $pop = $a->population();
>>         my $pop_name = "-";
>>         if (defined $pop){
>>             $pop_name = $pop->name();
>>         }
>>         # Use // rather than || so that a frequency of 0 is not printed as "-"
>>         my $freq = $a->frequency() // "-";
>>         # @selected_populations is the array filled in the loop above
>>         foreach my $p (@selected_populations){
>>             #print STDERR $pop_name."\t".$p."\n";
>>             if ($pop_name eq $p){
>>                 print $out_fh join "\t", ($var->name(),
>>                                            $astr,
>>                                            $pop_name,
>>                                            $freq,
>>                                            $varClass,
>>                                            $chr,
>>                                            $start,
>>                                            $end."\n");
>>                 next ALLELE_CYCLE;
>>             }
>>         }
>>     }
>> }
>>
>> Am I doing something wrong?
>> The phase 3 population data, for example, are clearly
>> included on your site.
>>
>> Many thanks
>>
>> Duarte
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>