[ensembl-dev] Frequencies of SNPS in populations

Laurent Gil lgil at ebi.ac.uk
Fri Jan 18 14:44:52 GMT 2019


Dear Duarte,


You can download the 1000 Genomes VCFs and their index files here 
(they are quite big!): 
ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr... 
<ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/>


Then you need to edit the following file in your Ensembl Variation API 
(ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json):

https://github.com/Ensembl/ensembl-variation/blob/release/95/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json#L20-L22 


And replace the highlighted lines with:

"type": "local",
"strict_name_match": 1,
"filename_template": 
"<path_to_the_directory_where_you_downloaded_the_vcf_files>/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz",

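Before pointing the API at the local files, it may be worth checking that 
the template resolves to files that actually exist. The following is a 
minimal sketch and not part of the Ensembl API: the missing_vcf_files 
helper, the VCF_DIR environment variable, the /data/1000genomes path, the 
chromosome list and the ".tbi" index suffix are all assumptions made for 
illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Expand the ###CHR### placeholder for each chromosome name and return the
# expected files (VCF plus its assumed ".tbi" index) that are not on disk.
sub missing_vcf_files {
    my ($template, @chromosomes) = @_;
    my @missing;
    for my $chr (@chromosomes) {
        (my $vcf = $template) =~ s/###CHR###/$chr/g;
        for my $file ($vcf, "$vcf.tbi") {
            push @missing, $file unless -e $file;
        }
    }
    return @missing;
}

# Hypothetical download directory; replace it with your own path.
my $dir      = $ENV{VCF_DIR} // '/data/1000genomes';
my $template = "$dir/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz";

# Assumed chromosome list; adjust it to the files you actually downloaded.
my @missing = missing_vcf_files($template, 1 .. 22, 'X', 'Y');
if (@missing) {
    print join("\n", 'Missing files:', @missing), "\n";
}
else {
    print "All expected VCF and index files are present.\n";
}
```

If any file is reported missing, the "filename_template" value and the 
download directory are the first things to compare.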

Best regards,

Laurent
Ensembl Variation

On 18/01/2019 14:32, Duarte Molha wrote:
> Just another question
>
> I can do what you suggest by querying the Ensembl database remotely, but 
> we have also installed it locally and, since my queries will be 
> extensive, I would much prefer to run them locally as well.
>
> Where and how do I download the VCFs and install them on my own server 
> so that this can also be done locally?
>
> Many thanks
> Duarte
>
> On Thu, 17 Jan 2019 at 11:28, Laurent Gil <lgil at ebi.ac.uk> wrote:
>
>     Dear Duarte,
>
>     The 1000 Genomes Phase 3 data are stored in VCF files and not in
>     a database (they were too big to store in our databases), which
>     is why you didn't see them in your results.
>     However, you can access them with the Ensembl Variation API. To do
>     so, you need to add the following line to your script to force the
>     API to look into the Ensembl Variation VCF files:
>
>     $variation_adaptor->db->use_vcf(1);
>
>
>     Here is a suggested version of your script with the change:
>
>     my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
>     $variation_adaptor->db->use_vcf(1);
>
>     my $variation = $variation_adaptor->fetch_by_name($id);
>
>     foreach my $vf (@{$variation->get_all_VariationFeatures()}) {
>
>          ...
>
>     }
>
>     Note that I also replaced the VariationFeatureAdaptor call
>     "$vf_adaptor->fetch_all_by_Variation($var)" with
>     "$variation->get_all_VariationFeatures()" to avoid instantiating
>     an extra adaptor.
>
>     There are some further descriptions in our Ensembl Variation API
>     tutorial:
>     https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
>
>
>     Best regards,
>
>     Laurent
>     Ensembl Variation
>
>     On 17/01/2019 09:54, Duarte Molha wrote:
>>     Dear Developers
>>
>>     I created a simple script to provide me with polymorphic
>>     frequencies in the different populations in the database. However
>>     after running it on my set it seems some variations do not show
>>     results
>>
>>
>>     Take, for example, the INDEL rs141080692.
>>     When I run it through my script, this is the information I get:
>>
>>     rs141080692  GT  1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>>     rs141080692  -   1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>>     rs141080692  GT  1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>>     rs141080692  -   1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>>     rs141080692  GT  1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>>     rs141080692  -   1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>>     rs141080692  GT  GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>>     rs141080692  -   GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>>     rs141080692  GT  GMI:NA10851                                     -  deletion  9  123543905  123543907
>>     rs141080692  -   GMI:NA10851                                     -  deletion  9  123543905  123543907
>>     rs141080692  GT  SSMP:SSM                                        -  deletion  9  123543905  123543907
>>     rs141080692  -   SSMP:SSM                                        -  deletion  9  123543905  123543907
>>
>>     However, looking at the same database on your website:
>>
>>     http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>>
>>     You can see that there is information about its frequency in a
>>     whole bunch of populations
>>
>>     How do I go about fetching these?
>>
>>     My script is pretty basic
>>
>>     First I fetch all populations, or only the ones I am interested in, with:
>>
>>     foreach my $pop (@{$population_adaptor->fetch_all()}){
>>         my $name = $pop->name();
>>         if (defined $name){
>>             if (defined $population){
>>                 if ($name =~ /\Q$population/){
>>                     print STDERR "Selected Populations: $name \n";
>>                     push @selected_populations, $name;
>>                 }
>>             }else{
>>                 print STDERR "Selected Populations: $name \n";
>>                 push @selected_populations, $name;
>>             }
>>         }
>>     }
>>
>>     I then use the variation adaptor to get the variation object
>>
>>      my $variation = $variation_adaptor->fetch_by_name($id);
>>
>>     Then I cycle through each variation feature with
>>
>>     foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
>>         my @alleles = @{$vf->get_all_Alleles};
>>
>>         ALLELE_CYCLE:foreach my $a (@alleles){
>>             my $astr = $a->allele();
>>             my $pop  = $a->population();
>>             my $pop_name = "-";
>>             if (defined $pop){
>>                 $pop_name = $a->population->name() ;
>>             }
>>             my $freq = $a->frequency() || "-";
>>             foreach my $p (@{$selected_populations}){
>>                 #print STDERR $pop_name."\t".$p."\n";
>>                 if ($pop_name eq $p){
>>                     print $out_fh join "\t", ($var->name(),
>>                         $astr,
>>                         $pop_name,
>>                         $freq,
>>                         $varClass,
>>                         $chr,
>>                         $start,
>>                         $end."\n");
>>                     next ALLELE_CYCLE;
>>                 }
>>             }
>>         }
>>     }
>>
>>     Am I doing something wrong?
>>     The Phase 3 population data, for example, are clearly
>>     included on your site.
>>
>>     Many thanks
>>
>>     Duarte
>>
>>
>>     _______________________________________________
>>     Dev mailing list    Dev at ensembl.org
>>     Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
>