[ensembl-dev] Frequencies of SNPS in populations

Duarte Molha duartemolha at gmail.com
Fri Jan 18 14:48:47 GMT 2019


Thank you ... that is brilliant


On Fri, 18 Jan 2019 at 14:44, Laurent Gil <lgil at ebi.ac.uk> wrote:

> Dear Duarte,
>
>
> You can download the 1000 Genomes VCFs and their index files here (they are
> quite big!):
> ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr...
> <ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/>
>
>
> Then you need to edit the following file in your Ensembl Variation API
> (ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json):
>
> https://github.com/Ensembl/ensembl-variation/blob/release/95/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json#L20-L22
>
> And replace the highlighted lines with:
>
> "type": "local",
> "strict_name_match": 1,
> "filename_template":
> "<path_to_the_directory_where_you_downloaded_the_vcf_files>/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz",
>
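
As a quick sanity check before pointing the API at the local files, the sketch below expands the template for a few chromosomes and reports whether each file is present. It assumes the `###CHR###` placeholder is simply replaced by the chromosome name; `expand_template`, the `/data/vcf` directory, and the chromosome list are hypothetical and not part of the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl API call): replace the ###CHR###
# placeholder in a filename template with a chromosome name.
sub expand_template {
    my ($template, $chr) = @_;
    (my $path = $template) =~ s/###CHR###/$chr/g;
    return $path;
}

# Assumed download directory; replace with your own path.
my $template = '/data/vcf/ALL.chr###CHR###_GRCh38.genotypes.20170504.vcf.gz';

# Illustrative subset of chromosomes; extend to the ones you downloaded.
foreach my $chr (qw(1 9 X)) {
    my $path   = expand_template($template, $chr);
    my $status = -e $path ? 'found' : 'MISSING';
    print join("\t", $chr, $status, $path), "\n";
}
```

A `MISSING` line would suggest that either the directory in `filename_template` or the file naming does not match what was downloaded.
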
>
> Best regards,
>
> Laurent
> Ensembl Variation
>
> On 18/01/2019 14:32, Duarte Molha wrote:
>
> Just another question
>
> I can do what you say by querying the Ensembl database remotely. But we
> have installed it locally as well, and since my queries would be extensive I
> would much prefer to do this locally too.
>
> Where and how do I download the VCFs and install them on my own server so
> that this can also be done locally?
>
> Many thanks
> Duarte
>
> On Thu, 17 Jan 2019 at 11:28, Laurent Gil <lgil at ebi.ac.uk> wrote:
>
>> Dear Duarte,
>>
>> The 1000 Genomes Phase 3 data are stored in a VCF file and not in a
>> database (it was too big to store it in our databases), that's why you
>> didn't see them in your results.
>> However, you can access it with the Ensembl Variation API. For that, you
>> need to add the following line to your script to force the API to look into
>> the Ensembl Variation VCF files:
>>
>> $variation_adaptor->db->use_vcf(1);
>>
>>
>> Here is a suggestion of your script with the change:
>>
>> my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
>> $variation_adaptor->db->use_vcf(1);
>>
>> my $variation = $variation_adaptor->fetch_by_name($id);
>>
>> foreach my $vf (@{$variation->get_all_VariationFeatures()}) {
>>
>>     ...
>>
>> }
>>
>> Note that I also replaced the VariationFeatureAdaptor call
>> "$vf_adaptor->fetch_all_by_Variation($var)" to avoid instantiating an
>> extra adaptor.
>>
>> There are some further descriptions in our Ensembl Variation API
>> tutorial:
>> https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
>>
>>
>> Best regards,
>>
>> Laurent
>> Ensembl Variation
>>
>> On 17/01/2019 09:54, Duarte Molha wrote:
>>
>> Dear Developers
>>
>> I created a simple script to report allele frequencies in
>> the different populations in the database. However, after running it on my
>> set, it seems some variations do not show results.
>>
>>
>> Take for example the INDEL rs141080692.
>>
>> When I run it through my script, this is the information I get:
>>
>> rs141080692  GT  1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  GT  1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
>> rs141080692  GT  1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  -   1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
>> rs141080692  GT  GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>> rs141080692  -   GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
>> rs141080692  GT  GMI:NA10851                                     -  deletion  9  123543905  123543907
>> rs141080692  -   GMI:NA10851                                     -  deletion  9  123543905  123543907
>> rs141080692  GT  SSMP:SSM                                        -  deletion  9  123543905  123543907
>> rs141080692  -   SSMP:SSM                                        -  deletion  9  123543905  123543907
>>
>> However, looking at the same database on your website:
>>
>>
>> http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>>
>> You can see that there is information about its frequency in a whole
>> bunch of populations
>>
>> How do I go about fetching these?
>>
>> My script is pretty basic
>>
>> First I fetch all populations, or only the ones I am interested in, with:
>>
>> foreach my $pop (@{$population_adaptor->fetch_all()}) {
>>     my $name = $pop->name();
>>     next unless defined $name;
>>     # Keep every population unless a name filter ($population) was given
>>     if (!defined $population || $name =~ /\Q$population/) {
>>         print STDERR "Selected Populations: $name \n";
>>         push @selected_populations, $name;
>>     }
>> }
>>
>> I then use the variation adaptor to get the variation object
>>
>>  my $variation = $variation_adaptor->fetch_by_name($id);
>>
>> Then I cycle through each variation feature with
>>
>> foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}) {
>>     my @alleles = @{$vf->get_all_Alleles};
>>
>>     ALLELE_CYCLE: foreach my $allele (@alleles) {
>>         my $astr     = $allele->allele();
>>         my $pop      = $allele->population();
>>         my $pop_name = defined $pop ? $pop->name() : "-";
>>         # "//" rather than "||" so that a genuine frequency of 0 is kept
>>         my $freq     = $allele->frequency() // "-";
>>
>>         # Print the allele only if its population was selected above
>>         foreach my $p (@selected_populations) {
>>             next unless $pop_name eq $p;
>>             print $out_fh join("\t", $var->name(), $astr, $pop_name,
>>                                $freq, $varClass, $chr, $start, $end), "\n";
>>             next ALLELE_CYCLE;
>>         }
>>     }
>> }
>>
>> Am I doing something wrong?
>> The phase 3 population data, for example, are clearly
>> included on your site.
>>
>> Many thanks
>>
>> Duarte
>>