[ensembl-dev] Frequencies of SNPS in populations

Duarte Molha duartemolha at gmail.com
Fri Jan 18 14:32:06 GMT 2019


Just another question

I can do what you suggest by querying the Ensembl database remotely. However,
we also have it installed locally, and since my queries will be extensive I
would much prefer to run them locally as well.

Where and how do I download the VCFs and install them on my own server so
that this can also be done locally?

Many thanks
Duarte

On Thu, 17 Jan 2019 at 11:28, Laurent Gil <lgil at ebi.ac.uk> wrote:

> Dear Duarte,
>
> The 1000 Genomes Phase 3 data are stored in a VCF file and not in a
> database (they were too big to store in our databases), which is why you
> didn't see them in your results.
> However, you can access them with the Ensembl Variation API. To do so, you
> need to add the following line to your script to force the API to look into
> the Ensembl Variation VCF files:
>
> $variation_adaptor->db->use_vcf(1);
>
>
> Here is a suggested version of your script with the change:
>
> my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
> $variation_adaptor->db->use_vcf(1);
>
> my $variation = $variation_adaptor->fetch_by_name($id);
>
> foreach my $vf (@{$variation->get_all_VariationFeatures()}) {
>     ...
> }
>
> Note that I also replaced the VariationFeatureAdaptor call
> "$vf_adaptor->fetch_all_by_Variation($var)" to avoid instantiating an
> extra adaptor.
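
The approach described above can be sketched end to end as follows. This is only a minimal sketch, not the script from this thread: it assumes that get_all_Alleles() on the Variation object returns the VCF-backed alleles once use_vcf(1) is set, as the tutorial linked in this message describes, and that the public server ensembldb.ensembl.org is reachable with the anonymous user. The helper name allele_frequency_rows and the command-line handling are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect one row per allele: [variant name, allele string, population, frequency].
# $variation is expected to behave like a Bio::EnsEMBL::Variation::Variation;
# $pop_filter is an optional compiled regex applied to the population name.
sub allele_frequency_rows {
    my ($variation, $pop_filter) = @_;
    my @rows;
    foreach my $allele (@{ $variation->get_all_Alleles() }) {
        my $pop = $allele->population();
        next unless defined $pop;    # skip alleles with no population attached
        my $pop_name = $pop->name();
        next if defined $pop_filter && $pop_name !~ $pop_filter;
        my $freq = $allele->frequency();
        # Test definedness so that a genuine frequency of 0 is not shown as "-".
        push @rows, [ $variation->name(), $allele->allele(), $pop_name,
                      defined $freq ? $freq : '-' ];
    }
    return @rows;
}

# The Ensembl server is only contacted when an rsID is given on the command line.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',    # assumption: public server
        -user => 'anonymous',
    );
    my $variation_adaptor = $registry->get_adaptor('human', 'variation', 'variation');
    $variation_adaptor->db->use_vcf(1);      # also read the VCF-backed populations

    my $variation = $variation_adaptor->fetch_by_name($ARGV[0])
        or die "Variant $ARGV[0] not found\n";
    my $filter = defined $ARGV[1] ? qr/\Q$ARGV[1]\E/ : undef;
    print join("\t", @$_), "\n" for allele_frequency_rows($variation, $filter);
}
```

If saved under a hypothetical name such as allele_freqs.pl, it could be run as "perl allele_freqs.pl rs141080692 1000GENOMES" to list only populations whose name contains that string. Keeping the row-building logic separate from the registry connection makes it possible to check the filtering without a database connection.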
>
> There are some further descriptions in our Ensembl Variation API tutorial:
> https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
>
>
> Best regards,
>
> Laurent
> Ensembl Variation
>
> On 17/01/2019 09:54, Duarte Molha wrote:
>
> Dear Developers
>
> I created a simple script to report allele frequencies in the different
> populations in the database. However, after running it on my set, it seems
> that some variations do not show results.
>
>
> Take, for example, the indel rs141080692.
>
> When I run it through my script, this is the information I get:
>
> rs141080692  GT  1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  GT  1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
> rs141080692  GT  1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  GT  GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
> rs141080692  -   GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
> rs141080692  GT  GMI:NA10851                                     -  deletion  9  123543905  123543907
> rs141080692  -   GMI:NA10851                                     -  deletion  9  123543905  123543907
> rs141080692  GT  SSMP:SSM                                        -  deletion  9  123543905  123543907
> rs141080692  -   SSMP:SSM                                        -  deletion  9  123543905  123543907
>
> However, looking at the same database on your website:
>
>
> http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>
> you can see that there is frequency information for many more
> populations.
>
> How do I go about fetching these?
>
> My script is pretty basic.
>
> First I fetch all populations, or only the ones I am interested in, with:
>
> foreach my $pop (@{$population_adaptor->fetch_all()}){
>     my $name = $pop->name();
>     if (defined $name){
>         if (defined $population){
>             if ($name =~ /\Q$population/){
>                 print STDERR "Selected Populations: $name \n";
>                 push @selected_populations, $name;
>             }
>         }else{
>             print STDERR "Selected Populations: $name \n";
>             push @selected_populations, $name;
>         }
>     }
> }
>
> I then use the variation adaptor to get the variation object
>
>  my $variation = $variation_adaptor->fetch_by_name($id);
>
> Then I cycle through each variation feature with
>
> foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
>     my @alleles = @{$vf->get_all_Alleles};
>
>     ALLELE_CYCLE: foreach my $allele (@alleles){
>         my $astr     = $allele->allele();
>         my $pop      = $allele->population();
>         my $pop_name = defined $pop ? $pop->name() : "-";
>         # "//" keeps a real frequency of 0; only undef becomes "-"
>         my $freq = $allele->frequency() // "-";
>         # report the allele only if its population was selected above
>         foreach my $p (@selected_populations){
>             #print STDERR $pop_name."\t".$p."\n";
>             if ($pop_name eq $p){
>                 print $out_fh join("\t", $var->name(),
>                                          $astr,
>                                          $pop_name,
>                                          $freq,
>                                          $varClass,
>                                          $chr,
>                                          $start,
>                                          $end), "\n";
>                 next ALLELE_CYCLE;
>             }
>         }
>     }
> }
>
> Am I doing something wrong?
> The phase 3 population data, for example, are clearly included
> on your site.
>
> Many thanks
>
> Duarte
>