[ensembl-dev] Frequencies of SNPS in populations

Duarte Molha duartemolha at gmail.com
Thu Jan 17 09:54:09 GMT 2019


Dear Developers

I wrote a simple script to report allele frequencies for the different
populations in the database. However, after running it on my set, it
seems that some variations do not show results.


Take, for example, the indel rs141080692.

When I run it through my script, this is the information I get:

rs141080692     GT      1000GENOMES:pilot_1_CEU_low_coverage_panel      -       deletion        9       123543905       123543907
rs141080692     -       1000GENOMES:pilot_1_CEU_low_coverage_panel      -       deletion        9       123543905       123543907
rs141080692     GT      1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -       deletion        9       123543905       123543907
rs141080692     -       1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -       deletion        9       123543905       123543907
rs141080692     GT      1000GENOMES:pilot_1_YRI_low_coverage_panel      -       deletion        9       123543905       123543907
rs141080692     -       1000GENOMES:pilot_1_YRI_low_coverage_panel      -       deletion        9       123543905       123543907
rs141080692     GT      GMI:AK_Koreans  -       deletion        9       123543905       123543907
rs141080692     -       GMI:AK_Koreans  -       deletion        9       123543905       123543907
rs141080692     GT      GMI:NA10851     -       deletion        9       123543905       123543907
rs141080692     -       GMI:NA10851     -       deletion        9       123543905       123543907
rs141080692     GT      SSMP:SSM        -       deletion        9       123543905       123543907
rs141080692     -       SSMP:SSM        -       deletion        9       123543905       123543907

However, looking at the same database on your website:

http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209

you can see that there is frequency information for a whole bunch of
populations.

How do I go about fetching these?

My script is pretty basic.

First I fetch all populations, or only the ones I am interested in, with:

foreach my $pop (@{$population_adaptor->fetch_all()}){
    my $name = $pop->name();
    if (defined $name){
        if (defined $population){
            if ($name =~ /\Q$population/){
                print STDERR "Selected Populations: $name \n";
                push @selected_populations, $name;
            }
        }else{
            print STDERR "Selected Populations: $name \n";
            push @selected_populations, $name;
        }
    }
}

I then use the variation adaptor to get the variation object:

my $var = $variation_adaptor->fetch_by_name($id);

Then I cycle through each variation feature with:

foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
    my @alleles = @{$vf->get_all_Alleles};

    ALLELE_CYCLE: foreach my $allele (@alleles){
        my $astr     = $allele->allele();
        my $pop      = $allele->population();
        my $pop_name = defined $pop ? $pop->name() : "-";
        # // rather than || so that a frequency of 0 is not replaced by "-"
        my $freq     = $allele->frequency() // "-";
        foreach my $p (@selected_populations){
            #print STDERR $pop_name."\t".$p."\n";
            if ($pop_name eq $p){
                print $out_fh join("\t", $var->name(), $astr, $pop_name, $freq,
                                   $varClass, $chr, $start, $end), "\n";
                next ALLELE_CYCLE;
            }
        }
    }
}

Am I doing something wrong?
The phase 3 population data, for example, are clearly included on your
site.

Many thanks

Duarte