[ensembl-dev] Frequencies of SNPS in populations

Laurent Gil lgil at ebi.ac.uk
Thu Jan 17 11:28:57 GMT 2019


Dear Duarte,

The 1000 Genomes Phase 3 data are stored in VCF files and not in a 
database (they were too big to store in our databases), which is why you 
didn't see them in your results.
However, you can access them with the Ensembl Variation API. To do so, you 
need to add the following line to your script to force the API to look into 
the Ensembl Variation VCF files:

$variation_adaptor->db->use_vcf(1);


Here is a suggested version of your script with the change:

my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
$variation_adaptor->db->use_vcf(1);

my $variation = $variation_adaptor->fetch_by_name($id);

foreach my $vf (@{$variation->get_all_VariationFeatures()}) {

     ...

}

Note that I also replaced the VariationFeatureAdaptor call 
"$vf_adaptor->fetch_all_by_Variation($var)" with 
"$variation->get_all_VariationFeatures()" to avoid instantiating 
an extra adaptor.

There are some further descriptions in our Ensembl Variation API 
tutorial: 
https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
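
As a minimal sketch of how the frequencies could then be read, assuming 
the public server ensembldb.ensembl.org with the anonymous user and a 
local install of the Ensembl core and variation APIs (reading the VCF 
files may additionally need Bio::DB::HTS, depending on your setup). The 
helper wanted_population() and the command-line arguments are 
hypothetical names used only for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true if a population name passes an optional
# substring filter (no filter means every named population is kept).
sub wanted_population {
    my ($name, $filter) = @_;
    return 0 unless defined $name;
    return 1 unless defined $filter && length $filter;
    return $name =~ /\Q$filter\E/ ? 1 : 0;
}

sub main {
    my ($id, $filter) = @_;
    unless (defined $id) {
        warn "Usage: $0 <rsID> [population substring]\n";
        return;
    }

    # Loaded at run time so the helper above works without the API installed.
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',   # assumption: public Ensembl server
        -user => 'anonymous',
    );

    my $variation_adaptor = $registry->get_adaptor('human', 'variation', 'variation');
    $variation_adaptor->db->use_vcf(1);     # also read alleles from the VCF files

    my $variation = $variation_adaptor->fetch_by_name($id)
        or die "Variant $id not found\n";

    foreach my $allele (@{ $variation->get_all_Alleles() }) {
        my $pop = $allele->population;
        next unless defined $pop && wanted_population($pop->name, $filter);

        # Test with "defined" so that a frequency of 0 is printed as 0.
        my $freq = $allele->frequency;
        print join("\t", $variation->name, $allele->allele, $pop->name,
                   defined $freq ? $freq : '-'), "\n";
    }
}

main(@ARGV) unless caller;
```

For example, "perl freqs.pl rs141080692 phase_3" (script name hypothetical) 
would print only the populations whose name contains "phase_3". Testing 
the frequency with "defined" rather than "||" keeps a true frequency of 0 
from being shown as "-".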


Best regards,

Laurent
Ensembl Variation

On 17/01/2019 09:54, Duarte Molha wrote:
> Dear Developers
>
> I created a simple script to provide me with polymorphic frequencies 
> in the different populations in the database. However, after running it 
> on my set, it seems some variations do not show results.
>
>
> Take for example the INDEL rs141080692.
> When I run it through my script, this is the information I get:
>
> rs141080692  GT  1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_CEU_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  GT  1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel  -  deletion  9  123543905  123543907
> rs141080692  GT  1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  -   1000GENOMES:pilot_1_YRI_low_coverage_panel      -  deletion  9  123543905  123543907
> rs141080692  GT  GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
> rs141080692  -   GMI:AK_Koreans                                  -  deletion  9  123543905  123543907
> rs141080692  GT  GMI:NA10851                                     -  deletion  9  123543905  123543907
> rs141080692  -   GMI:NA10851                                     -  deletion  9  123543905  123543907
> rs141080692  GT  SSMP:SSM                                        -  deletion  9  123543905  123543907
> rs141080692  -   SSMP:SSM                                        -  deletion  9  123543905  123543907
>
> However, looking at the same database on your website:
>
> http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>
> you can see that there is information about its frequency in a whole 
> bunch of populations.
>
> How do I go about fetching these?
>
> My script is pretty basic.
>
> First I fetch all populations, or only the ones I am interested in, with:
>
> foreach my $pop (@{$population_adaptor->fetch_all()}) {
>     my $name = $pop->name();
>     if (defined $name) {
>         if (defined $population) {
>             if ($name =~ /\Q$population/) {
>                 print STDERR "Selected Populations: $name \n";
>                 push @selected_populations, $name;
>             }
>         } else {
>             print STDERR "Selected Populations: $name \n";
>             push @selected_populations, $name;
>         }
>     }
> }
>
> I then use the variation adaptor to get the variation object:
>
>  my $variation = $variation_adaptor->fetch_by_name($id);
>
> Then I cycle through each variation feature with:
>
> foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}) {
>     my @alleles = @{$vf->get_all_Alleles};
>
>     ALLELE_CYCLE: foreach my $a (@alleles) {
>         my $astr = $a->allele();
>         my $pop  = $a->population();
>         my $pop_name = "-";
>         if (defined $pop) {
>             $pop_name = $a->population->name();
>         }
>         my $freq = $a->frequency() || "-";
>         foreach my $p (@{$selected_populations}) {
>             #print STDERR $pop_name."\t".$p."\n";
>             if ($pop_name eq $p) {
>                 print $out_fh join "\t", ($var->name(),
>                     $astr,
>                     $pop_name,
>                     $freq,
>                     $varClass,
>                     $chr,
>                     $start,
>                     $end."\n");
>                 next ALLELE_CYCLE;
>             }
>         }
>     }
> }
>
> Am I doing something wrong?
> There are the phase3 population data, for example. They are clearly 
> included on your site.
>
> Many thanks
>
> Duarte