[ensembl-dev] Frequencies of SNPS in populations
Laurent Gil
lgil at ebi.ac.uk
Thu Jan 17 11:28:57 GMT 2019
Dear Duarte,
The 1000 Genomes Phase 3 data are stored in a VCF file rather than in a
database (they were too big to store in our databases), which is why you
didn't see them in your results.
However, you can access them through the Ensembl Variation API. To do so,
you need to add the following line to your script, which forces the API to
look in the Ensembl Variation VCF files:
$variation_adaptor->db->use_vcf(1);
Here is a suggested version of your script with the change:
my $variation_adaptor = $registry->get_adaptor("human", "variation", "variation");
$variation_adaptor->db->use_vcf(1);
my $variation = $variation_adaptor->fetch_by_name($id);
foreach my $vf (@{$variation->get_all_VariationFeatures()}) {
...
}
Note that I also replaced the VariationFeatureAdaptor call
"$vf_adaptor->fetch_all_by_Variation($var)" with
"$variation->get_all_VariationFeatures()" to avoid instantiating an
extra adaptor.
There are some further descriptions in our Ensembl Variation API
tutorial:
https://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
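Along the lines of that tutorial section, a minimal sketch of a complete
script might look like the one below. It assumes a connection to the public
Ensembl MySQL server (ensembldb.ensembl.org, user anonymous), which is not
part of the script discussed above, and it uses the rsID from the original
question. Reading the VCF files may also need extra modules installed
alongside the API, so treat this as an outline rather than a tested recipe.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Assumption: connect to the public Ensembl database server.
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $variation_adaptor = $registry->get_adaptor('human', 'variation', 'variation');

# Ask the API to also read the VCF files (needed for 1000 Genomes Phase 3).
$variation_adaptor->db->use_vcf(1);

my $id        = 'rs141080692';
my $variation = $variation_adaptor->fetch_by_name($id);
die "Variation $id not found\n" unless defined $variation;

# Print one tab-separated row per allele that has a population attached.
foreach my $allele (@{ $variation->get_all_Alleles() }) {
    my $population = $allele->population();
    next unless defined $population;

    # Use a defined-check so that a real frequency of 0 is kept.
    my $frequency = $allele->frequency();
    $frequency = '-' unless defined $frequency;

    print join("\t", $variation->name(), $allele->allele(),
               $population->name(), $frequency), "\n";
}
```

Testing the frequency with "defined" instead of '|| "-"' keeps a genuine
frequency of 0 from being printed as missing.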
Best regards,
Laurent
Ensembl Variation
On 17/01/2019 09:54, Duarte Molha wrote:
> Dear Developers
>
> I created a simple script to provide me with polymorphic frequencies
> in the different populations in the database. However after running it
> on my set it seems some variations do not show results
>
>
> Take, for example, the INDEL rs141080692.
> When I run it through my script, this is the information I get:
>
> rs141080692 GT 1000GENOMES:pilot_1_CEU_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 - 1000GENOMES:pilot_1_CEU_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 GT 1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 - 1000GENOMES:pilot_1_CHB+JPT_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 GT 1000GENOMES:pilot_1_YRI_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 - 1000GENOMES:pilot_1_YRI_low_coverage_panel - deletion 9 123543905 123543907
> rs141080692 GT GMI:AK_Koreans - deletion 9 123543905 123543907
> rs141080692 - GMI:AK_Koreans - deletion 9 123543905 123543907
> rs141080692 GT GMI:NA10851 - deletion 9 123543905 123543907
> rs141080692 - GMI:NA10851 - deletion 9 123543905 123543907
> rs141080692 GT SSMP:SSM - deletion 9 123543905 123543907
> rs141080692 - SSMP:SSM - deletion 9 123543905 123543907
>
> However, looking at the same database on your website:
>
> http://dec2015.archive.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=9:123543406-123544407;v=rs141080692;vdb=variation;vf=127601209
>
> You can see that there is information about its frequency in a whole
> bunch of populations
>
> How do I go about fetching these?
>
> My script is pretty basic
>
> First I fetch all populations, or only the ones I am interested in, with:
>
> foreach my $pop (@{$population_adaptor->fetch_all()}){
> my $name = $pop->name();
> if (defined $name){
> if (defined $population){
> if ($name =~ /\Q$population/){
> print STDERR "Selected Populations: $name \n";
> push @selected_populations, $name;
> }
> }else{
> print STDERR "Selected Populations: $name \n";
> push @selected_populations, $name;
> }
> }
> }
>
> I then use the variation adaptor to get the variation object
>
> my $variation = $variation_adaptor->fetch_by_name($id);
>
> Then I cycle through each variation feature with
>
> foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
> my @alleles = @{$vf->get_all_Alleles};
>
> ALLELE_CYCLE:foreach my $a (@alleles){
> my $astr = $a->allele();
> my $pop = $a->population();
> my $pop_name = "-";
> if (defined $pop){
> $pop_name = $a->population->name() ;
> }
> my $freq = $a->frequency() || "-";
> foreach my $p (@{$selected_populations}){
> #print STDERR $pop_name."\t".$p."\n";
> if ($pop_name eq $p){
> print $out_fh join "\t", ($var->name(),
> $astr,
> $pop_name,
> $freq,
> $varClass,
> $chr,
> $start,
> $end."\n");
> next ALLELE_CYCLE;
> }
> }
> }
> }
>
> Am I doing something wrong?
> There are the Phase 3 population data, for example. They are clearly
> included on your site
>
> Many thanks
>
> Duarte