[ensembl-dev] Issues with VariationFeature

Anja Thormann anja at ebi.ac.uk
Thu Feb 25 13:24:01 GMT 2016

Hi Johanne,

thank you for reporting this. The problem was caused by an error in the SQL statement. I fixed the statement and again pushed the fix to the release/83 branch. Please pull the changes.

However, the call is not going to return any data at the moment. We recently switched from storing genotypes in a database to reading them from VCF files, and this is currently not supported by the method ($vf->get_all_LD_Populations()). We will add support for it in release/84.

In the meantime, please use the LDFeatureContainerAdaptor and loop over all the populations for which we can compute LD data. The populations can be retrieved from the population adaptor using fetch_all_LD_Populations. We compute LD data on 1000 Genomes phase 3 data.

In addition to computing LD data for a given SNP, you can also specify a region or a set of SNPs for which you want to compute LD data. Use the following methods from the LDFeatureContainerAdaptor (http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1LDFeatureContainerAdaptor.html): fetch_by_VariationFeatures, fetch_by_Slice.
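For readers wondering what the d_prime and r2 values printed by the script in the quoted message measure, here is a minimal sketch of the standard pairwise LD definitions for two biallelic sites. It assumes phased haplotypes with alleles coded 0/1 and listed in the same haplotype order for both sites. It is not Ensembl's implementation, and the sub name pairwise_ld is hypothetical; it only uses core Perl and List::Util.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min sum);

# Hypothetical helper (not part of the Ensembl API): pairwise LD between
# two biallelic sites from phased haplotypes. Each argument is an array
# ref of alleles (0 or 1), one entry per haplotype, same order for both.
# Returns (r2, |D'|), or an empty list if either site is monomorphic.
sub pairwise_ld {
    my ($site_a, $site_b) = @_;
    my $n = scalar @$site_a;
    die "sites must cover the same, non-empty set of haplotypes\n"
        unless $n == @$site_b && $n > 0;

    my $p_a  = sum(@$site_a) / $n;    # frequency of allele 1 at site A
    my $p_b  = sum(@$site_b) / $n;    # frequency of allele 1 at site B
    # frequency of haplotypes carrying allele 1 at both sites
    my $p_ab = sum(map { $site_a->[$_] * $site_b->[$_] } 0 .. $n - 1) / $n;

    my $denom = $p_a * (1 - $p_a) * $p_b * (1 - $p_b);
    return () if $denom == 0;    # LD is undefined at a monomorphic site

    my $d = $p_ab - $p_a * $p_b;    # coefficient of linkage disequilibrium
    # D_max depends on the sign of D; it is > 0 because all frequencies lie in (0, 1)
    my $d_max = $d >= 0
        ? min($p_a * (1 - $p_b), (1 - $p_a) * $p_b)
        : min($p_a * $p_b, (1 - $p_a) * (1 - $p_b));

    my $r2      = $d**2 / $denom;
    my $d_prime = abs($d) / $d_max;    # reported here as an absolute value
    return ($r2, $d_prime);
}

my ($r2, $d_prime) = pairwise_ld([0, 0, 0, 1], [0, 0, 1, 1]);
printf "r2=%.3f d_prime=%.3f\n", $r2, $d_prime;    # r2=0.333 d_prime=1.000
```

With these toy haplotypes the second site always carries allele 1 whenever the first one does, so D' is 1 even though r2 is only 1/3, which is why the two statistics are usually reported together.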


> On 25 Feb 2016, at 09:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
> Dear Ensembl team, 
> When I do the following call: my @vf_pops = @{ $vf->get_all_LD_Populations() }; I get this error:
> DBD::mysql::st execute failed: Unknown column 'ip.population_id' in 'field list' at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.pm line 1429, <> line 1.
> Here’s the full script:
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
> my $start_run = time();
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(
>   -host => 'ensembldb.ensembl.org',
>   -user => 'anonymous'
> );
> my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
> my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');
> my $population_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');
> $variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also
> my $ld_populations = $population_adaptor->fetch_all_LD_Populations();
> foreach my $ld_population (@$ld_populations) {
>     print $ld_population->name, "\n";
> }
> my $variation_name = 'rs157580';
> my $variation = $variation_adaptor->fetch_by_name($variation_name);
> my @vfs = @{ $variation->get_all_VariationFeatures() };
> foreach my $vf (@vfs) {
>   print $vf->name, "\n";
>   my @vf_pops = @{ $vf->get_all_LD_Populations() };
>   foreach my $ld_population (@$ld_populations) {
>     print $ld_population->name, "\n";
>     my $ldfc = $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population);
>     foreach my $ld_hash (@{$ldfc->get_all_ld_values}) {
>       my $d_prime = $ld_hash->{d_prime};
>       my $r2 = $ld_hash->{r2};
>       my $variation_name1 = $ld_hash->{variation1}->variation_name;
>       my $variation_name2 = $ld_hash->{variation2}->variation_name;
>       print "$variation_name1 $variation_name2 d_prime=$d_prime r2=$r2\n";
>     }
>   }
> }
> my $end_run = time();
> my $run_time = $end_run - $start_run;
> print "Job took $run_time seconds\n";
> If I remove the call to get_all_LD_Populations, the script runs fine again. Do you have any idea what I am doing wrong? Could it be a bug in the code, like the error I reported yesterday?
> Also, I have visited a lot of forums where LD calculation is discussed. Many users ask for a database one can query to find LD between SNPs, and for genomic LD tracks, but all such services are only available for HapMap data. Do you know why there is none covering all SNPs, with LD computed on data up to 1000G phase 3? What kind of restrictions make it easier to compute LD on the fly, for instance? Space, maybe?
> (If such LD databases or tracks do exist, I would be happy to know!)
> Best,
> Johanne
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
