[ensembl-dev] Issues in LDFeatureContainerAdaptor

Johanne Håøy Horn johannhh at ifi.uio.no
Wed Feb 24 17:20:53 GMT 2016


The bug fix did the trick! Thank you so much for your help, and the speed at which it was given.

Best,
Johanne

On 24 Feb 2016, at 16:43, Anja Thormann <anja at ebi.ac.uk> wrote:

I pushed a bug fix to the ensembl-variation release/83 branch. Could you please update your ensembl-variation git repo (pull the changes)?

You might also need to include failed variants. This is similar to setting the use_vcf flag:

$variation_adaptor->db->include_failed_variations(1);
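
As a rough sketch of how the two settings fit together, the helper below switches on both flags on the database adaptor behind the variation adaptor. The helper name enable_vcf_and_failed is hypothetical and not part of the Ensembl API; only use_vcf and include_failed_variations come from this thread, and $variation_adaptor is assumed to be the object returned by $registry->get_adaptor('homo_sapiens', 'variation', 'variation').

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): turn on both settings
# mentioned in this thread on the database adaptor behind $variation_adaptor.
sub enable_vcf_and_failed {
    my ($variation_adaptor) = @_;
    my $db = $variation_adaptor->db;
    $db->use_vcf(1);                    # also read genotypes from VCF (1000G phase 3)
    $db->include_failed_variations(1);  # do not filter out failed variants
    return $variation_adaptor;
}
```

It would be called once, as enable_vcf_and_failed($variation_adaptor), before any variations or LD values are fetched.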

Please let me know if you continue having problems running the code.

Regards,
Anja

On 24 Feb 2016, at 14:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:

Thank you for your swift reply!

When I try to run the script you gave me, I get the following printout, ending with the same warnings/errors as I had before:

$ perl ensemblLD.pl
1000GENOMES:phase_3:ACB
1000GENOMES:phase_3:ASW
1000GENOMES:phase_3:BEB
1000GENOMES:phase_3:CDX
1000GENOMES:phase_3:CEU
1000GENOMES:phase_3:CHB
1000GENOMES:phase_3:CHS
1000GENOMES:phase_3:CLM
1000GENOMES:phase_3:ESN
1000GENOMES:phase_3:FIN
1000GENOMES:phase_3:GBR
1000GENOMES:phase_3:GIH
1000GENOMES:phase_3:IBS
1000GENOMES:phase_3:ITU
1000GENOMES:phase_3:JPT
1000GENOMES:phase_3:KHV
1000GENOMES:phase_3:LWK
1000GENOMES:phase_3:MAG
1000GENOMES:phase_3:MSL
1000GENOMES:phase_3:MXL
1000GENOMES:phase_3:PEL
1000GENOMES:phase_3:PJL
1000GENOMES:phase_3:PUR
1000GENOMES:phase_3:STU
1000GENOMES:phase_3:TSI
1000GENOMES:phase_3:YRI
1000GENOMES:phase_3:ACB
Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716.
Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716.
Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716.
Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
Can't call method "get_all_VariationFeatures" on an undefined value at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 870, <OUT> line 65174.

I use version 83 of the API, very recently downloaded and set up. Perhaps the problem is my local installation, not the code… Do you have any idea what I might have done wrong? I have double-checked that I have compiled the calc_genotype file in src/ensembl-variation/C_code, and that the src/ensembl-variation/C_code path is included in my PERL5LIB variable. Are there any other dependencies specifically related to the LDFeatureContainer that I should check are correctly set up?
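
One way to see what the running Perl process can actually find is a small diagnostic like the sketch below. It checks whether the Ensembl modules named in the warnings are visible in @INC and whether the compiled LD program is reachable on PATH. The helper names are hypothetical, and two things are assumptions that this thread does not confirm: that the compiled program is called calc_genotypes, and that it is looked up via PATH rather than PERL5LIB.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @INC that contains the given module, or nothing.
sub find_module_dir {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;
    $rel .= '.pm';
    for my $dir (grep { !ref($_) } @INC) {
        return $dir if -e File::Spec->catfile($dir, $rel);
    }
    return;
}

# Return the full path of an executable found on PATH, or nothing.
sub find_on_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Modules taken from the file paths in the warnings above.
for my $module ('Bio::EnsEMBL::Registry',
                'Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor') {
    my $dir = find_module_dir($module);
    printf "%-60s %s\n", $module, defined $dir ? "found in $dir" : 'NOT in @INC';
}

# Assumed binary name; adjust to whatever the C_code build actually produced.
my $binary = 'calc_genotypes';
my $path   = find_on_path($binary);
printf "%-60s %s\n", $binary, defined $path ? "found at $path" : 'NOT on PATH';
```

If a module is reported as missing, PERL5LIB is the variable to check; if only the binary is missing, the directory holding it may need to be added to PATH instead.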

Best,
Johanne Håøy Horn

On 24 Feb 2016, at 14:56, Anja Thormann <anja at ebi.ac.uk> wrote:

Dear Johanne,

I would recommend using the LDFeatureContainerAdaptor. I have written a small script to show you how to use the adaptor. To avoid exceeding the genotype limit reported in the error message, you should specify the population and the variation feature for which you want to compute LD data. At the beginning of the script I print all the populations for which we can compute LD data. We have been working on speeding up our LD computation. The improvements will go into the next release, release/84, which will be out in March.

..
my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');
my $population_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');
$variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also

my $ld_populations = $population_adaptor->fetch_all_LD_Populations();
foreach my $ld_population (@$ld_populations) {
  print $ld_population->name, "\n";
}

my $variation_name = 'rs157580';
my $variation = $variation_adaptor->fetch_by_name($variation_name);
my @vfs = @{ $variation->get_all_VariationFeatures() };

foreach my $vf (@vfs) {
  foreach my $ld_population (@$ld_populations) {
    print $ld_population->name, "\n";
    my $ldfc = $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population);
    foreach my $ld_hash (@{$ldfc->get_all_ld_values}) {
      my $d_prime = $ld_hash->{d_prime};
      my $r2 = $ld_hash->{r2};
      my $variation_name1 = $ld_hash->{variation1}->variation_name;
      my $variation_name2 = $ld_hash->{variation2}->variation_name;
      print "$variation_name1 $variation_name2 d_prime=$d_prime r2=$r2\n";
    }
  }
}
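
When the goal is to keep only strongly linked variants, the hashes from get_all_ld_values could be filtered before printing. The sketch below is a minimal, hypothetical helper (filter_ld_values is not an Ensembl function) that keeps records at or above an r2 cut-off and sorts them from strongest to weakest. Plain hashes stand in for real LD records so that it runs without a database, and the 0.8 threshold is only an illustrative choice.

```perl
use strict;
use warnings;

# Keep only LD records whose r2 is defined and at least $min_r2, sorted from
# strongest to weakest. $ld_values is an array reference of hashes with the
# r2 and d_prime keys used in the script above; undef yields an empty list.
sub filter_ld_values {
    my ($ld_values, $min_r2) = @_;
    return [] unless defined $ld_values;
    my @kept = grep { defined $_->{r2} && $_->{r2} >= $min_r2 } @$ld_values;
    return [ sort { $b->{r2} <=> $a->{r2} } @kept ];
}

# Made-up records for illustration only; real ones would also carry the
# variation1 and variation2 objects.
my @example = (
    { r2 => 0.95, d_prime => 1.00 },
    { r2 => 0.40, d_prime => 0.80 },
    { r2 => 0.85, d_prime => 0.99 },
);

my $strong = filter_ld_values(\@example, 0.8);
printf "r2=%.2f d_prime=%.2f\n", $_->{r2}, $_->{d_prime} for @$strong;
```

In the loop above this would wrap the call as filter_ld_values($ldfc->get_all_ld_values, 0.8), assuming $ldfc itself is defined.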

Regards,
Anja


On 24 Feb 2016, at 13:05, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:

Dear Ensembl team,

Thank you for previous help with setting up the API!

I can now use the API properly with the scripts from the LD tutorial pages on the Ensembl blog and website.

What I am now trying to do is expand a list of tag/index SNPs from GWAS to include the SNPs in LD with the input SNPs. The code I have is the following:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
    );

my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
$variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also

while (<>) {
    chomp; # Remove \n from input file line names
    my $variation = $variation_adaptor->fetch_by_name($_);
    print $variation->stable_id(), "\n";

    my @vfs = @{ $variation->get_all_VariationFeatures() };

    foreach my $vf (@vfs){
        print "get ld values\n";
        my $ld = $vf->get_all_LD_values();
        print "get ld variations\n";
        my @ldvs = @{ $ld->get_variations() };

        print "for each ld variation\n";
        foreach my $ldv (@ldvs) {
            print $ldv->stable_id();
        }
    }
}

When calling $vf->get_all_LD_values I get the following error/warning:
Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716.
...
Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
...
Number of genotypes supported by the program (500) exceeded
...
Can't call method "get_all_VariationFeatures" on an undefined value at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 870, <OUT> line 29120.

The "..." means that the preceding warning/error is printed multiple times in a row.

I have described the issue in further detail here: https://www.biostars.org/p/178467/

Could you help me find out what I am doing wrong? Also, is there a better way of finding all SNPs in LD with an input SNP than the one I am attempting? Speed seems to become an issue when the number of SNPs gets large.
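
Independently of the LD call itself, the input loop above passes every line straight to fetch_by_name, so a blank line or stray whitespace in the SNP list would reach the API unchanged. The sketch below shows one way the names could be cleaned and de-duplicated first. The helper read_snp_names is hypothetical, the in-memory input stands in for a real file, and rsEXAMPLE is a placeholder rather than a real identifier.

```perl
use strict;
use warnings;

# Read one identifier per line from a filehandle, trimming whitespace
# (including a trailing carriage return), skipping blank lines and repeated
# names, and preserving the input order.
sub read_snp_names {
    my ($fh) = @_;
    my (%seen, @names);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        next if $seen{$line}++;
        push @names, $line;
    }
    return \@names;
}

# In-memory input standing in for a file of GWAS tag SNPs.
my $input = "rs157580\n\n  rs157580 \nrsEXAMPLE\r\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my $names = read_snp_names($fh);
close $fh;

print "$_\n" for @$names;
```

In the real loop it would also be worth checking that fetch_by_name returned a defined object before calling methods on it.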

Best
Johanne Håøy Horn


_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
