[ensembl-dev] Issues in LDFeatureContainerAdaptor

Anja Thormann anja at ebi.ac.uk
Wed Feb 24 13:56:31 GMT 2016


Dear Johanne,

I would recommend using the LDFeatureContainerAdaptor. I have written a small script to show you how to use the adaptor. To avoid exceeding the genotype limit reported in the error message ("Number of genotypes supported by the program (500) exceeded"), you should specify both a population and a variation feature for which you want to compute LD data. The script starts by printing all the populations for which we can compute LD data. We have been working on speeding up our LD computation; the improvements are going into the next release (84), which will be out in March.

..
my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');
my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');
my $population_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');
$variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also

# Print all populations for which LD data can be computed
my $ld_populations = $population_adaptor->fetch_all_LD_Populations();
foreach my $ld_population (@$ld_populations) {
  print $ld_population->name, "\n";
}

my $variation_name = 'rs157580';
my $variation = $variation_adaptor->fetch_by_name($variation_name);
# Stop early if the name is unknown instead of calling a method on undef
die "Variation $variation_name not found\n" unless defined $variation;
my @vfs = @{ $variation->get_all_VariationFeatures() };

# Compute LD for each variation feature, one population at a time
foreach my $vf (@vfs) {
  foreach my $ld_population (@$ld_populations) {
    print $ld_population->name, "\n";
    my $ldfc = $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population);
    # Each entry is a hash describing the LD between one pair of variation features
    foreach my $ld_hash (@{ $ldfc->get_all_ld_values }) {
      my $d_prime = $ld_hash->{d_prime};
      my $r2 = $ld_hash->{r2};
      my $variation_name1 = $ld_hash->{variation1}->variation_name;
      my $variation_name2 = $ld_hash->{variation2}->variation_name;
      print "$variation_name1 $variation_name2 d_prime=$d_prime r2=$r2\n";
    }
  }
}
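
Since the goal is to expand a list of tag SNPs, the pairs printed above could then be filtered by an r2 cutoff. The following is only a rough sketch of that step, not part of the Ensembl API: partners_in_ld is a hypothetical helper, the 0.8 cutoff is an arbitrary assumption, and the records are made-up placeholders with plain name1/name2 keys. In the real script those names would come from $ld_hash->{variation1}->variation_name and $ld_hash->{variation2}->variation_name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): given plain LD records
# with the keys name1, name2, r2 and d_prime, return a sorted array ref of
# the variant names paired with $query at r2 >= $min_r2.
sub partners_in_ld {
    my ($records, $query, $min_r2) = @_;
    my %seen;
    foreach my $rec (@$records) {
        next if $rec->{r2} < $min_r2;
        # The query variant may appear on either side of the pair
        if    ($rec->{name1} eq $query) { $seen{ $rec->{name2} } = 1 }
        elsif ($rec->{name2} eq $query) { $seen{ $rec->{name1} } = 1 }
    }
    return [ sort keys %seen ];
}

# Made-up records for illustration only; these are not real LD values.
my @records = (
    { name1 => 'rsA', name2 => 'rsB', r2 => 0.95, d_prime => 1.00 },
    { name1 => 'rsC', name2 => 'rsA', r2 => 0.85, d_prime => 0.97 },
    { name1 => 'rsA', name2 => 'rsD', r2 => 0.10, d_prime => 0.40 },
    { name1 => 'rsB', name2 => 'rsC', r2 => 0.90, d_prime => 0.99 },
);

my $partners = partners_in_ld(\@records, 'rsA', 0.8);
print join(',', @$partners), "\n";   # prints "rsB,rsC"
```

Collecting the names in a hash keeps each partner only once, even if the same pair is reported for several populations.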

Regards,
Anja


> On 24 Feb 2016, at 13:05, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
> 
> Dear Ensembl team,
> 
> Thank you for previous help with setting up the API!
> 
> I am now able to use the API properly, using the scripts from the LD tutorial pages available on the Ensembl blog and website.
> 
> What I am now trying to do is expand a list of tag/index SNPs from GWAS to include the SNPs in LD with the input SNPs. The code I have is the following:
> 
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
> 
> my $registry = 'Bio::EnsEMBL::Registry';
> 
> $registry->load_registry_from_db(
>   -host => 'ensembldb.ensembl.org',
>   -user => 'anonymous'
>     );
> 
> my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
> $variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also
> 
> while (<>) {
>     chomp; # Remove \n from input file line names
>     my $variation = $variation_adaptor->fetch_by_name($_);
>     print $variation->stable_id(), "\n";
> 
>     my @vfs = @{ $variation->get_all_VariationFeatures() };
> 
>     foreach my $vf (@vfs){
>         print "get ld values\n";
>         my $ld = $vf->get_all_LD_values();
>         print "get ld variations\n";
>         my @ldvs = @{ $ld->get_variations() };
> 
>         print "for each ld variation\n";
>         foreach my $ldv (@ldvs) {
>             print $ldv->stable_id();
>         }
>     }   
> }
> 
> When calling $vf->get_all_LD_values I get the following error/warning:
> Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716.
> ...
> Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
> ...
> Number of genotypes supported by the program (500) exceeded
> ...
> Can't call method "get_all_VariationFeatures" on an undefined value at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 870, <OUT> line 29120.
> 
> The "..." means the preceding warning/error is printed multiple times in a row.
> 
> I have described the issue in further detail here: https://www.biostars.org/p/178467/
> 
> Could you help me work out what I am doing wrong? Also, is there a better way of finding all SNPs in LD with an input SNP than the one I am trying? Speed seems to be an issue when the number of SNPs gets large.
> 
> Best
> Johanne Håøy Horn
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
