[ensembl-dev] Getting LD data from 1000G phase 3 perl API (v83)

Johanne Håøy Horn johannhh at ifi.uio.no
Fri Jan 15 12:35:29 GMT 2016


Dear Ensembl dev team,

I wish to use your Perl API (version 83) to write a program that takes a list of SNPs by rsID as input, fetches the SNPs in LD with each of them from the 1000 Genomes phase 3 data, and outputs those with r^2 > 0.8.

I tried following the example posted by Emily here: https://www.biostars.org/p/109785/#147784

I seem to want the same functionality and have the same problems as the user in that thread, but while they appear to have found a solution, I still have two issues:
1) My script reads an input file in which each line is an rsID (rs###). Printing $_ shows the ID as rs###\n, but $variation_adaptor->fetch_by_name($_) returns undef instead of a variation object. In the code below I have therefore used a hard-coded test rsID so far. Do you have any suggestions as to why the rsIDs from my input file give undefined variations?
2) When running the following Perl code, I get the error pasted below:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
    );

my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
$variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also

# For each rsID in the input file from calling python script:
while (<>) {
    # Find all SNPs in LD and print
    # $_ represents a line from stdin
    print $_;
    my $variation = $variation_adaptor->fetch_by_name('rs1333049');  # Test data
    print $variation;
    if ($variation) {
        my @vfs = @{ $variation->get_all_VariationFeatures };

        foreach my $vf (@vfs){
            my $ld = $vf->get_all_LD_values;  # error seems to occur here
            my @pops = @{ $vf->get_all_LD_Populations };
            my @ldvs = @{ $ld->get_variations };

            foreach my $pop (@pops) {

                if ($pop->name =~ /1000GENOMES/) {

                    foreach my $ldv (@ldvs) {
                        if ($ldv->stable_id ne $_) {
                            my @ldvfs = @{ $ldv->get_all_VariationFeatures };

                            foreach my $ldvf (@ldvfs) {
                                my @tvs = @{ $ldvf->get_all_TranscriptVariations };
                                my $r2 = $ld->get_r_square($vf, $ldvf, $pop);

                                foreach my $tv (@tvs) {
                                    my $gene = $tv->transcript->get_Gene;

                                    if ($r2 > 0.8) {
                                        print $variation->stable_id, "\t", $ldv->stable_id, "\t", $gene->external_name, "\t", $r2, "\t", $pop->name, "\n";
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Error from terminal:

-------------------- WARNING ----------------------
MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be found.
Exception Can't locate JSON.pm in @INC (@INC contains: /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 /software/lib/perl5/5.10.1/x86_64-linux-thread-multi /software/lib/perl5/5.10.1 /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 /usit/abel/u1/johannhh/src/ensembl/modules /usit/abel/u1/johannhh/src/ensembl-compara/modules /usit/abel/u1/johannhh/src/ensembl-variation/modules /usit/abel/u1/johannhh/src/ensembl-funcgen/modules /usit/abel/u1/johannhh/src/lib64/perl5 /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm line 91, <> line 1.
BEGIN failed--compilation aborted at /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm line 91, <> line 1.
Compilation failed in require at (eval 260) line 3, <> line 1.


FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
Date (localtime)    = Fri Jan 15 11:32:21 2016
Ensembl API version = 83
---------------------------------------------------

-------------------- WARNING ----------------------
MSG: Could not find VCFCollection adaptor in the registry for homo_sapiens variation

FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
CALLED BY: Variation/DBSQL/LDFeatureContainerAdaptor.pm  LINE: 451
Date (localtime)    = Fri Jan 15 11:32:21 2016
Ensembl API version = 83
---------------------------------------------------

-------------------- EXCEPTION --------------------
MSG: Could not get adaptor VCFCollection for homo_sapiens variation

STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::_fetch_by_Slice_VCF /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:451
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_Slice /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:165
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_VariationFeature /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:246
STACK Bio::EnsEMBL::Variation::VariationFeature::get_all_LD_values /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.pm:1388
STACK toplevel ldCalculation.pl:28
Date (localtime)    = Fri Jan 15 11:32:21 2016
Ensembl API version = 83
---------------------------------------------------

When searching around, it seemed as though the error is somehow connected to vcftools. However, I followed the installation steps described on the Ensembl web pages and would expect the installation to have succeeded.
For the installation, I followed the guides here: http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/
here: http://www.ensembl.org/info/docs/api/api_installation.html
and here: http://www.ensembl.org/info/docs/api/api_git.html
The guides overlap somewhat, and my perl5 installation was put in ~/src/lib64/perl5 rather than in the path described in the guides. I believe I have set PERL5LIB and PATH correctly.
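
Note that the first warning in the trace above names a missing Perl module (JSON.pm, required by VCFCollectionAdaptor.pm) rather than vcftools itself. A minimal sketch for checking whether a module can be loaded from the current @INC follows. The helper name module_available is hypothetical and not part of the Ensembl API:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): returns 1 if the
# named module can be loaded from the directories currently in @INC,
# and 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Turn Foo::Bar into Foo/Bar.pm so that require treats it as a file path
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# JSON is the module named in the "Can't locate JSON.pm" message above
for my $module (qw(JSON Bio::EnsEMBL::Registry)) {
    printf "%-25s %s\n", $module,
        module_available($module) ? 'found' : 'NOT found in @INC';
}
```

If JSON is reported as not found, installing it (for example from CPAN) into a directory listed in PERL5LIB would be the first thing to try. Whether that resolves the VCFCollection error is an assumption based on the message text, not something confirmed here.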

I am able to run the code under «LD-calculation» here: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
But when I try to run the «Using in the API script» from this page: http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/ I get what seems to be a similar error:

-------------------- WARNING ----------------------
MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be found.
Exception Can't locate JSON.pm in @INC (@INC contains: /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 /software/lib/perl5/5.10.1/x86_64-linux-thread-multi /software/lib/perl5/5.10.1 /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 /usit/abel/u1/johannhh/src/ensembl/modules /usit/abel/u1/johannhh/src/ensembl-compara/modules /usit/abel/u1/johannhh/src/ensembl-variation/modules /usit/abel/u1/johannhh/src/ensembl-funcgen/modules /usit/abel/u1/johannhh/src/lib64/perl5 /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm line 91.
BEGIN failed--compilation aborted at /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm line 91.
Compilation failed in require at (eval 261) line 3.


FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
Date (localtime)    = Fri Jan 15 13:11:11 2016
Ensembl API version = 83
---------------------------------------------------

-------------------- WARNING ----------------------
MSG: Could not find VCFCollection adaptor in the registry for homo_sapiens variation

FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
CALLED BY: Variation/DBSQL/SampleGenotypeAdaptor.pm  LINE: 287
Date (localtime)    = Fri Jan 15 13:11:11 2016
Ensembl API version = 83
---------------------------------------------------

-------------------- EXCEPTION --------------------
MSG: Could not get adaptor VCFCollection for homo_sapiens variation

STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
STACK Bio::EnsEMBL::Variation::DBSQL::SampleGenotypeAdaptor::fetch_all_by_Variation /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/SampleGenotypeAdaptor.pm:287
STACK Bio::EnsEMBL::Variation::Variation::get_all_SampleGenotypes /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:987
STACK Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::_fetch_all_by_Variation_from_Genotypes /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:307
STACK Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::fetch_all_by_Variation /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:273
STACK Bio::EnsEMBL::Variation::Variation::get_all_Alleles /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:861
STACK toplevel ensembleapi.pl:17
Date (localtime)    = Fri Jan 15 13:11:11 2016
Ensembl API version = 83
---------------------------------------------------

Could you please help me find the cause of these errors?
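
On issue 1, one possible cause that has not been confirmed here: each line read by while (<>) still ends in a newline, so fetch_by_name would be called with "rs###\n" rather than "rs###", and the comparison $ldv->stable_id ne $_ would always be true. A minimal sketch of cleaning each line before the lookup follows. clean_rsid is a hypothetical helper, not an Ensembl function, and the "rs plus digits" pattern is an assumption about the input file:

```perl
use strict;
use warnings;

# Hypothetical helper: strip the line ending and surrounding whitespace,
# and return the ID only if it looks like an rsID (assumed form: rs + digits).
# Returns undef for anything else, including empty lines.
sub clean_rsid {
    my ($line) = @_;
    return undef unless defined $line;
    $line =~ s/^\s+|\s+$//g;    # removes "\n", "\r\n" and stray spaces
    return $line =~ /^rs\d+$/ ? $line : undef;
}

# Example lines as they might arrive from the input file
for my $line ("rs1333049\n", "  rs1333049\r\n", "\n") {
    my $rsid = clean_rsid($line);
    if (defined $rsid) {
        # In the real script this is where the cleaned ID would be passed on,
        # e.g. $variation_adaptor->fetch_by_name($rsid)
        print "would look up: $rsid\n";
    }
    else {
        print "skipping line that is not an rsID\n";
    }
}
```

In the script above, the same cleaned value would also replace $_ in the stable_id comparison, so that the query SNP is excluded from its own list of LD partners.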

As a side note: I am very new to Perl and the Ensembl API. My apologies if this question has been answered previously.
Also, if you have a database of precomputed LD SNPs available (as opposed to calculating them on the fly as above), I would appreciate it if you could let me know. The correspondence I found on this topic in different forums suggested otherwise, but some of it was quite old and perhaps outdated.

Best,
Johanne H. Horn