[ensembl-dev] Getting LD data from 1000G phase 3 perl API (v83)

Sarah Hunt seh at ebi.ac.uk
Fri Jan 15 12:44:43 GMT 2016


Hi Johanne,

Thanks for the detailed report. It looks as if the JSON.pm module is 
either not installed or not on your Perl library path (PERL5LIB / @INC). 
Installing it, or adding its location to PERL5LIB, should sort out the 
problem.
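
A quick way to confirm this diagnosis is to check whether Perl can load JSON from the same environment the script runs in. The sketch below is only illustrative: module_available is a hypothetical helper name, not part of the Ensembl API, and it uses nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): returns 1 if the named
# module can be loaded from the current @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # JSON -> JSON.pm, A::B -> A/B.pm
    return eval { require $file; 1 } ? 1 : 0;
}

if (module_available('JSON')) {
    print "JSON loaded from $INC{'JSON.pm'}\n";
}
else {
    print "JSON not found; directories searched:\n";
    print "  $_\n" for @INC;
}
```

If the module is reported as missing, installing it from CPAN (for example with "cpan JSON" or "cpanm JSON") and making sure the install directory is listed in PERL5LIB would be the usual next step. The exact install location depends on the local setup.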

Best wishes,

Sarah
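
On the first question in the quoted message below (an undefined variation when the rsID is read from the input file): a likely cause, given that $_ prints as rs###\n, is that the trailing newline is still part of the string passed to fetch_by_name, so the lookup is for "rs###\n" rather than "rs###". A minimal sketch of cleaning each line first follows; clean_rsid is a hypothetical helper name, and the literal strings stand in for lines read with <>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): strip line endings and
# surrounding whitespace from one input line; return undef for blank lines.
sub clean_rsid {
    my ($line) = @_;
    return unless defined $line;
    $line =~ s/^\s+|\s+$//g;    # removes "\n", "\r\n", spaces and tabs
    return length($line) ? $line : undef;
}

# Demonstration with literal lines standing in for input read via <>.
for my $raw ("rs1333049\n", "rs1333049\r\n", "\n") {
    my $id = clean_rsid($raw);
    print defined $id ? "query: '$id'\n" : "skipped blank line\n";
}
```

In the quoted script, the cleaned value would then be passed to $variation_adaptor->fetch_by_name in place of $_, and used in the "ne $_" comparison as well. For Unix line endings alone, a plain chomp at the top of the while loop has the same effect.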

On 15/01/2016 12:35, Johanne Håøy Horn wrote:
> Dear Ensembl dev team,
>
> I wish to use your perl API (version 83) to create a program that does 
> the following: take a list of SNPs by rsID as input, fetch the SNPs in 
> LD with them from 1000G phase 3, and output those with r^2 > 0.8.
>
> I tried following the example posted by Emily here: 
> https://www.biostars.org/p/109785/#147784
>
> I seem to want the same functionality and have the same problems as 
> the user in the correspondence, but while they appear to have found a 
> solution, I still have two issues right now:
> 1) I provide an input file to the script, where each line is rs###. I 
> am able to print out this id using $_ as rs###\n, but when using 
> $variation_adaptor->fetch_by_name($_); I do not get a $variation 
> object, just undefined. Therefore, in the code below, I have used a 
> test rsID so far. Do you have any suggestions on why the rsID from my 
> input file gives undefined variations?
> 2) When running the following perl code, I get the error pasted below:
>
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
>
> $registry->load_registry_from_db(
>   -host => 'ensembldb.ensembl.org',
>   -user => 'anonymous'
>     );
>
> my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 
> 'variation', 'variation' );
> $variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also
>
> # For each rsID in the input file from calling python script:
> while (<>) {
>     # Find all SNPs in LD and print
>     # $_ represents a line from stdin
>     print $_;
>     my $variation = $variation_adaptor->fetch_by_name('rs1333049'); # Test data
>     print $variation;
>     if ($variation) {
>         my @vfs = @{ $variation->get_all_VariationFeatures };
>
>         foreach my $vf (@vfs){
>             my $ld = $vf->get_all_LD_values; # error seems to occur here
>             my @pops = @{ $vf->get_all_LD_Populations };
>             my @ldvs = @{ $ld->get_variations };
>
>             foreach my $pop (@pops) {
>
>                 if ($pop->name =~ /1000GENOMES/) {
>
>                     foreach my $ldv (@ldvs) {
>                         if ($ldv->stable_id ne $_) {
>                             my @ldvfs = @{ $ldv->get_all_VariationFeatures };
>
>                             foreach my $ldvf (@ldvfs) {
>                                 my @tvs = @{ $ldvf->get_all_TranscriptVariations };
>                                 my $r2 = $ld->get_r_square($vf, $ldvf, $pop);
>
>                                 foreach my $tv (@tvs) {
>                                     my $gene = $tv->transcript->get_Gene;
>
>                                     if ($r2 > 0.8) {
>                                         print $variation->stable_id, "\t", $ldv->stable_id, "\t",
>                                             $gene->external_name, "\t", $r2, "\t", $pop->name, "\n";
>                                     }
>                                 }
>                             }
>                         }
>                     }
>                 }
>             }
>         }
>     }
> }
>
> Error from terminal:
>
> -------------------- WARNING ----------------------
> MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be 
> found.
> Exception Can't locate JSON.pm in @INC (@INC contains: 
> /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 
> /software/lib/perl5/5.10.1/x86_64-linux-thread-multi 
> /software/lib/perl5/5.10.1 
> /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 
> /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 
> /usit/abel/u1/johannhh/src/ensembl/modules 
> /usit/abel/u1/johannhh/src/ensembl-compara/modules 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules 
> /usit/abel/u1/johannhh/src/ensembl-funcgen/modules 
> /usit/abel/u1/johannhh/src/lib64/perl5 
> /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 
> /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
> line 91, <> line 1.
> BEGIN failed--compilation aborted at 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
> line 91, <> line 1.
> Compilation failed in require at (eval 260) line 3, <> line 1.
>
>
> FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
> CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
> Date (localtime)  = Fri Jan 15 11:32:21 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> -------------------- WARNING ----------------------
> MSG: Could not find VCFCollection adaptor in the registry for 
> homo_sapiens variation
>
> FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
> CALLED BY: Variation/DBSQL/LDFeatureContainerAdaptor.pm  LINE: 451
> Date (localtime)  = Fri Jan 15 11:32:21 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> -------------------- EXCEPTION --------------------
> MSG: Could not get adaptor VCFCollection for homo_sapiens variation
>
> STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD 
> /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::_fetch_by_Slice_VCF 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:451
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_Slice 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:165
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_VariationFeature 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:246
> STACK Bio::EnsEMBL::Variation::VariationFeature::get_all_LD_values 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.pm:1388
> STACK toplevel ldCalculation.pl:28
> Date (localtime)  = Fri Jan 15 11:32:21 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> When searching around, it seemed as though the error is somewhat 
> connected to vcftools. However, I have followed the installation steps 
> as described on the ensembl web page, and would expect the 
> installation to have been successful.
> For the installation, I have followed the guide here: 
> http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/
> Here: http://www.ensembl.org/info/docs/api/api_installation.html
> And here: http://www.ensembl.org/info/docs/api/api_git.html
> (They overlap somewhat, and my perl5 installation was put in 
> ~/src/lib64/perl5 rather than the path described in the guides.) I 
> believe I have set the correct values of PERL5LIB and PATH.
>
> I am able to run the code under «LD-calculation» here: 
> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
> But when I try to run the «Using in the API script» from this page: 
> http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/ I 
> get what seems to be a similar error:
>
> -------------------- WARNING ----------------------
> MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be 
> found.
> Exception Can't locate JSON.pm in @INC (@INC contains: 
> /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 
> /software/lib/perl5/5.10.1/x86_64-linux-thread-multi 
> /software/lib/perl5/5.10.1 
> /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 
> /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 
> /usit/abel/u1/johannhh/src/ensembl/modules 
> /usit/abel/u1/johannhh/src/ensembl-compara/modules 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules 
> /usit/abel/u1/johannhh/src/ensembl-funcgen/modules 
> /usit/abel/u1/johannhh/src/lib64/perl5 
> /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 
> /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
> line 91.
> BEGIN failed--compilation aborted at 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
> line 91.
> Compilation failed in require at (eval 261) line 3.
>
>
> FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
> CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
> Date (localtime)  = Fri Jan 15 13:11:11 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> -------------------- WARNING ----------------------
> MSG: Could not find VCFCollection adaptor in the registry for 
> homo_sapiens variation
>
> FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
> CALLED BY: Variation/DBSQL/SampleGenotypeAdaptor.pm  LINE: 287
> Date (localtime)  = Fri Jan 15 13:11:11 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> -------------------- EXCEPTION --------------------
> MSG: Could not get adaptor VCFCollection for homo_sapiens variation
>
> STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD 
> /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::SampleGenotypeAdaptor::fetch_all_by_Variation 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/SampleGenotypeAdaptor.pm:287
> STACK Bio::EnsEMBL::Variation::Variation::get_all_SampleGenotypes 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:987
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::_fetch_all_by_Variation_from_Genotypes 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:307
> STACK 
> Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::fetch_all_by_Variation 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:273
> STACK Bio::EnsEMBL::Variation::Variation::get_all_Alleles 
> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:861
> STACK toplevel ensembleapi.pl:17
> Date (localtime)  = Fri Jan 15 13:11:11 2016
> Ensembl API version = 83
> ---------------------------------------------------
>
> Could you please help me find the reason for these errors?
>
> As a side note: I am very new to perl and the ensembl api. My 
> apologies if this question has been answered previously.
> Also, if you have a database of LD SNPs available (as opposed to 
> calculating them on the fly as above), I would appreciate it if you 
> let me know. The correspondence I have found on this topic in various 
> forums suggested otherwise, but some of it was quite old and 
> perhaps outdated.
>
> Best,
> Johanne H. Horn
