[ensembl-dev] Getting LD data from 1000G phase 3 perl API (v83)

mag mr6 at ebi.ac.uk
Fri Jan 15 12:50:24 GMT 2016


Hi Johanne,

As for your first problem, have you tried trimming the trailing newline off the input line?
According to your description, it prints out as rs###\n, so the extra \n
becomes part of the name you pass to fetch_by_name and the lookup fails.
You should be able to remove it with the following command:
chomp;
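
A minimal sketch of that fix, assuming the rsIDs arrive one per line. The helper name `read_rsids` is hypothetical. Besides `chomp`, it strips stray whitespace and a trailing `\r` from Windows line endings, which `chomp` alone leaves in place on Unix:

```perl
use strict;
use warnings;

# Hypothetical helper: read one rsID per line from a filehandle.
# chomp removes the trailing "\n" that would otherwise turn the lookup
# into fetch_by_name("rs1333049\n") and return undef.
sub read_rsids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;   # also drop surrounding spaces and a leftover "\r"
        next unless length $line;  # skip blank lines
        push @ids, $line;
    }
    return @ids;
}
```

Usage in the original script would look like `for my $rsid (read_rsids(\*STDIN)) { my $variation = $variation_adaptor->fetch_by_name($rsid); ... }`. The simplest equivalent is a bare `chomp;` as the first statement inside `while (<>)`.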


Regards,
Magali

On 15/01/2016 12:44, Sarah Hunt wrote:
>
> Hi Johanne,
>
> Thanks for the detailed report. It looks as if you don't have the
> JSON.pm module installed, or it is not in your @INC path. Installing it
> (or adding its location to PERL5LIB) should sort out the problem.
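
One way to confirm this diagnosis before rerunning the script is to try loading the modules directly from the same Perl. This is a minimal sketch. `missing_modules` is a hypothetical helper, and which module names to pass it is up to you:

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be
# loaded from the current @INC.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn a package name into the file require expects: Foo::Bar -> Foo/Bar.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}
```

For example, `print "missing: $_\n" for missing_modules('JSON', 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor');` would be expected to list JSON. It should also list the adaptor, because the warning in Johanne's log shows that VCFCollectionAdaptor.pm fails to compile without JSON.pm.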
>
> Best wishes,
>
> Sarah
>
> On 15/01/2016 12:35, Johanne Håøy Horn wrote:
>> Dear Ensembl dev team,
>>
>> I wish to use your Perl API (version 83) to create a program that
>> takes a list of SNPs (by rsID) as input, fetches the 1000G phase 3 LD
>> SNPs for each, and outputs the ones with r^2 > 0.8.
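
The filtering and output step described here can be sketched independently of the API. The record layout (hashrefs with `query`, `ld_snp`, `population`, `gene` and `r2` keys) and the name `summarise_ld_hits` are hypothetical. The idea is to collect records inside the nested loops of the script below instead of printing them directly. Collecting first also avoids printing the same SNP pair once per transcript, which happens when the `print` sits inside the `get_all_TranscriptVariations` loop:

```perl
use strict;
use warnings;

# Hypothetical post-processing step: keep pairs with r2 above the
# threshold and merge gene names, so each (query, LD SNP, population)
# is reported once rather than once per transcript.
sub summarise_ld_hits {
    my ($hits, $threshold) = @_;
    my %pair;    # "query\tld_snp\tpopulation" => { r2 => ..., genes => {...} }
    for my $h (@$hits) {
        next unless $h->{r2} > $threshold;
        my $key = join "\t", @{$h}{qw(query ld_snp population)};
        $pair{$key}{r2} = $h->{r2};
        $pair{$key}{genes}{ $h->{gene} } = 1 if defined $h->{gene};
    }
    my @lines;
    for my $key (sort keys %pair) {
        my ($query, $ld_snp, $pop) = split /\t/, $key;
        my $genes = join ',', sort keys %{ $pair{$key}{genes} || {} };
        # Same column order as the original print: query, LD SNP, gene(s), r2, population
        push @lines, join("\t", $query, $ld_snp, $genes, $pair{$key}{r2}, $pop);
    }
    return @lines;
}
```

With the threshold from the question, `summarise_ld_hits(\@hits, 0.8)` returns one tab-separated line per SNP pair and population.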
>>
>> I tried following the example posted by Emily here: 
>> https://www.biostars.org/p/109785/#147784
>>
>> I want the same functionality and have run into the same problems as
>> the user in that thread, but while they appear to have found a
>> solution, I still have two issues:
>> 1) I provide an input file to the script, where each line is rs###. I
>> am able to print out this id using $_ as rs###\n, but when I call
>> $variation_adaptor->fetch_by_name($_) I do not get a $variation
>> object, just undef. That is why the code below uses a hard-coded
>> test rsID for now. Do you have any suggestions on why the rsIDs from my
>> input file give undefined variations?
>> 2) When running the following perl code, I get the error pasted below:
>>
>> use strict;
>> use warnings;
>> use Bio::EnsEMBL::Registry;
>>
>> my $registry = 'Bio::EnsEMBL::Registry';
>>
>> $registry->load_registry_from_db(
>>     -host => 'ensembldb.ensembl.org',
>>     -user => 'anonymous'
>> );
>>
>> my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');
>> $variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also
>>
>> # For each rsID in the input file from calling python script:
>> while (<>) {
>>     # Find all SNPs in LD and print
>>     # $_ represents a line from stdin
>>     print $_;
>>     my $variation = $variation_adaptor->fetch_by_name('rs1333049'); # Test data
>>     print $variation;
>>     if ($variation) {
>>         my @vfs = @{ $variation->get_all_VariationFeatures };
>>
>>         foreach my $vf (@vfs) {
>>             my $ld   = $vf->get_all_LD_values; # error seems to occur here
>>             my @pops = @{ $vf->get_all_LD_Populations };
>>             my @ldvs = @{ $ld->get_variations };
>>
>>             foreach my $pop (@pops) {
>>                 if ($pop->name =~ /1000GENOMES/) {
>>                     foreach my $ldv (@ldvs) {
>>                         if ($ldv->stable_id ne $_) {
>>                             my @ldvfs = @{ $ldv->get_all_VariationFeatures };
>>
>>                             foreach my $ldvf (@ldvfs) {
>>                                 my @tvs = @{ $ldvf->get_all_TranscriptVariations };
>>                                 my $r2  = $ld->get_r_square($vf, $ldvf, $pop);
>>
>>                                 foreach my $tv (@tvs) {
>>                                     my $gene = $tv->transcript->get_Gene;
>>
>>                                     if ($r2 > 0.8) {
>>                                         print $variation->stable_id, "\t", $ldv->stable_id, "\t",
>>                                             $gene->external_name, "\t", $r2, "\t", $pop->name, "\n";
>>                                     }
>>                                 }
>>                             }
>>                         }
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>> Error from terminal:
>> -------------------- WARNING ----------------------
>> MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be 
>> found.
>> Exception Can't locate JSON.pm in @INC (@INC contains: 
>> /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 
>> /software/lib/perl5/5.10.1/x86_64-linux-thread-multi 
>> /software/lib/perl5/5.10.1 
>> /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 
>> /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 
>> /usit/abel/u1/johannhh/src/ensembl/modules 
>> /usit/abel/u1/johannhh/src/ensembl-compara/modules 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules 
>> /usit/abel/u1/johannhh/src/ensembl-funcgen/modules 
>> /usit/abel/u1/johannhh/src/lib64/perl5 
>> /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 
>> /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
>> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
>> line 91, <> line 1.
>> BEGIN failed--compilation aborted at 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
>> line 91, <> line 1.
>> Compilation failed in require at (eval 260) line 3, <> line 1.
>>
>>
>> FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
>> CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
>> Date (localtime)  = Fri Jan 15 11:32:21 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> -------------------- WARNING ----------------------
>> MSG: Could not find VCFCollection adaptor in the registry for 
>> homo_sapiens variation
>>
>> FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
>> CALLED BY: Variation/DBSQL/LDFeatureContainerAdaptor.pm  LINE: 451
>> Date (localtime)  = Fri Jan 15 11:32:21 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> -------------------- EXCEPTION --------------------
>> MSG: Could not get adaptor VCFCollection for homo_sapiens variation
>>
>> STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD 
>> /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
>> STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::_fetch_by_Slice_VCF
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:451
>> STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_Slice
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:165
>> STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_VariationFeature
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:246
>> STACK Bio::EnsEMBL::Variation::VariationFeature::get_all_LD_values 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.pm:1388
>> STACK toplevel ldCalculation.pl:28
>> Date (localtime)  = Fri Jan 15 11:32:21 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> When searching around, it seemed as though the error is somewhat 
>> connected to vcftools. However, I have followed the installation 
>> steps as described on the ensembl web page, and would expect the 
>> installation to have been successful.
>> For the installation, I have followed the guide here: 
>> http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/
>> Here: http://www.ensembl.org/info/docs/api/api_installation.html
>> And here: http://www.ensembl.org/info/docs/api/api_git.html
>> (they overlap somewhat, and my perl5 installation was put in
>> ~/src/lib64/perl5 rather than the path described in the guides). I
>> believe I have set PERL5LIB and PATH correctly.
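
To check that these settings actually reach Perl, a sketch like the following reports which directories are missing on disk or absent from `@INC`. The helper name `check_lib_dirs` is hypothetical. Pass it full paths, because `-d` does not expand `~`:

```perl
use strict;
use warnings;

# Hypothetical helper: report directories that do not exist or that
# Perl is not searching (i.e. not present in @INC).
sub check_lib_dirs {
    my @problems;
    for my $dir (@_) {
        if (!-d $dir) {
            push @problems, "$dir: directory does not exist";
        }
        elsif (!grep { $_ eq $dir } @INC) {
            push @problems, "$dir: not in \@INC (add it to PERL5LIB)";
        }
    }
    return @problems;
}
```

For example, `print "$_\n" for check_lib_dirs(split /:/, $ENV{PERL5LIB} // '');` flags any PERL5LIB entry that points at a non-existent directory, such as an unexpanded `~/...` path.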
>>
>> I am able to run the code under «LD-calculation» here: 
>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
>> But when I try to run the «Using in the API script» from this page: 
>> http://www.ensembl.info/blog/2015/06/18/1000-genomes-phase-3-frequencies-genotypes-and-ld-data/ I 
>> get what seems to be a similar error:
>>
>> -------------------- WARNING ----------------------
>> MSG: 'Bio::EnsEMBL::Variation::DBSQL::VCFCollectionAdaptor' cannot be 
>> found.
>> Exception Can't locate JSON.pm in @INC (@INC contains: 
>> /software/lib/perl5/x86_64-linux-thread-multi /software/lib/perl5 
>> /software/lib/perl5/5.10.1/x86_64-linux-thread-multi 
>> /software/lib/perl5/5.10.1 
>> /usit/invitro/data/common_software/share/perl5/5.10.1 /hpc/lib/perl5 
>> /cluster/lib/perl5 /usit/abel/u1/johannhh/src/BioPerl-1.6.1 
>> /usit/abel/u1/johannhh/src/ensembl/modules 
>> /usit/abel/u1/johannhh/src/ensembl-compara/modules 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules 
>> /usit/abel/u1/johannhh/src/ensembl-funcgen/modules 
>> /usit/abel/u1/johannhh/src/lib64/perl5 
>> /usit/abel/u1/johannhh/src/ensembl-io/modules /usr/local/lib64/perl5 
>> /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
>> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
>> line 91.
>> BEGIN failed--compilation aborted at 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VCFCollectionAdaptor.pm 
>> line 91.
>> Compilation failed in require at (eval 261) line 3.
>>
>>
>> FILE: Bio/EnsEMBL/Registry.pm LINE: 1169
>> CALLED BY: EnsEMBL/DBSQL/DBAdaptor.pm  LINE: 988
>> Date (localtime)  = Fri Jan 15 13:11:11 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> -------------------- WARNING ----------------------
>> MSG: Could not find VCFCollection adaptor in the registry for 
>> homo_sapiens variation
>>
>> FILE: EnsEMBL/DBSQL/DBAdaptor.pm LINE: 991
>> CALLED BY: Variation/DBSQL/SampleGenotypeAdaptor.pm  LINE: 287
>> Date (localtime)  = Fri Jan 15 13:11:11 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> -------------------- EXCEPTION --------------------
>> MSG: Could not get adaptor VCFCollection for homo_sapiens variation
>>
>> STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD 
>> /usit/abel/u1/johannhh/src/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:995
>> STACK Bio::EnsEMBL::Variation::DBSQL::SampleGenotypeAdaptor::fetch_all_by_Variation
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/SampleGenotypeAdaptor.pm:287
>> STACK Bio::EnsEMBL::Variation::Variation::get_all_SampleGenotypes
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:987
>> STACK Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::_fetch_all_by_Variation_from_Genotypes
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:307
>> STACK Bio::EnsEMBL::Variation::DBSQL::AlleleAdaptor::fetch_all_by_Variation
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/AlleleAdaptor.pm:273
>> STACK Bio::EnsEMBL::Variation::Variation::get_all_Alleles 
>> /usit/abel/u1/johannhh/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/Variation.pm:861
>> STACK toplevel ensembleapi.pl:17
>> Date (localtime)  = Fri Jan 15 13:11:11 2016
>> Ensembl API version = 83
>> ---------------------------------------------------
>>
>> Could you please help me find the reason for these errors?
>>
>> As a side note: I am very new to perl and the ensembl api. My 
>> apologies if this question has been answered previously.
>> Also, if you have a database of LD SNPs available (as opposed to
>> calculating them on the fly as above), I would appreciate it if you
>> let me know. The discussions I have found on this topic in various
>> forums so far suggested otherwise, but some of them were quite old and
>> perhaps outdated.
>>
>> Best,
>> Johanne H. Horn
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog:http://www.ensembl.info/
>
