[ensembl-dev] VEP cache file creation for e74

Jan Vogel jan.vogel at gmail.com
Wed Mar 4 02:59:58 GMT 2015


Hello Will, 

I tried that option already. You are right that the adaptors.gz file no longer gets written, but the option does not stop the adaptor information from being written into the other index files, for example the sequence ones:

zcat homo_sapiens/74/1/230000001-231000000.gz | less

S{
    package Bio::EnsEMBL::DBSQL::DBConnection;
    use strict 'refs';
    $arg;
}
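
As far as I can tell, the $arg in that dump is a lexical captured by a closure, and B::Deparse only stores the source text of the sub, not the captured value. That would explain the 'Global symbol "$arg"' error on reading. A minimal sketch that reproduces the same class of error outside of Ensembl (make_getter and the data layout are hypothetical, not taken from the API):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical closure: $arg is a lexical captured from the enclosing scope,
# similar in spirit to the sub seen in the dumped cache file.
sub make_getter {
    my $arg = shift;
    return sub { $arg };
}

my $data = { getter => make_getter('db_handle') };

my ($frozen, $thawed, $thaw_error);
{
    local $Storable::Deparse = 1;   # store the sub as deparsed source text
    local $Storable::Eval    = 1;   # eval that source text again on thaw

    $frozen = freeze($data);        # succeeds: only the text of the sub is kept

    # The stored text mentions $arg, but the captured value is gone,
    # so compiling it under "use strict" fails.
    $thawed     = eval { thaw($frozen) };
    $thaw_error = $@;
}

print "thaw failed: $thaw_error" unless defined $thawed;
```

So setting $Storable::Deparse and $Storable::Eval only moves the failure from writing to reading; the closure state itself is never serialised.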

I also tried the --strip option in the hope that the DBadaptor info would not get written, but it does not do that :-( 


I’ve added a line to strip_transcript_cache, as it was keeping the translation adaptor, and now it works. 
I just sent you a pull request for the e74 checkout. My change is below: 


    sub strip_transcript_cache {
        my $config = shift;
        my $cache = shift;

        foreach my $chr(keys %$cache) {
            foreach my $tr(@{$cache->{$chr}}) {
                foreach my $exon(@{$tr->{_trans_exon_array}}) {
                    delete $exon->{slice}->{adaptor};

                    for(qw(adaptor created_date modified_date is_current version is_constitutive _seq_cache dbID slice)) {
                        delete $exon->{$_};
                    }
                }

                delete $tr->{adaptor};
                delete $tr->{slice}->{adaptor};
                ## jhv change to NOT store the adaptor in the dumped files. 
                delete $tr->{translation}->{adaptor} if defined($tr->{translation});
            }
        }
    }
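
To illustrate why that one delete matters, here is a small self-contained sketch. It does not use the Ensembl API: the hash layout, the IDs and the connect_cb key are made up for the example. The only assumption is that the adaptor somewhere holds a code reference, which is what the "Can't store CODE items" error suggests.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a transcript whose translation still carries
# an adaptor that (indirectly) contains a code reference.
my $tr = {
    stable_id   => 'ENST_EXAMPLE',
    translation => {
        stable_id => 'ENSP_EXAMPLE',
        adaptor   => { dbc => { connect_cb => sub { 'connect' } } },
    },
};

# With the adaptor attached, Storable refuses to serialise the structure.
my $frozen = eval { freeze($tr) };
my $error  = $@;
print "before: $error" unless defined $frozen;

# Same idea as the line added to strip_transcript_cache.
delete $tr->{translation}->{adaptor} if defined $tr->{translation};

# Without the adaptor the structure round-trips cleanly.
my $copy = thaw(freeze($tr));
print "after: ", join(',', sort keys %{ $copy->{translation} }), "\n";
```

The first freeze dies with the same "Can't store CODE items" message as in my original mail; after the delete, only plain data is left and the round trip works.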





Thanks for your support, 

    Jan 


On Mar 3, 2015, at 1:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Jan,
> 
> If you add the flag --no_adaptor_cache to your command it should work OK.
> 
> This has been fixed in a newer version of VEP, but I guess you're tied to 74.
> 
> Regards
> 
> Will
> 
> More info: it's not the version of Storable that is the issue, it's that Storable is trying to serialise a DBI object that has an active connection to the database. The VEP's serialisation code takes care of this by disconnecting from the DB before the object gets serialised, but in this case it doesn't work (and FWIW caching in the adaptors is no longer necessary anyway, this behaviour should have been made the default way before 74).
> 
> On 3 March 2015 at 05:55, Jan Vogel <jan.vogel at gmail.com> wrote:
> 
> 
> Hello Ensembl and Will,
> 
> I’m running into trouble when trying to create / update my cache files for VEP. 
> 
> My current best guess is that the problem is related either to the Ensembl code or to storing references with Storable.pm. I’m using version 2.41 of this module ($VERSION = '2.41').
> 
> Here’s how to reproduce the error:
> 
> perl  ensembl-tools-release-74/scripts/variant_effect_predictor/variant_effect_predictor.pl \
>        -build all -dir tmp --verbose --host <HOST> --user <USER> --pass <SECRET> --port 3326
> 
> ERROR: 
> 
> 2015-03-02 20:56:27 - Connected to core version 74 database and variation version 74 database
> 
> Can't store CODE items at …/perl/5.18.2/x86_64-linux-2.6-rhel6/lib/5.18.2/x86_64-linux-thread-multi/Storable.pm line 304, at 
> variant_effect_predictor/e74_api/ensembl-variation-release-74/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4261.
> 
> 
> Reading through http://perldoc.perl.org/Storable.html#CODE-REFERENCES, the documentation suggests serializing code references with B::Deparse. I’ve added 2 lines to the variant_effect_predictor.pl script: 
> 
> $Storable::Eval=1; 
> $Storable::Deparse=1; 
> 
> 
> Now, the script writes the binary cache files - however, the next problem occurs when I try to read them. 
> 
> Depending on which version of the Ensembl API I am using, I get these error messages: 
> 
> e75: 
> 
> 2015-03-02 21:42:33 - Reading cached adaptor data
> code sub {
>     package Bio::EnsEMBL::DBSQL::DBConnection;
>     use strict;
>     $args[0];
> } caused an error: Global symbol "@args" requires explicit package name at (eval 108) line 4
> 
> 
> e74:
> 2015-03-02 21:47:40 - Reading cached adaptor data
> code sub {
>     package Bio::EnsEMBL::DBSQL::DBConnection;
>     use strict;
>     $arg;
> } caused an error: Global symbol "$arg" requires explicit package name at (eval 130) line 4, at /gne/research/apps/ensembl/ensembl-74/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4281.
> 
> I’ve tried various work-arounds and different ensembl versions, but did not succeed. Any help welcome.
> 
> 
> The database I am currently working on is an e67 core which has been patched up to e74 (so I can’t use the e74 cache files from the web). 
> 
> I tried to build indices with e77 but did not succeed either. 
> 
> 
> Any help welcome 
> 
>    Jan Vogel 
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
