[ensembl-dev] A question on lazy loading

Thu Sep 9 09:05:29 BST 2010

On Wed, Sep 08, 2010 at 03:24:26PM -0500, Ma, Man Chun John wrote:
> Hi all,
> 
> I'm writing a script to parse some Ensembl Variation data. For the sake
> of uniformity, our lab decided to use flanking sequences as extracted
> from the reference sequence, instead of what is stored in the
> flanking_sequence table. 
> I originally wrote the following, supposing $v is a Variation and $vf is
> a VariationFeature object of the same SNP:
> 
> [...]
> my $three_prime_flanking=$vf->Slice->subseq($vf->seq_region_start-101,$vf_region_start-1);
> [...]
> 
> When running the script under ActivePerl 5.10.1 with DBD-mysql 4.011, I
> found there has been a big increase in both network activity and memory
> use when it comes to this line, much more than similar scripts I ran
> under the same environment. However, after I changed it to the
> following, the script returned to normal:
> 
> [...]
> my $vf_slice=$vf->Slice;
> my $three_prime_flanking=$vf_slice->subseq($vf->seq_region_start-101,$vf_region_start-1);
> [...]
> 
> Is there something about the Ensembl API's lazy loading that I don't
> know here?

Hi John,

I can't see any technical difference between the two pieces of code.  In
the second one, you save $vf->Slice() as a temporary variable, but the
execution path is no different from the first piece of code.

My guess is that if you run the two pieces of code one after the other,
the second one would benefit from the cached data (on the MySQL server
side) left behind by the first one.

However, I don't know what else you're doing with $vf_slice later.  If
you're reusing it, you might unknowingly use its internally cached
sequence, which would account for the decrease in network traffic
compared to creating new Slice objects from $vf over and over again.
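
To illustrate that last point, here is a minimal, self-contained sketch
of the reuse effect.  MockSlice is a made-up stand-in, not part of the
Ensembl API, and its caching behaviour is only an assumption about what
might be happening; the counter simply simulates round trips to the
server.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a slice object (NOT the Ensembl API).
# The first subseq() call "fetches" the sequence, counted in $FETCHES,
# and later calls on the same object reuse the cached copy.
package MockSlice;
our $FETCHES = 0;

sub new {
    my ($class) = @_;
    return bless { seq => undef }, $class;
}

sub subseq {
    my ($self, $start, $end) = @_;
    if (!defined $self->{seq}) {
        $FETCHES++;                     # simulated round trip to the server
        $self->{seq} = 'ACGT' x 100;    # made-up 400 bp sequence
    }
    # Coordinates are 1-based and inclusive.
    return substr($self->{seq}, $start - 1, $end - $start + 1);
}

package main;

# Pattern A: a fresh object for every call, so no cache is ever reused.
for my $i (1 .. 5) {
    my $flank = MockSlice->new->subseq(100, 200);
}
my $fetches_fresh = $MockSlice::FETCHES;

# Pattern B: one object kept in a variable and reused for every call.
$MockSlice::FETCHES = 0;
my $slice = MockSlice->new;
for my $i (1 .. 5) {
    my $flank = $slice->subseq(100, 200);
}
my $fetches_reused = $MockSlice::FETCHES;

print "fresh objects: $fetches_fresh fetches\n";
print "reused object: $fetches_reused fetches\n";
```

Under these assumptions the reused object is fetched only once, while
the fresh objects are fetched once per call.  Whether the real objects
behave this way in your script depends on how $vf_slice is used later.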

Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom