[ensembl-dev] flushing slice data from cache when using the ensembl perl API

Wed Oct 13 14:46:18 BST 2010

On Wed, Oct 13 2010, David Gacquer <dgacquer at ulb.ac.be> wrote:

> my $overlapping_genes = $slice->get_all_Genes();
> print "number of overlapping genes: ".scalar @{$overlapping_genes}."\n";
> while ( my $gene = shift @{$overlapping_genes} ) {
>       print "gene stable id: ".$gene->stable_id()."\n";
> ...
> }
> 
> I have a particular issue when the same genomic position appears
> several times in a row. All overlapping genes and transcripts are
> correctly found only the first time.

I haven't tested it, but I think if you change line 3 of the above code
snippet to

    foreach my $gene ( @{$overlapping_genes} ) {

it should work as expected.

the rationale is this (but as I said, I didn't test it, so I might be
wrong):

$slice->get_all_Genes() returns an array reference. the reference
actually refers to the slice feature cache, which is (AFAIR) a global
cache in the SliceAdaptor (so if you retrieve the same slice twice,
there is only one copy of each feature in the feature cache). each
shift removes the first element from that array, and since the array is
the cached one rather than a copy, the feature is removed from the
cache as well. once the while loop has run to completion the cached
list is empty, so the next time you use this slice the features will no
longer be there.

a foreach loop, unlike while/shift, only reads the array and never
removes anything from it, so you don't touch the cache and your code
should be fine.
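
to illustrate the effect without a database connection, here is a minimal
sketch in plain perl. the %cache hash and the get_all_genes() sub are
hypothetical stand-ins for the feature cache and for
$slice->get_all_Genes(); they are not part of the ensembl API, and I am
only assuming the real cache behaves like this.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature cache keyed by slice name.
my %cache = ( 'chr1:100-200' => [ 'geneA', 'geneB', 'geneC' ] );

# Hypothetical stand-in for $slice->get_all_Genes(): it returns the
# cached array reference itself, not a copy of the array.
sub get_all_genes {
    my ($key) = @_;
    return $cache{$key};
}

# while/shift consumes the very array the cache still points to.
my $genes = get_all_genes('chr1:100-200');
while ( my $gene = shift @{$genes} ) {
    # process $gene here
}
my $after_shift = scalar @{ get_all_genes('chr1:100-200') };

# Refill the cache, then iterate with foreach, which only reads.
$cache{'chr1:100-200'} = [ 'geneA', 'geneB', 'geneC' ];
my $seen = 0;
foreach my $gene ( @{ get_all_genes('chr1:100-200') } ) {
    $seen++;
}
my $after_foreach = scalar @{ get_all_genes('chr1:100-200') };

print "left in cache after while/shift: $after_shift\n";    # 0
print "left in cache after foreach:     $after_foreach\n";  # 3
```

the second lookup after while/shift finds an empty list, which matches the
symptom described in the original question.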

in general, while/shift sometimes improves performance (since you shrink
your dataset as you process it), but there's always the risk of shared
reference gotchas.
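
if you do want to keep a while/shift loop, one possible way around the
gotcha is to shift from your own copy of the array instead of from the
shared one. a minimal sketch, where $shared is a hypothetical stand-in for
the reference returned by $slice->get_all_Genes():

```perl
use strict;
use warnings;

# Hypothetical stand-in for an array reference shared with a cache.
my $shared = [ 'geneA', 'geneB', 'geneC' ];

# Shallow copy: @work is a new array holding the same elements, so
# shifting from it does not shorten the array behind $shared.
my @work = @{$shared};

my @processed;
while ( defined( my $gene = shift @work ) ) {
    push @processed, $gene;
}

print "processed: ", scalar(@processed), "\n";     # 3
print "still shared: ", scalar( @{$shared} ), "\n"; # 3
```

note that the copy is shallow: the elements themselves (the gene objects
in the real case) are still shared, only the list is independent. the copy
also costs some extra memory for the list itself.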

HTH
    patrick

-- 
Patrick Meidl, Mag.
Bioinformatician

Ce-M-M-
Research Centre for Molecular Medicine
of the Austrian Academy of Science

Lazarettgasse 14 / AKH BT 25.3
Vienna, Austria

room 02.205
phone +43 1 40160 70016
email pmeidl at cemm.oeaw.ac.at
web http://www.cemm.at/