[ensembl-dev] [Compara] Memory leak?

Thu Jan 26 14:49:24 GMT 2012

Hi Javier,

thanks for the answer. Apart from the member adaptor, do you have a 
hypothesis about what might be causing the leak that prevents my 
scripts from completing? Thanks!
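
A minimal sketch of how the manual cache reset quoted below might be
applied inside a per-entry loop. Only the '_member_cache' key comes from
Javier's message. The MockMemberAdaptor package, its fetch_by_stable_id
stand-in, and the $clear_every interval are assumptions made so the
example runs without the Ensembl API. They are not the real adaptor.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Compara member adaptor: a blessed hash
# that keeps every fetched object under '_member_cache'. The real
# adaptor is NOT used here; only the cache key name is from the thread.
package MockMemberAdaptor;

sub new { return bless { _member_cache => {} }, shift }

sub fetch_by_stable_id {    # illustrative method, not the real API
    my ($self, $id) = @_;
    $self->{_member_cache}{$id} ||= { stable_id => $id };
    return $self->{_member_cache}{$id};
}

package main;

my $member_adaptor = MockMemberAdaptor->new;
my $clear_every    = 100;    # assumed interval, tune to taste
my $max_cached     = 0;

for my $i (1 .. 1500) {
    my $member = $member_adaptor->fetch_by_stable_id("ID$i");
    # ... per-entry processing would go here ...

    # Track how many entries the cache holds at its largest.
    my $cached = scalar keys %{ $member_adaptor->{'_member_cache'} || {} };
    $max_cached = $cached if $cached > $max_cached;

    # Manual reset, as suggested in the quoted message.
    if ($i % $clear_every == 0) {
        $member_adaptor->{'_member_cache'} = undef;
    }
}

print "largest cache size seen: $max_cached\n";
```

With the reset in place the mock cache never holds more than
$clear_every entries. Without it, the cache would grow by one entry
per input line.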

Giuseppe

On 25/01/12 14:35, Javier Herrero wrote:
> Apologies, my fingers slipped on the keyboard and sent the email before
> it was ready... :-(
>
> So, the cache in the member adaptor was added for a specific
> pipeline. Unfortunately, its implementation differs from that of other
> adaptors, and the common ways to clear the cache do not work.
>
> We don't have a final solution right now. However, you can manually
> clear the cache by using:
>
> $member_adaptor->{'_member_cache'} = undef;
>
> Sorry for the inconvenience. I hope this works for you.
>
> Javier
>
> On 25/01/12 14:31, Javier Herrero wrote:
>> Hi Giuseppe
>>
>> The cache in the member adaptor has been added for a specific
>> pipeline, but does not follow the same procedure as other adaptors
>>
>> On 24/01/12 19:44, Giuseppe G. wrote:
>>> Hi,
>>>
>>> I'm running a pipeline composed of three blocks. Block 1 uses the
>>> Ensembl API, Block 2 uses the output of Block 1 but not the Ensembl
>>> API, Block 3 uses the output of Block 2 and processes it using the
>>> Ensembl API again. My code is structured as follows
>>>
>>>
>>> -set up registry
>>> -pass registry to block 1; create relevant adapters and do something
>>> -registry->clear
>>> -pass output_block_1 to block 2; do something
>>> -set up registry
>>> -pass registry and output_block_2 to block_3; create relevant
>>> adapters and do something
>>> -registry->clear
>>> -finish
>>>
>>> Both block_1 and block_3 will operate on some text input file,
>>> creating as many different Ensembl adaptors as needed, cycling on
>>> each text entry and then writing to an output file.
>>>
>>> Now this has worked well for a number of years (since rel. 58 I'd
>>> say). It has been used on genome-wide lists of ensembl gene IDs
>>> without any problems.
>>>
>>> Since r.64, however, I'm having problems completing the process, even
>>> for relatively small input files (~1500 IDs). The pipeline will not
>>> complete running block_3 and quits with an "out of memory!" Perl
>>> error. Memory usage will approach 60% halfway through block_3 on an
>>> i686 machine with 4 GB of RAM running Unix.
>>>
>>> I'm currently puzzled as to what the reason for the following
>>> behaviour might be: if I run the third block alone (ie I comment out
>>> the code from block1 and 2 in my script, and give block_3 the
>>> completed output from block2) block3 will complete. But, if I
>>> understand correctly, the clear() method should disconnect and
>>> release all memory used by the ensembl connection in block_1. So why
>>> does the presence of two pipeline blocks, both using the API, crash
>>> my script - which will instead complete ok if I run only one block at
>>> a time? Is there maybe a more appropriate method to use on completion
>>> of a registry session, rather than clear()? Or maybe for some unknown
>>> reason (to me) Perl does not fully release memory during runtime?
>>>
>>> I've started doing some memory profiling in block3, using
>>> Devel::Size, Devel::Gladiator, etc. In block3 I create the following
>>> adapters:
>>>
>>> gene
>>> genomeDB
>>> member
>>> homology
>>> proteintree
>>> methodlinkspeciesset
>>> NCBItaxon
>>>
>>> Iterating through the input lines, I'm checking the total size of
>>> each of these. Having reached approximately 10% of my input file, all
>>> of these stay constant in size, apart from the member_adaptor, which
>>> has grown to 145 times its initial size (I'm talking about the size
>>> of the full data structures here).
>>>
>>> I was wondering if the member adaptor's size increase is to be
>>> expected and if the reason for my out of memory errors has to be
>>> found somewhere else. I did attempt deactivating caching through the
>>> registry calls, but no luck. Your help would be greatly appreciated
>>> as usual. Thanks a lot!
>>>
>>> Giuseppe
>>>
>>
>
