[ensembl-dev] [Compara] Memory leak?

Javier Herrero jherrero at ebi.ac.uk
Thu Jan 26 15:10:32 GMT 2012


Hi Giuseppe

I take it this hasn't completely resolved your problem.

The main issue with the cache in the adaptors is that it creates 
circular references ($object->adaptor and $adaptor->{cache}->object). 
Perl's garbage collector is based on reference counting and cannot 
reclaim these circular references (unless you 'weaken' them). It will 
not delete the object because the adaptor refers to it, and it won't 
delete the adaptor either for the same reason. Perl is unable to 
realise that neither object is needed any more and will keep both in 
memory until the end of the execution.
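
As an illustration, here is a minimal, self-contained sketch of such a cycle and of how Scalar::Util's weaken breaks it. These are toy hashes, not actual Ensembl objects; the key names are made up:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $freed = 0;

{
    my $adaptor = { cache => {} };
    my $object  = { adaptor => $adaptor };        # $object->adaptor
    $adaptor->{cache}{member_1} = $object;        # $adaptor->{cache}->object

    # Break the cycle: the back-reference no longer counts towards
    # the adaptor's reference count.
    weaken($object->{adaptor});

    my $probe = $adaptor;
    weaken($probe);               # becomes undef once the adaptor is freed

    undef $adaptor;               # adaptor now held only by weak refs: freed
    undef $object;                # object freed too (cache went with adaptor)

    $freed = !defined $probe;     # 1 if the pair was reclaimed
}

print $freed ? "reclaimed\n" : "leaked\n";   # prints "reclaimed"
```

Without the weaken() call, the probe would stay defined: the two structures keep each other alive even after both lexical variables go out of scope.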

I believe this can happen when you call $registry->clear(). The next 
time you request them, you'll get brand new adaptors, but the old ones 
are still in memory, albeit inaccessible. If you are still using 
$registry->clear(), I'd try removing those calls.
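
The effect can be sketched with plain hashes (again, not actual Registry code; the structure below only mimics what clear() does to the registry's handle on an adaptor):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %registry = ( member_adaptor => { cache => {} } );
my $object   = { adaptor => $registry{member_adaptor} };
$registry{member_adaptor}{cache}{gene_1} = $object;   # the cycle

my $probe = $registry{member_adaptor};
weaken($probe);                  # goes undef only if the adaptor is freed

delete $registry{member_adaptor};   # roughly what $registry->clear does
undef $object;                      # drop our last handle on the object

# The pair keeps itself alive: inaccessible, but still in memory.
print defined($probe) ? "still in memory\n" : "freed\n";
```

This prints "still in memory": dropping the registry's handle removes your access to the pair but does not release it, which is why repeated clear()/re-create rounds grow the process.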

Maybe someone else has another suggestion.

I hope this helps

Javier


On 26/01/12 14:49, Giuseppe G. wrote:
> Hi Javier,
>
> thanks for the answer. Apart from the member adaptor, do you have a 
> hypothesis regarding what might cause the leak preventing me from 
> completing my scripts? Thanks!
>
> Giuseppe
>
> On 25/01/12 14:35, Javier Herrero wrote:
>> Apologies, my fingers slipped on the keyboard and sent the email before
>> it was ready... :-(
>>
>> So, the cache in the member adaptor has been added for a specific
>> pipeline. Unfortunately, its implementation differs from that of other
>> adaptors, and the common ways to clear the cache do not work.
>>
>> We don't have a final solution right now. However, you can manually
>> clear the cache by using:
>>
>> $member_adaptor->{'_member_cache'} = undef;
>>
>> Sorry for the inconvenience. I hope this works for you.
>>
>> Javier
>>
>> On 25/01/12 14:31, Javier Herrero wrote:
>>> Hi Giuseppe
>>>
>>> The cache in the member adaptor has been added for a specific
>>> pipeline, but does not follow the same procedure as other adaptors
>>>
>>> On 24/01/12 19:44, Giuseppe G. wrote:
>>>> Hi,
>>>>
>>>> I'm running a pipeline composed of three blocks. Block 1 uses the
>>>> Ensembl API, Block 2 uses the output of Block 1 but not the Ensembl
>>>> API, and Block 3 uses the output of Block 2 and processes it using the
>>>> Ensembl API again. My code is structured as follows
>>>>
>>>>
>>>> -set up registry
>>>> -pass registry to block_1; create relevant adaptors and do something
>>>> -registry->clear
>>>> -pass output_block_1 to block_2; do something
>>>> -set up registry
>>>> -pass registry and output_block_2 to block_3; create relevant
>>>> adaptors and do something
>>>> -registry->clear
>>>> -finish
>>>>
>>>> Both block_1 and block_3 operate on a text input file, creating as
>>>> many different Ensembl adaptors as needed, cycling over each text
>>>> entry and then writing to an output file.
>>>>
>>>> Now this has worked well for a number of years (since rel. 58, I'd
>>>> say). It has been used on genome-wide lists of Ensembl gene IDs
>>>> without any problems.
>>>>
>>>> Since rel. 64, however, I'm having problems completing the process,
>>>> even for relatively small input files (~1500 IDs). The pipeline will
>>>> not complete block_3 and quits with an "out of memory!" Perl error.
>>>> Memory usage approaches 60% halfway through block_3 on an i686 machine
>>>> with 4GB of RAM running Unix.
>>>>
>>>> I'm currently puzzled by the following behaviour: if I run the third
>>>> block alone (i.e. I comment out the code for blocks 1 and 2 in my
>>>> script and give block_3 the completed output from block_2), block_3
>>>> completes. But if I understand correctly, the clear() method should
>>>> disconnect and release all memory used by the Ensembl connection in
>>>> block_1. So why does the presence of two pipeline blocks, both using
>>>> the API, crash my script, when it completes fine if I run only one
>>>> block at a time? Is there a more appropriate method to call on
>>>> completion of a registry session than clear()? Or does Perl, for some
>>>> reason unknown to me, not fully release memory during runtime?
>>>>
>>>> I've started doing some memory profiling in block_3, using
>>>> Devel::Size, Devel::Gladiator, etc. In block_3 I create the following
>>>> adaptors:
>>>>
>>>> gene
>>>> genomeDB
>>>> member
>>>> homology
>>>> proteintree
>>>> methodlinkspeciesset
>>>> NCBItaxon
>>>>
>>>> Iterating through the input lines, I'm checking the total size of
>>>> each of these. By approximately 10% of my input file, all of them have
>>>> stayed constant in size - apart from the member adaptor, which has
>>>> grown to 145 times its initial size (I'm talking about the size of the
>>>> full data structures here).
>>>>
>>>> I was wondering whether the member adaptor's size increase is to be
>>>> expected and whether the reason for my out-of-memory errors lies
>>>> somewhere else. I did attempt deactivating caching through the
>>>> registry calls, but no luck. Your help would be greatly appreciated,
>>>> as usual. Thanks a lot!
>>>>
>>>> Giuseppe
>>>>
>>>
>>
>

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK




