[ensembl-dev] [Compara] Memory leak?
Javier Herrero
jherrero at ebi.ac.uk
Wed Jan 25 14:35:48 GMT 2012
Apologies, my fingers slipped on the keyboard and sent the email before
it was ready... :-(
So, the cache in the member adaptor was added for a specific
pipeline. Unfortunately, its implementation differs from that of the
other adaptors, and the usual ways of clearing the cache do not work.
We don't have a final solution right now. However, you can manually
clear the cache by using:
$member_adaptor->{'_member_cache'} = undef;
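For example, if you are looping over many gene IDs, you could reset the
cache every so often. A rough sketch (the loop and the @gene_ids /
$member_adaptor variables are assumptions based on your description;
'_member_cache' is an internal field and may change in future releases):

  my $count = 0;
  foreach my $stable_id (@gene_ids) {
      # ... fetch members / homologies for $stable_id as usual ...

      # Drop the internal cache every 100 entries to keep memory bounded
      $member_adaptor->{'_member_cache'} = undef
          if (++$count % 100 == 0);
  }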
Sorry for the inconvenience. I hope this works for you.
Javier
On 25/01/12 14:31, Javier Herrero wrote:
> Hi Giuseppe
>
> The cache in the member adaptor has been added for a specific
> pipeline, but does not follow the same procedure as other adaptors
>
> On 24/01/12 19:44, Giuseppe G. wrote:
>> Hi,
>>
>> I'm running a pipeline composed of three blocks. Block 1 uses the
>> Ensembl API, Block 2 uses the output of Block 1 but not the Ensembl
>> API, and Block 3 takes the output of Block 2 and processes it using
>> the Ensembl API again. My code is structured as follows
>>
>>
>> -set up registry
>> -pass registry to block_1; create relevant adaptors and do something
>> -registry->clear
>> -pass output_block_1 to block_2; do something
>> -set up registry
>> -pass registry and output_block_2 to block_3; create relevant
>> adaptors and do something
>> -registry->clear
>> -finish
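>>
>> In code, that skeleton is roughly the following (a paraphrased
>> sketch; the host/user details and the run_block_* subs are
>> placeholders for my actual code):
>>
>>   use Bio::EnsEMBL::Registry;
>>
>>   # Block 1: connect, create adaptors, process, then tear down
>>   Bio::EnsEMBL::Registry->load_registry_from_db(
>>       -host => 'ensembldb.ensembl.org',   # placeholder
>>       -user => 'anonymous',
>>   );
>>   my $out1 = run_block_1();               # creates its own adaptors
>>   Bio::EnsEMBL::Registry->clear();        # disconnect, drop adaptors
>>
>>   my $out2 = run_block_2($out1);          # no API usage here
>>
>>   # Block 3: reconnect and process block_2's output
>>   Bio::EnsEMBL::Registry->load_registry_from_db(
>>       -host => 'ensembldb.ensembl.org',   # placeholder
>>       -user => 'anonymous',
>>   );
>>   run_block_3($out2);
>>   Bio::EnsEMBL::Registry->clear();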
>>
>> Both block_1 and block_3 operate on a text input file, creating as
>> many different Ensembl adaptors as needed, cycling over each text
>> entry and then writing to an output file.
>>
>> Now, this has worked well for a number of years (since rel. 58, I'd
>> say). It has been used on genome-wide lists of Ensembl gene IDs
>> without any problems.
>>
>> Since rel. 64, however, I'm having problems completing the process,
>> even for relatively small input files (~1500 IDs). The pipeline fails
>> to complete block_3, quitting with an "out of memory!" Perl error.
>> Memory usage approaches 60% halfway through block_3 on an i686
>> machine with 4 GB of RAM running Unix.
>>
>> I'm currently puzzled as to the reason for the following behaviour:
>> if I run the third block alone (i.e. I comment out the code for
>> block_1 and block_2 in my script and give block_3 the completed
>> output from block_2), block_3 will complete. But, if I understand
>> correctly, the clear() method should disconnect and release all
>> memory used by the Ensembl connection in block_1. So why does the
>> presence of two pipeline blocks, both using the API, crash my script,
>> when it completes fine if I run only one block at a time? Is there
>> perhaps a more appropriate method to call on completion of a registry
>> session than clear()? Or does Perl, for some reason unknown to me,
>> not fully release memory during runtime?
>>
>> I've started doing some memory profiling in block_3 using
>> Devel::Size, Devel::Gladiator, etc. In block_3 I create the following
>> adaptors:
>>
>> gene
>> genomeDB
>> member
>> homology
>> proteintree
>> methodlinkspeciesset
>> NCBItaxon
>>
>> Iterating through the input lines, I check the total size of each of
>> these. By roughly 10% of the way through my input file, all of them
>> have stayed constant in size, apart from the member_adaptor, which
>> has grown to 145 times its initial size (I'm talking about the size
>> of the full data structures here).
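>>
>> (For concreteness, the per-iteration check is along these lines, a
>> paraphrased sketch; %adaptors is a hash I keep of the seven adaptors
>> listed above:)
>>
>>   use Devel::Size qw(total_size);
>>
>>   # Inside the main loop: log the deep size of each adaptor
>>   foreach my $name (sort keys %adaptors) {
>>       printf "%-25s %12d bytes\n", $name, total_size($adaptors{$name});
>>   }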
>>
>> I was wondering whether the member adaptor's size increase is to be
>> expected and whether the cause of my out-of-memory errors lies
>> elsewhere. I did attempt deactivating caching through the registry
>> calls (sketched below), but no luck. Your help would be greatly
>> appreciated, as usual. Thanks a lot!
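>>
>> For reference, the deactivation attempt looked roughly like this (a
>> sketch; I'm assuming here that the -NO_CACHE flag to
>> load_registry_from_db is the right way to express it):
>>
>>   Bio::EnsEMBL::Registry->load_registry_from_db(
>>       -host     => 'ensembldb.ensembl.org',   # placeholder
>>       -user     => 'anonymous',
>>       -no_cache => 1,   # assumption: disable adaptor caching
>>   );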
>>
>> Giuseppe
>>
>
--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK