[ensembl-dev] [Compara] Memory leak?
Giuseppe G.
G.Gallone at sms.ed.ac.uk
Tue Jan 24 19:44:33 GMT 2012
Hi,
I'm running a pipeline composed of three blocks. Block 1 uses the
Ensembl API, Block 2 uses the output of Block 1 but not the Ensembl API,
and Block 3 uses the output of Block 2 and processes it with the Ensembl
API again. My code is structured as follows:
-set up registry
-pass registry to block 1; create relevant adaptors and do something
-registry->clear
-pass output_block_1 to block 2; do something
-set up registry
-pass registry and output_block_2 to block_3; create relevant adaptors
and do something
-registry->clear
-finish
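In code terms, the skeleton looks roughly like this (a minimal sketch: the host/user values are the public Ensembl server defaults, and run_block1/run_block2/run_block3 stand in for my actual subroutines):

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

# Block 1: uses the API; adaptors are created inside the subroutine
my $output_block_1 = run_block1($registry);
$registry->clear;    # disconnect and drop the cached adaptors

# Block 2: pure post-processing, no API
my $output_block_2 = run_block2($output_block_1);

# Block 3: second registry session, API used again
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
run_block3($registry, $output_block_2);
$registry->clear;
```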
Both block_1 and block_3 operate on a text input file, creating as many
different Ensembl adaptors as needed, cycling over each text entry and
then writing to an output file.
Now, this has worked well for a number of years (since release 58, I'd
say). It has been used on genome-wide lists of Ensembl gene IDs without
any problems.
Since release 64, however, I'm having problems completing the process,
even for relatively small input files (~1500 IDs). The pipeline will not
finish running block_3 and quits with an "Out of memory!" Perl error.
Memory usage approaches 60% about halfway through block_3 on an i686
machine with 4 GB RAM running Unix.
I'm currently puzzled as to the reason for the following behaviour: if I
run the third block alone (i.e. I comment out the code for blocks 1 and
2 in my script and give block_3 the completed output from block_2),
block_3 completes. But if I understand correctly, the clear() method
should disconnect and release all memory used by the Ensembl connection
in block_1. So why does the presence of two pipeline blocks, both using
the API, crash my script, when it completes fine if I run only one block
at a time? Is there a more appropriate method to call at the end of a
registry session than clear()? Or does Perl, for some reason unknown to
me, not fully release memory during runtime?
I've started doing some memory profiling in block_3, using Devel::Size,
Devel::Gladiator, etc. In block_3 I create the following adaptors:
gene
genomeDB
member
homology
proteintree
methodlinkspeciesset
NCBItaxon
Iterating through the input lines, I check the total size of each of
these. After roughly 10% of my input file has been processed, all of
them stay constant in size, apart from the member adaptor, which has
grown to 145 times its initial size (I'm talking about the size of the
full data structures here).
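For reference, the per-iteration check is essentially the following (a sketch: process_entry and the adaptor variables are placeholders for my real code; Devel::Size's total_size() walks the whole structure, including everything cached inside it):

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);

# Adaptors created once at the start of block_3 (subset shown)
my %adaptors = (
    gene     => $gene_adaptor,
    member   => $member_adaptor,
    homology => $homology_adaptor,
    # ... and so on for the others listed above
);

while ( my $line = <$in_fh> ) {
    process_entry($line);    # the real per-entry work

    # report the deep size of each adaptor after this entry
    for my $name ( sort keys %adaptors ) {
        printf "%-20s %d bytes\n", $name, total_size( $adaptors{$name} );
    }
}
```

It's this report that shows everything flat except the member adaptor.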
I was wondering whether the member adaptor's size increase is to be
expected, and whether the reason for my out-of-memory errors should be
sought elsewhere. I did try deactivating caching through the registry
calls, but no luck. Your help would be greatly appreciated, as usual.
Thanks a lot!
Giuseppe
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.