[ensembl-dev] [Compara] Memory leak?
Javier Herrero
jherrero at ebi.ac.uk
Wed Jan 25 14:31:38 GMT 2012
Hi Giuseppe
The cache in the member adaptor was added for a specific pipeline, but it
does not follow the same caching procedure as the other adaptors.
On 24/01/12 19:44, Giuseppe G. wrote:
> Hi,
>
> I'm running a pipeline composed of three blocks. Block 1 uses the
> Ensembl API, Block 2 uses the output of Block 1 but not the Ensembl
> API, and Block 3 uses the output of Block 2 and processes it with the
> Ensembl API again. My code is structured as follows:
>
>
> -set up registry
> -pass registry to block_1; create relevant adaptors and do something
> -registry->clear
> -pass output_block_1 to block_2; do something
> -set up registry
> -pass registry and output_block_2 to block_3; create relevant adaptors
> and do something
> -registry->clear
> -finish
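A minimal sketch of this lifecycle with the Ensembl Registry (the hostname, user, species, and adaptor shown are placeholders, not the poster's actual connection details or code):

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Placeholder connection details; substitute your own server.
sub load_registry {
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
}

# Block 1: uses the API.
load_registry();
my $gene_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor( 'homo_sapiens', 'core', 'Gene' );
# ... process the input, write output_block_1 ...
Bio::EnsEMBL::Registry->clear();    # disconnect and drop registry entries

# Block 2: no API usage; turns output_block_1 into output_block_2.

# Block 3: uses the API again on output_block_2.
load_registry();
# ... recreate the relevant adaptors and process ...
Bio::EnsEMBL::Registry->clear();
```

The point of the pattern is that clear() should leave nothing of the first session behind before the registry is loaded again.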
>
> Both block_1 and block_3 operate on a text input file, creating as
> many different Ensembl adaptors as needed, cycling over each text
> entry and then writing to an output file.
>
> Now, this has worked well for a number of years (since rel. 58, I'd
> say). It has been used on genome-wide lists of Ensembl gene IDs
> without any problems.
>
> Since rel. 64, however, I'm having problems completing the process,
> even for relatively small input files (~1500 IDs). The pipeline fails
> to complete block_3, quitting with an "out of memory!" Perl error.
> Memory usage approaches 60% halfway through block_3 on an i686 4 GB
> RAM machine running Unix.
>
> I'm currently puzzled as to the reason for the following behaviour:
> if I run the third block alone (i.e. I comment out the code for
> block_1 and block_2 in my script and give block_3 the completed
> output from block_2), block_3 will complete. But, if I understand
> correctly, the clear() method should disconnect and release all
> memory used by the Ensembl connection in block_1. So why does the
> presence of two pipeline blocks that both use the API crash my
> script, when it completes fine if I run only one block at a time?
> Is there a more appropriate method to call at the end of a registry
> session than clear()? Or does Perl, for some reason unknown to me,
> not fully release memory during runtime?
>
> I've started doing some memory profiling in block_3 using Devel::Size,
> Devel::Gladiator, etc. In block_3 I create the following adaptors:
>
> gene
> genomeDB
> member
> homology
> proteintree
> methodlinkspeciesset
> NCBItaxon
>
> Iterating through the input lines, I check the total size of each of
> these. By roughly 10% of the way through my input file, all of them
> have stayed constant in size - apart from the member adaptor, which
> has grown to 145 times its initial size (I'm talking about the size
> of the full data structures here).
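For reference, this kind of per-checkpoint check can be done with Devel::Size's total_size(), which walks an entire data structure rather than measuring only the top-level variable. The sketch below uses plain hashes as stand-ins for the adaptors (in the real pipeline they would come from Bio::EnsEMBL::Registry->get_adaptor(...)), with one of them growing to mimic a filling cache:

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);    # CPAN module, as used in the email

# Illustrative stand-ins for the adaptors listed above.
my %adaptors;
$adaptors{$_} = {} for qw(
    gene genomeDB member homology proteintree
    methodlinkspeciesset NCBItaxon
);

# Simulate a cache growing inside one adaptor: total_size() walks the
# whole structure, so the growth shows up between checkpoints while the
# other adaptors stay constant.
for my $checkpoint ( 1 .. 3 ) {
    $adaptors{member}{"entry_$_"} = 'x' x 100
        for 1 .. 1000 * $checkpoint;
    printf "checkpoint %d: %-22s %10d bytes\n",
        $checkpoint, $_, total_size( $adaptors{$_} )
        for sort keys %adaptors;
}
```

A 145-fold increase in total_size() over a run, as reported above, points at an internal structure (such as a cache) accumulating entries rather than at the adaptor object itself.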
>
> I was wondering whether the member adaptor's size increase is to be
> expected, and whether the cause of my out-of-memory errors lies
> somewhere else. I did attempt deactivating caching through the
> registry calls, but with no luck. Your help would be greatly
> appreciated, as usual. Thanks a lot!
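For readers following along: caching can be switched off when the registry is loaded, via the -NO_CACHE flag (this may be the registry call referred to above; the connection details here are placeholders):

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Placeholder connection details. -NO_CACHE => 1 asks the registry to
# disable the adaptors' internal caches when it creates them.
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host     => 'ensembldb.ensembl.org',
    -user     => 'anonymous',
    -no_cache => 1,
);
```

As noted in the reply above, the member adaptor's cache was added for a specific pipeline and does not follow the same procedure as the other adaptors, which would explain why this flag had no effect on it.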
>
> Giuseppe
>
--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK