[ensembl-dev] Peak memory/gzip use in VEP

Stuart Watt morungos at gmail.com
Mon Mar 7 15:48:31 GMT 2016


Hi all

I’m seeing hefty peak memory use from VEP, and it’s breaking some of my cluster jobs. I think one of the issues is that it can temporarily spin up many gzip processes; these showed up clearly in top. This surprised me, as all the code I could see used IO::Uncompress::* when it was available.

However, I eventually found that deserialize_from_file in the ensembl-variation API is probably where this is happening. Could I suggest adding an option for in-process, Perl-based deserialization? My guess is that avoiding the piped open() calls would also improve performance here.
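To illustrate the difference, here is a minimal sketch in Python rather than Perl, with the standard `gzip` and `pickle` modules standing in for IO::Uncompress::Gunzip and Storable. The function names are hypothetical and this is not the actual deserialize_from_file code. The first function mirrors the piped-open() pattern by starting an external `gzip -dc` process for every file; the second decompresses and deserializes inside the current process, so no child process is created.

```python
import gzip
import pickle
import subprocess


def deserialize_via_pipe(path):
    """Hypothetical: mimics a piped open(), spawning one `gzip -dc` child per file."""
    # Each call forks a separate gzip process; these are what show up in top.
    result = subprocess.run(["gzip", "-dc", path], check=True, stdout=subprocess.PIPE)
    return pickle.loads(result.stdout)


def deserialize_in_process(path):
    """Hypothetical: decompress and deserialize in-process, with no child process."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


def write_cache(path, obj):
    """Hypothetical helper that writes a gzipped, serialized object for testing."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh)
```

Both functions return the same object for the same file; the in-process version simply avoids the per-file process start-up. Whether that is measurably faster for VEP's cache files is an assumption that would need benchmarking.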

Am I on a sensible track with this?

All the best
Stuart


