[ensembl-dev] Peak memory/gzip use in VEP

Will McLaren wm2 at ebi.ac.uk
Tue Mar 8 09:56:31 GMT 2016


Hi Stuart,

Thanks very much for the insightful report. While there are indeed many
open() calls, they shouldn't be concurrent (unless you are using --fork, in
which case you would only ever get as many as the number of forks you
specify).

It's something we've been intending to look at anyway: the fewer external
calls there are, the better (assuming they can be replaced by module
subroutines from the Perl core). We will look at doing this for the next
release of VEP.
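
For anyone following along, the trade-off under discussion can be sketched
roughly as below. This is an illustrative comparison only, not the actual
VEP or ensembl-variation code, and the subroutine names are invented for the
example; it simply contrasts a piped external gzip process with in-process
decompression via IO::Uncompress::Gunzip (shipped with the Perl core since
5.9.4, if I recall correctly).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use IO::Uncompress::Gunzip qw($GunzipError);

# Piped variant: each call forks and execs an external gzip process.
# Many calls in flight means many short-lived child processes, which
# is what shows up in top and inflates peak memory on cluster nodes.
sub read_gzipped_piped {
    my ($file) = @_;
    open( my $fh, '-|', 'gzip', '-dc', $file )
        or die "Could not pipe from gzip: $!\n";
    local $/;    # slurp mode
    my $data = <$fh>;
    close($fh);
    return $data;
}

# In-process variant: decompression happens inside the current Perl
# interpreter, so no fork/exec and no extra entry in the process table.
sub read_gzipped_in_process {
    my ($file) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "gunzip failed: $GunzipError\n";
    local $/;
    my $data = <$z>;
    $z->close();
    return $data;
}
```

Both return the same decompressed bytes; the difference is purely in where
the work happens and how many processes exist at peak.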

Regards

Will McLaren
Ensembl Variation



On 7 March 2016 at 15:48, Stuart Watt <morungos at gmail.com> wrote:

> Hi all
>
> I’m seeing hefty peak memory use from VEP, and it’s breaking some of my
> cluster jobs. I think one of the issues is that it can spin up many gzip
> processes temporarily; these showed up clearly in top. This was something
> of a surprise to me, as all the code I could see used IO::Uncompress::*
> when it was available.
>
> However, I did eventually find that deserialize_from_file in the
> ensembl-variation API is probably where this is happening. Could I
> suggest allowing an option for in-process, Perl-based deserialization?
> My guess is that not running this through piped open() calls would
> actually improve performance here.
>
> Am I on a sensible track with this?
>
> All the best
> Stuart
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>