[ensembl-dev] Optimizing VEP speed and plugins.

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Jun 5 11:41:40 BST 2013


Hello dear developers,

I think I have a stable configuration that fits my needs for a huge number 
of variants.

My latest VCF input has 400,000 variants.

Currently on my cluster, using one compute node (8 CPUs and 32GB RAM) 
with 8 threads and a buffer size of 15000 variants, VEP with all the 
plugins and options I need takes 2h 56min to process 15000 variants.
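
To put that rate in perspective, here is a rough back-of-the-envelope extrapolation to the full 400,000-variant file. It is only a sketch and assumes that every buffer of 15000 variants takes the same 2h 56min, which may well not hold in practice; the function name is just for illustration.

```python
def estimated_hours(total_variants, buffer_size, seconds_per_buffer):
    """Linear extrapolation; assumes every buffer takes the same time."""
    return total_variants / buffer_size * seconds_per_buffer / 3600


# Figures from this message: 2h 56min per buffer of 15000 variants.
seconds_per_buffer = 2 * 3600 + 56 * 60
hours = estimated_hours(400_000, 15_000, seconds_per_buffer)
print(f"{hours:.1f} hours")  # prints "78.2 hours"
```

So at the current rate a single node would need a bit more than three days for the whole input.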

I'm using a local Ensembl 71 database replica plus the cache for homo_sapiens, 
so the time to load variants into memory is very small.

99% of the VEP script's run time is, unsurprisingly, spent in the 
"Calculating consequences" step.

I've also noticed that VEP with 8 threads uses 100% of my 8 CPUs, 
which is really great. But RAM usage is very low: only 8GB.

So I have a few questions.

  * Has anyone managed to parallelize the VEP process with MPI or OpenMPI?
    It would be awesome to be able to select, for example, 16 threads and
    distribute them 8 and 8 between two different machines (compute
    nodes).

  * In order to optimize my self-coded plugins, I've been reading this
    on the Ensembl VEP website: "VEP users writing plugins should be
    aware that while the VEP code attempts to preserve the state of any
    plugin-specific cached data between separate forks, there may be
    situations where data is lost. If you find this is the case, you
    should disable forking in the new() method of your plugin by
    deleting the "fork" key from the $config hash."

    I had no problems with my plugins after fixing them (thanks to the
    great support of the developers on this list). But I feel they're
    slowing VEP down, and I'm sure they can be optimized. I would really
    like some direction, a guide or some tips that I could use to
    optimize my code.

  * I hope a new way to share plugins between VEP users will be available
    soon, so that we developers can help each other and exchange tips to
    improve the code, speed, results, etc.
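
Regarding the first question (spreading the work over two compute nodes): one alternative that does not need MPI at all would be to split the input VCF into chunks and run an independent VEP process on each node. The sketch below only shows the splitting step, not how VEP is launched. It is not an existing VEP feature; the function name `split_vcf` is made up for illustration, and it assumes that header lines start with "#" and that variants can be annotated independently of each other.

```python
def split_vcf(lines, n_chunks):
    """Split VCF lines into at most n_chunks lists of lines.

    Every chunk starts with the full header, so each one is a valid
    stand-alone input file for a separate process.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    # Ceiling division so that every record lands in exactly one chunk.
    size = max(1, -(-len(records) // n_chunks))
    return [header + records[start:start + size]
            for start in range(0, len(records), size)]


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t.\t.\t.\n",
        "1\t200\t.\tC\tT\t.\t.\t.\n",
        "2\t300\t.\tG\tA\t.\t.\t.\n",
    ]
    for i, chunk in enumerate(split_vcf(demo, 2)):
        print(f"chunk {i}: {len(chunk)} lines")
```

Each chunk could then be written to its own file and submitted as a separate job to the cluster scheduler, with the outputs concatenated afterwards.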


Thank you !

Best regards,
Guillermo.


