[ensembl-dev] Optimizing VEP speed and plugins.

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Jun 6 13:24:14 BST 2013


Hello,

Do plugins use the multiple threads feature? If not, how can I force a
plugin to benefit from multiple threads when they are specified?

I've noticed that when I launch VEP with 8 CPUs, at the start of
"Calculating consequences" I can see 8 perl processes, each running at
100% CPU. After an hour or so, only 1 process is active at 100% CPU; the
rest are still alive but not using any CPU. After some more time, the
number of variants specified by the buffer size is written to the output
file, and then all the threads resume using CPU.

So with 8 threads, each VEP cycle seems to go something like this:

  * Retrieve info from database
  * Calculate consequences with all threads at max CPU usage
  * Calculate consequences with 1 thread at max CPU usage
  * Write output

Repeat.
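
To put numbers on the cycle above, here is a minimal Python sketch. It is
a toy model and not VEP code: it only assumes that the parent process
waits for every worker before writing output, so one slow worker keeps
the others idle. The function names and the timings are hypothetical.

```python
# Toy model of one buffer cycle; this is NOT VEP code.
# Assumption: the parent waits for all workers before writing output.
# All names and numbers are hypothetical, for illustration only.

def batch_wall_time(worker_minutes):
    """The batch takes as long as its slowest worker."""
    return max(worker_minutes)


def cpu_utilisation(worker_minutes):
    """Fraction of the available CPU time that did useful work."""
    wall = batch_wall_time(worker_minutes)
    return sum(worker_minutes) / (wall * len(worker_minutes))


# Hypothetical example: 7 workers finish after 60 min, 1 needs 170 min.
workers = [60] * 7 + [170]
print(batch_wall_time(workers))            # wall time in minutes
print(round(cpu_utilisation(workers), 2))  # share of CPU time used
```

If the real cause is something like this, splitting the work more evenly
would matter more than adding threads.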

I would like to know whether the period in which only 1 CPU is being
used is because the plugins are perhaps not using the multiple threads
option, or because the VEP script is doing something else that I'm not
aware of.

Thank you.

Best regards,
Guillermo.

On 06/05/2013 12:41 PM, Guillermo Marco Puche wrote:
> Hello dear developers,
>
> I think I have a stable configuration that fits my needs for a huge
> number of variants.
>
> My last VCF input has 400,000 variants.
>
> Currently on my cluster, with one compute node (8 CPUs and 32GB RAM),
> using 8 threads and a buffer size of 15000 variants, VEP with all the
> plugins and options I need takes 2h 56min to calculate 15000
> variants.
>
> I'm using a local ensembl71 database replica + cache for homo_sapiens. 
> So the time to load vars into memory is very small.
>
> Obviously, 99% of the VEP script's run time is spent in
> "Calculating consequences".
>
> I've also noticed that VEP with 8 threads uses 100% of my 8 CPUs,
> which is really great. But RAM usage is very low: 8GB.
>
> So I have a few questions.
>
>   * Has anyone managed to parallelize the VEP process with MPI or
>     OpenMPI? It would be awesome to be able to select, for example,
>     16 threads and distribute them 8 and 8 between two different
>     machines (compute nodes).
>
>   * In order to optimize my self-written plugins, I've been reading
>     this on the Ensembl VEP website: "VEP users writing plugins should
>     be aware that while the VEP code attempts to preserve the state of
>     any plugin-specific cached data between separate forks, there may
>     be situations where data is lost. If you find this is the case,
>     you should disable forking in the new() method of your plugin by
>     deleting the "fork" key from the $config hash."
>
>     I had no problems with my plugins after fixing them (thanks to
>     the great support of the developers on this list). But I feel
>     they're slowing VEP down, and I'm sure they can be optimized. I
>     would really like some direction, a guide or some tips that I
>     could use to optimize my code.
>
>   * I hope a new way to share plugins between VEP users is available
>     soon, so that all devs can help each other and share tips to
>     improve the code, speed, results, etc.
>
>
> Thank you!
>
> Best regards,
> Guillermo.