[ensembl-dev] Optimizing VEP speed and plugins.

Sarah Hunt seh at ebi.ac.uk
Fri Jun 7 12:00:56 BST 2013


Hi Guillermo,

The plugins do not use threads, but they will use forks when the -fork
option is specified, as described in the documentation you quoted
previously.

Different batches of variants contain different data and therefore take
different lengths of time to analyse. When a batch of variants is
submitted for analysis, it is divided equally between the requested
number of forks. Each fork may take a different length of time to
complete, and the next batch is not submitted until all of them have
finished. It is more likely that one fork will run longer than the others
than that they all complete together, so the remaining forks sit idle
while the slowest one finishes.
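As a rough illustration of this effect, here is a minimal Python sketch. It is not VEP's actual code: it assumes (hypothetically) that each batch is cut into contiguous, equally sized chunks, one per fork, and it uses made-up per-variant costs. Because the next batch is only submitted once every fork has finished, the batch's wall time is the slowest fork's time, and the other forks wait for the difference.

```python
def split_evenly(batch, n_forks):
    """Split a batch into n_forks contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(batch), n_forks)
    chunks, start = [], 0
    for i in range(n_forks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


def batch_wall_time(costs, n_forks):
    """Return (wall time, per-fork times) for one batch.

    The next batch cannot start until every fork has finished,
    so the wall time is the time of the slowest fork.
    """
    fork_times = [sum(chunk) for chunk in split_evenly(costs, n_forks)]
    return max(fork_times), fork_times


# Hypothetical per-variant costs in seconds: 14 cheap variants followed by
# 2 expensive ones that happen to land in the same chunk.
costs = [1] * 14 + [10] * 2
wall, per_fork = batch_wall_time(costs, 8)
idle = sum(wall - t for t in per_fork)

print(per_fork)  # [2, 2, 2, 2, 2, 2, 2, 20]
print(wall)      # 20
print(idle)      # 126 fork-seconds spent waiting for the slowest fork
```

With 34 seconds of total work, a perfectly balanced split would finish in about 4.25 seconds, but this batch takes 20 because one fork received both expensive variants: one process stays busy while the other seven sit idle, which is the pattern you observed.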

Best wishes,

Sarah

On Thu, Jun 6, 2013 at 1:24 PM, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hello,
>
> Are plugins using the multiple threads feature? If not, how can I force a
> plugin to benefit from multiple threads when they are specified?
>
> I've noticed that when launching VEP with 8 CPUs, at the start of
> calculating consequences I can see the 8 Perl processes running at 100%
> CPU. Then, after an hour or so, just 1 thread is active at 100% CPU; the
> rest are still alive but not using any CPU. After some time, the number of
> variants specified by the buffer size is written to the output file, and
> then all the threads resume using CPU.
>
> So for VEP with 8 threads it would be something like:
>
>    - Retrieve info from database
>    - Calculate consequences with all threads at max CPU usage
>    - Calculate consequences with 1 thread at max CPU usage
>    - Write output
>
> Repeat.
>
> I would like to know whether the period where only 1 CPU is being used is
> because the plugins are not using the multiple threads option, or because
> the VEP script is doing something else that I don't know about.
>
> Thank you.
>
> Best regards,
> Guillermo.
>  On 06/05/2013 12:41 PM, Guillermo Marco Puche wrote:
>
>  Hello dear developers,
>
> I think I have a stable configuration that fits my needs for a huge number
> of variants.
>
> My last VCF input has 400,000 variants.
>
> Currently on my cluster, with one compute node (8 CPUs and 32 GB RAM)
> using 8 threads and a buffer size of 15000 variants, VEP with all the
> plugins and options I need takes 2h 56min to calculate 15000 variants.
>
> I'm using a local Ensembl 71 database replica plus the cache for
> homo_sapiens, so the time to load variants into memory is very small.
>
> 99% of the time the VEP script takes is obviously spent in "Calculating
> consequences".
>
> I've also noticed that VEP with 8 threads consumes 100% of my 8 CPUs,
> which is really great. But RAM usage is very low, at 8 GB.
>
> So I've a few questions.
>
>
>    - Has anyone managed to parallelize the VEP process with MPI or
>    OpenMPI? It would be awesome to be able to select, for example, 16
>    threads and distribute them 8 and 8 between two different machines
>    (compute nodes).
>
>    - In order to optimize self-coded plugins, I've been reading this
>    from the VEP Ensembl website: "VEP users writing plugins should be aware
>    that while the VEP code attempts to preserve the state of any
>    plugin-specific cached data between separate forks, there may be situations
>    where data is lost. If you find this is the case, you should disable
>    forking in the new() method of your plugin by deleting the "fork" key from
>    the $config hash."
>
>    I had no problems with my plugins after fixing them (thanks to the
>    great support of developers on this list). But I feel they're slowing VEP
>    down, and I'm sure they can be optimized. I would really like a direction,
>    a guide or some tips that I could use to optimize my code.
>
>    - I hope a new way to share plugins between VEP users becomes
>    available soon, so that all developers can help each other and share
>    tips to improve the code, speed, results, etc.
>
>
> Thank you!
>
> Best regards,
> Guillermo.
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>