[ensembl-dev] VEP error: Forked process failed.

Stuart Meacham sm766 at cam.ac.uk
Tue May 14 09:55:30 BST 2013


Hi,

I certainly don't want to hijack this thread but it seemed daft to start 
another. I am also getting forking errors. I don't use any custom 
plugins and am using a validated VCF as input (with about 600,000 
variants). Trying to fork more than 4 threads is unstable even on my 
machine which has 64 cores and half a TB of RAM.

I haven't found anything reproducible yet; if I do, I'll report back
to the list.

Thanks

Stuart

On 14/05/2013 09:42, Will McLaren wrote:
> Hello,
>
> Your aa_grantham_distance plugin is somewhat inefficient - it 
> retrieves the peptide alleles from the HGVS annotation, which itself 
> requires some database fetching and processing to produce. This is why 
> it is slow.
>
> You can get the peptides from the transcript variation object:
>
> my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>
> This will give you single-letter AA codes, but you could either modify 
> your hash or use BioPerl to convert:
>
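> use Bio::PrimarySeq; use Bio::SeqUtils;
> # Set the alphabet explicitly: a lone residue such as "A" could be guessed as DNA.
> my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa, -alphabet => 'protein');
> my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);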
>
> You should also declare your distances hash in the new() sub and store 
> it on $self; this will also marginally speed up your plugin.
>
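
As a rough illustration of the two suggestions above (split the peptide allele string, and build the distances hash once in new() rather than on every call), here is a minimal standalone Perl sketch. It deliberately does not load the Ensembl API: the package name GranthamLookup, the method distance_for() and the input string "L/I" are hypothetical stand-ins, and the three table entries are only a sample that should be checked against the published Grantham matrix. In a real plugin the string would come from $tva->transcript_variation->pep_allele_string().

```perl
use strict;
use warnings;

package GranthamLookup;    # hypothetical name, not part of VEP

sub new {
    my ($class) = @_;
    # Build the lookup table once per object instead of once per call.
    # Sample entries only; verify against the published Grantham matrix.
    my %distances = (
        'I/L' => 5,
        'K/R' => 26,
        'C/W' => 215,
    );
    return bless { distances => \%distances }, $class;
}

# Takes a peptide allele string such as "L/I" (reference/alternate,
# single-letter codes). Returns the stored distance, or nothing when the
# string is not a two-allele substitution, both alleles are the same,
# or the pair is missing from the table.
sub distance_for {
    my ($self, $pep_allele_string) = @_;
    return unless defined $pep_allele_string;

    my @peps = split "/", $pep_allele_string;
    return unless @peps == 2;
    return if $peps[0] eq $peps[1];

    # The matrix is symmetric, so sort the pair to get one canonical key.
    my $key = join "/", sort @peps;
    return $self->{distances}{$key};
}

package main;

my $lookup = GranthamLookup->new;
my $d = $lookup->distance_for("L/I");
print "L/I distance: $d\n";
```

Keying the hash on the single-letter pair avoids the BioPerl conversion altogether, which is the "modify your hash" option mentioned above. The print call sits in the demo code only; the lookup object itself stays silent.
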
> Regarding the forking issues, we are working on improving stability 
> under forking.
>
> Thanks for your patience
>
> Will
>
>
> On 14 May 2013 07:37, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hello,
>
>     I'm not really sure which one of those plugins is causing the fork
>     error. I cannot recreate it now running each one of them separately.
>
>     Here are both:
>
>     https://github.com/guillermomarco/vep_plugins_71
>
>     They also slow down the consequence calculation a lot.
>     aa_grantham_distance.pm is just a hardcoded plugin from one of the
>     biologists at my work. It was a pure copy-paste, adapted to make it
>     work as a VEP plugin. Maybe the problem is that the matrix is
>     defined every time the subroutine is called. I'm not running out of
>     memory or CPU. I'm currently using it with 2 threads and a buffer
>     size of 500 for a 5000-variant VCF file.
>
>     In my honest opinion, one (or even both) of those plugins slows
>     the calculation so much that the fork sometimes just dies, like a
>     timeout under heavy network traffic. So when you use them together
>     with a lot of other plugins like Condel, Consequence, etc., they
>     may be causing the process to hang and die.
>
>     Best regards,
>     Guillermo.
>
>
>     On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>     I also get this error... it is so prevalent and so difficult to
>>     pinpoint what is causing it that I have given up on forking my
>>     annotation process.
>>
>>     I do think it is related to the number of forks. It seems to
>>     crash less often if you use a low number of forks... anything
>>     above 5 will undoubtedly crash the script at least in my experience.
>>
>>     Cheers
>>
>>     Duarte
>>
>>     =========================
>>          Duarte Miguel Paulo Molha
>>     http://about.me/duarte
>>     =========================
>>
>>
>>     On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk>
>>     wrote:
>>
>>         Hi Guillermo,
>>
>>         Test each plugin individually until you find the one that
>>         causes the error. It is highly unlikely that a particular
>>         combination of plugins is causing the crash.
>>
>>         Check that there are no "print" statements (to STDOUT or
>>         STDERR) in your plugin. Forking assumes that the code remains
>>         silent; otherwise it will throw errors like this.
>>
>>         Also, check what, if anything, is cached between runs of your
>>         plugin. If you are caching things (for example to avoid
>>         re-querying a database), you may need to write storable hooks
>>         to ensure the data is getting cached between forks - see
>>         https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm
>>         for an example.
>>
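
For readers unfamiliar with Storable hooks, here is a minimal standalone sketch of the general mechanism, assuming the plugin object gets serialised with Perl's Storable module when data moves between forked processes (check the linked ProteinSeqs.pm for how VEP actually does this). The class name CachedReader, its methods and the temporary file are hypothetical. The object holds a filehandle, which Storable cannot serialise, so STORABLE_freeze stores only the path and STORABLE_thaw reopens the file.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use File::Temp qw(tempfile);

package CachedReader;    # hypothetical name, not part of VEP

sub new {
    my ($class, $path) = @_;
    my $self = bless { path => $path }, $class;
    $self->_open;
    return $self;
}

# Open (or reopen) the file named in $self->{path}.
sub _open {
    my ($self) = @_;
    open my $fh, '<', $self->{path} or die "Cannot open $self->{path}: $!";
    $self->{fh} = $fh;
    return;
}

# Rewind and return the first line of the file without its newline.
sub first_line {
    my ($self) = @_;
    seek $self->{fh}, 0, 0;
    my $line = readline $self->{fh};
    chomp $line;
    return $line;
}

# Called by Storable when the object is serialised: keep only the path,
# because the filehandle cannot be stored.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return $self->{path};
}

# Called by Storable when the object is rebuilt: restore the path and
# reopen the file.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    $self->{path} = $serialized;
    $self->_open;
    return;
}

package main;

# Create a small temporary file to stand in for the plugin's data source.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "cached value\n";
close $tmp_fh;

my $original = CachedReader->new($tmp_path);
my $copy     = thaw(freeze($original));    # round trip through Storable
print $copy->first_line, "\n";
```

Without the two hooks, freeze() would die on the filehandle stored in the object; with them, the rebuilt copy gets a fresh handle of its own.
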
>>         If you still have no luck, send me the code and an input file
>>         that recreates the problem.
>>
>>         Regards
>>
>>         Will
>>
>>
>>         On 13 May 2013 13:18, Guillermo Marco Puche
>>         <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>             Hello,
>>
>>             I've recently started having problems with the VEP script
>>             while using different plugins (most of them my own).
>>
>>             2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>             2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>             2013-05-13 13:59:44 - Loaded plugin: biobase
>>             2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>             2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>             2013-05-13 13:59:44 - Loaded plugin: Condel
>>             2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>             2013-05-13 13:59:44 - Starting...
>>             2013-05-13 13:59:45 - Read 3888 variants into buffer
>>             2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>             [===============================================]  [ 100% ]
>>             2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>             2013-05-13 14:02:38 - Calculating consequences
>>             [===================================>           ]   [ 78% ]
>>             ERROR: Forked process failed
>>
>>
>>             I'm not getting any other error message, so I cannot
>>             debug properly. I thought my plugins were OK, but it
>>             seems they aren't. I think the problem occurs when I use
>>             the "aa_grantham_distance" plugin together with
>>             "flanking_sequence". I've no idea what could be causing this.
>>
>>             I'm running VEP in verbose mode but I can't get any
>>             useful information. How could I debug that?
>>
>>             Best regards,
>>             Guillermo.
>>
>>
>>             _______________________________________________
>>             Dev mailing list Dev at ensembl.org
>>             Posting guidelines and subscribe/unsubscribe info:
>>             http://lists.ensembl.org/mailman/listinfo/dev
>>             Ensembl Blog: http://www.ensembl.info/
>>
>
