[ensembl-dev] VEP error: Forked process failed.

Will McLaren wm2 at ebi.ac.uk
Tue May 14 10:16:13 BST 2013


Stuart, Guillermo, Duarte,

As I mentioned earlier in this thread, I'm currently working on some code
to improve stability and performance under forking.

I've committed some code to the HEAD of our CVS tree which should help the
problems you are encountering. You'd all be welcome to test this out, with
the obvious proviso that this is development code and may contain bugs!

To use this, you should download the copy of VEP.pm from:

http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl

and replace the VEP.pm under
ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
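
If it helps, here is a minimal sketch of doing the swap while keeping a backup of the file being replaced, so it is easy to roll back if the development code misbehaves. The replace_module helper and the ".bak" suffix are illustrative assumptions, not part of VEP or INSTALL.pl; adjust the paths to your own checkout.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: back up the installed module (if present), then
# copy the downloaded file over it. Both paths come from the caller.
sub replace_module {
    my ($downloaded, $installed) = @_;
    my $backup = "$installed.bak";

    # Keep the old version so the change can be undone by copying it back.
    if (-e $installed) {
        copy($installed, $backup)
            or die "Backup of $installed failed: $!";
    }

    copy($downloaded, $installed)
        or die "Could not install $downloaded: $!";

    return $backup;
}

# Example call (paths are assumptions about your layout):
# replace_module('VEP.pm',
#     'ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm');
```

To go back to the released version, copy the ".bak" file over VEP.pm again.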

This code will appear in production in the next proper release of Ensembl.

Regards

Will


On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:

>  Hi,
>
> I certainly don't want to hijack this thread but it seemed daft to start
> another. I am also getting forking errors. I don't use any custom plugins
> and am using a validated VCF as input (with about 600,000 variants). Trying
> to fork more than 4 threads is unstable even on my machine which has 64
> cores and half a TB of RAM.
>
> I haven't found anything reproducible, however if I do I'll report back to
> the list.
>
> Thanks
>
> Stuart
>
>
> On 14/05/2013 09:42, Will McLaren wrote:
>
> Hello,
>
>  Your aa_grantham_distance plugin is somewhat inefficient - it retrieves
> the peptide alleles from the HGVS annotation, which itself requires some
> database fetching and processing to produce. This is why it is slow.
>
>  You can get the peptides from the transcript variation object:
>
>  my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>
>  This will give you single-letter AA codes, but you could either modify
> your hash or use BioPerl to convert:
>
>  $seqobj = Bio::PrimarySeq->new ( -seq => $single_letter_aa);
> $three_letter_aa = Bio::SeqUtils->seq3($seqobj);
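
For the "modify your hash" route, a minimal sketch of a plain lookup table that avoids the BioPerl dependency. The to_three_letter helper is hypothetical; it assumes the input looks like the "L/I" string described for pep_allele_string() and leaves anything it does not recognise (stop codons, multi-residue alleles) unchanged.

```perl
use strict;
use warnings;

# Standard one-letter to three-letter codes for the 20 amino acids.
my %THREE_LETTER = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
);

# Hypothetical helper: split a "REF/ALT" peptide string and convert each
# side. Entries with no match in the table are returned as they came in.
sub to_three_letter {
    my ($pep_allele_string) = @_;
    return map { $THREE_LETTER{$_} // $_ } split m{/}, $pep_allele_string;
}

my ($ref_aa, $alt_aa) = to_three_letter('L/I');
print "$ref_aa -> $alt_aa\n";
```
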
>
>  You should also declare your distances hash in the new() sub and store
> it on $self; this will also marginally speed up your plugin.
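
A minimal sketch of that pattern: the table is built once in new() and kept on $self, so each call only does a hash lookup. GranthamSketch and distance_for are hypothetical names. A real plugin would inherit from the VEP plugin base class, which is left out here so the sketch runs without the Ensembl API. Only two pairs are listed (Leu/Ile and Cys/Trp, the smallest and largest Grantham distances); a real plugin needs the full published matrix.

```perl
use strict;
use warnings;

package GranthamSketch;

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;

    # Built once per object, not once per variant.
    my %pairs = ( 'L/I' => 5, 'C/W' => 215 );

    # The distance is symmetric, so store both orientations.
    my %distance;
    for my $pair (keys %pairs) {
        my ($first, $second) = split m{/}, $pair;
        $distance{"$first/$second"} = $pairs{$pair};
        $distance{"$second/$first"} = $pairs{$pair};
    }

    $self->{distance} = \%distance;
    return $self;
}

# Takes a peptide allele string such as "L/I" and returns the distance,
# or undef when there is no substitution or no entry in the table.
sub distance_for {
    my ($self, $pep_allele_string) = @_;
    return unless defined $pep_allele_string;

    my @peps = split m{/}, $pep_allele_string;
    return unless @peps == 2;

    return $self->{distance}{ join '/', @peps };
}

package main;

my $plugin = GranthamSketch->new;
for my $alleles ('L/I', 'W/C', 'A') {
    my $d = $plugin->distance_for($alleles);
    printf "%s => %s\n", $alleles, defined $d ? $d : 'n/a';
}
```
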
>
>  Regarding the forking issues, we are working on improving stability
> under forking.
>
>  Thanks for your patience
>
>  Will
>
>
>  On 14 May 2013 07:37, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hello,
>>
>> I'm not really sure which one of those plugins is causing the fork error.
>> I cannot recreate it now running each one of them separately.
>>
>> Here are both:
>>
>> https://github.com/guillermomarco/vep_plugins_71
>>
>> They also slow down the consequence calculation a lot.
>> aa_grantham_distance.pm is just a hardcoded plugin from one of the
>> biologists at my workplace. It was a straight copy-paste, adapted to make
>> it work as a VEP plugin. Maybe the problem is that the matrix is defined
>> every time the subroutine is called. I'm not running out of memory or
>> CPU. I'm currently using it with 2 threads and a buffer size of 500 for a
>> 5000-variant VCF file.
>>
>> In my honest opinion, one (or even both) of those plugins is slowing the
>> calculation so much that sometimes the fork just dies, like a timeout
>> caused by heavy network traffic. So when you use them together with a lot
>> of other plugins like Condel, Consequence, etc.,
>> they may be causing the process to hang and die.
>>
>> Best regards,
>> Guillermo.
>>
>>
>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>
>> I also get this error... it is so prevalent and so difficult to pinpoint
>> what is causing it that I have given up on forking my annotation process.
>>
>>  I do think it is related to the number of forks. It seems to crash less
>> often if you use a low number of forks... anything above 5
>> will undoubtedly crash the script at least in my experience.
>>
>>  Cheers
>>
>> Duarte
>>
>> =========================
>>      Duarte Miguel Paulo Molha
>>           http://about.me/duarte
>> =========================
>>
>>
>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> Hi Guillermo,
>>>
>>>  Test each plugin individually until you find the one that causes the
>>> error. It is highly unlikely that a particular combination of plugins is
>>> causing the crash.
>>>
>>>  Check that there are no "print" statements (to STDOUT or STDERR) in
>>> your plugin. Forking assumes that plugin code remains silent; otherwise
>>> it will throw errors like this.
>>>
>>>  Also, check what, if anything, is cached between runs of your plugin.
>>> If you are caching things (for example to avoid re-querying a database),
>>> you may need to write storable hooks to ensure the data is getting cached
>>> between forks - see
>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm
>>> for an example.
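
A minimal sketch of what such hooks can look like, assuming the plugin object is serialised with Perl's core Storable module when it moves between processes (the thread does not spell out the mechanism). STORABLE_freeze and STORABLE_thaw are the hook names Storable looks for; CachingSketch, its cache key and the in-memory filehandle are hypothetical. The hooks carry the cache across and reopen the filehandle afterwards, since Storable will not serialise a filehandle.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package CachingSketch;

sub new {
    my ($class) = @_;
    my $self = bless { cache => {} }, $class;
    $self->_open_handle;
    return $self;
}

# Stand-in for a resource that cannot be serialised, e.g. an output file.
sub _open_handle {
    my ($self) = @_;
    open my $fh, '>', \my $buffer or die "Cannot open handle: $!";
    $self->{fh} = $fh;
    return;
}

# Called by Storable when the object is serialised. Returns a marker
# string plus the cache hashref; the filehandle is deliberately left out.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('cache-only', $self->{cache});
}

# Called by Storable on an empty object of this class when deserialising.
# Restores the cache and reopens the filehandle.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{cache} = $cache;
    $self->_open_handle;
    return;
}

package main;

my $plugin = CachingSketch->new;
$plugin->{cache}{some_key} = 'cached value';

# Round trip through Storable, as a stand-in for crossing a fork boundary.
my $copy = thaw(freeze($plugin));
print "cache survived: $copy->{cache}{some_key}\n";
```
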
>>>
>>>  If you still have no luck, send me the code and an input file that
>>> recreates the problem.
>>>
>>>  Regards
>>>
>>>  Will
>>>
>>>
>>>  On 13 May 2013 13:18, Guillermo Marco Puche <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>   Hello,
>>>>
>>>> I've recently started having problems with the VEP script while using
>>>> different plugins (most of them my own).
>>>>
>>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>>> 2013-05-13 13:59:44 - Starting...
>>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>>> [===============================================]  [ 100% ]
>>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>>> 2013-05-13 14:02:38 - Calculating consequences
>>>> [===================================>           ]   [ 78% ]
>>>> ERROR: Forked process failed
>>>>
>>>>
>>>>
>>>> I'm not getting any other error message, so I cannot debug properly. I
>>>> thought my plugins were OK but it seems they aren't. I think the problem
>>>> occurs when I use the "aa_grantham_distance" plugin together with
>>>> "flanking_sequence". I've no idea what could be causing this.
>>>>
>>>> I'm running VEP in verbose mode but I can't get any useful
>>>> information. How could I debug that?
>>>>
>>>> Best regards,
>>>> Guillermo.
>>>>
>>>>
>>>>  _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>