[ensembl-dev] VEP error: Forked process failed.

Duarte Molha duartemolha at gmail.com
Tue May 14 10:42:34 BST 2013


Thanks

Running an annotation using 16 forks... let's see how it handles it :)
I'll report back any issues.

Thanks for the update

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Stuart, Guillermo, Duarte,
>
> I'm currently working on some code as I stated above to improve stability
> and performance under forking.
>
> I've committed some code to the HEAD of our CVS tree which should help the
> problems you are encountering. You'd all be welcome to test this out, with
> the obvious proviso that this is development code and may contain bugs!
>
> To use this, you should download the copy of VEP.pm from:
>
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl
>
> and replace the VEP.pm under
> ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
> Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
>
> This code will appear in production in the next proper release of Ensembl.
>
> Regards
>
> Will
>
>
> On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>
>>  Hi,
>>
>> I certainly don't want to hijack this thread but it seemed daft to start
>> another. I am also getting forking errors. I don't use any custom plugins
>> and am using a validated VCF as input (with about 600,000 variants). Trying
>> to fork more than 4 threads is unstable even on my machine which has 64
>> cores and half a TB of RAM.
>>
>> I haven't found anything reproducible, however if I do I'll report back
>> to the list.
>>
>> Thanks
>>
>> Stuart
>>
>>
>> On 14/05/2013 09:42, Will McLaren wrote:
>>
>> Hello,
>>
>>  Your aa_grantham_distance plugin is somewhat inefficient - it retrieves
>> the peptide alleles from the HGVS annotation, which itself requires some
>> database fetching and processing to produce. This is why it is slow.
>>
>>  You can get the peptides from the transcript variation object:
>>
>>  my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>>
>>  This will give you single-letter AA codes, but you could either modify
>> your hash or use BioPerl to convert:
>>
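>>  use Bio::PrimarySeq; use Bio::SeqUtils;
>>  # seq3() expects a protein sequence, so state the alphabet explicitly
>>  my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa, -alphabet => 'protein');
>>  my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);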
>>
>>  You should also declare your distances hash in the new() sub and store
>> it on $self; this will also marginally speed up your plugin.
>>
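
A minimal sketch of the two suggestions above, for anyone following along: the distances hash is built once in new() and kept on $self, and the peptides are split out of a string in the "S/R" form that pep_allele_string() returns. This is an illustration under assumptions, not the actual aa_grantham_distance code. The package name GranthamSketch and the method distance_for are made up, the class does not inherit from the VEP plugin base class so that it can run standalone, and only three Grantham pairs are filled in as examples (check them against the published matrix before relying on them).

```perl
use strict;
use warnings;

{
    package GranthamSketch;    # hypothetical name, not the plugin from the thread

    sub new {
        my ($class) = @_;
        my $self = bless {}, $class;

        # Built once per plugin instance instead of on every call.
        # Only three example pairs; a real plugin needs the full matrix.
        $self->{distances} = {
            'L/I' => 5,
            'S/R' => 110,
            'C/W' => 215,
        };
        return $self;
    }

    # Takes a peptide allele string such as "S/R" (single-letter codes).
    # Returns the distance, or undef if there is no amino acid change
    # or the pair is not in the hash.
    sub distance_for {
        my ($self, $pep_allele_string) = @_;
        return unless defined $pep_allele_string;

        my @peps = split "/", $pep_allele_string;
        return unless @peps == 2;    # e.g. synonymous: only one peptide

        my ($ref, $alt) = @peps;
        my $d = $self->{distances};

        # The matrix is symmetric, so look the pair up in both orders.
        return $d->{"$ref/$alt"} // $d->{"$alt/$ref"};
    }
}

my $plugin   = GranthamSketch->new;
my $distance = $plugin->distance_for('R/S');
```

In a real plugin the string passed to distance_for would come from $tva->transcript_variation->pep_allele_string(), as in Will's line above.
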
>>  Regarding the forking issues, we are working on improving stability
>> under forking.
>>
>>  Thanks for your patience
>>
>>  Will
>>
>>
>>  On 14 May 2013 07:37, Guillermo Marco Puche <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hello,
>>>
>>> I'm not really sure which one of those plugins is causing the fork
>>> error. I cannot recreate it now running each one of them separately.
>>>
>>> Here are both:
>>>
>>> https://github.com/guillermomarco/vep_plugins_71
>>>
>>> They also slow down the consequence calculation a lot.
>>> aa_grantham_distance.pm is just a hardcoded plugin from one of the
>>> biologists at my work. It was a plain copy-paste, adapted to make it
>>> work as a VEP plugin. Maybe the problem is that the matrix is redefined
>>> every time the subroutine is called. I'm not running out of memory or
>>> CPU. I'm currently using it with 2 threads and a buffer size of 500 for
>>> a 5000-variant VCF file.
>>>
>>> In my honest opinion, one (or even both) of those plugins slows the
>>> calculation down so much that sometimes the fork just dies, like a
>>> timeout caused by heavy network traffic. So when you use them together
>>> with a lot of other plugins like Condel, Consequence, etc., they may be
>>> causing the process to hang and die.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>>
>>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>>
>>> I also get this error... it is so prevalent and so difficult to pinpoint
>>> what is causing it that I have given up on forking my annotation process.
>>>
>>>  I do think it is related to the number of forks. It seems to crash
>>> less often if you use a low number of forks... anything above 5
>>> will undoubtedly crash the script at least in my experience.
>>>
>>>  Cheers
>>>
>>> Duarte
>>>
>>> =========================
>>>      Duarte Miguel Paulo Molha
>>>           http://about.me/duarte
>>> =========================
>>>
>>>
>>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>>> Hi Guillermo,
>>>>
>>>>  Test each plugin individually until you find the one that causes the
>>>> error. It is highly unlikely that a particular combination of plugins is
>>>> causing the crash.
>>>>
>>>>  Check that there are no "print" (to STDOUT or STDERR) statements in
>>>> your plugin - forking assumes that code remains silent; otherwise it
>>>> will throw errors like this.
>>>>
>>>>  Also, check what, if anything, is cached between runs of your plugin.
>>>> If you are caching things (for example to avoid re-querying a database),
>>>> you may need to write storable hooks to ensure the data is getting cached
>>>> between forks - see
>>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm for an example.
>>>>
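
For reference, a rough sketch of what "storable hooks" can look like. This is not copied from ProteinSeqs.pm. It assumes plugin objects are passed between forks with Perl's Storable module, which is what the wording above suggests. The package name CachingSketch, the cache key and the use of STDIN as a stand-in for a database or file handle are all hypothetical. The hooks themselves (STORABLE_freeze and STORABLE_thaw) are the standard ones from Storable: the cache is handed over for serialisation, and the handle, which Storable cannot store, is left out.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

{
    package CachingSketch;    # hypothetical name, for illustration only

    sub new {
        my ($class) = @_;
        # \*STDIN stands in for a DB connection or an open output file.
        return bless { cache => {}, handle => \*STDIN }, $class;
    }

    # Called by Storable when the object is serialised. The first return
    # value is a free-form string; the references after it are serialised
    # by Storable. The handle is deliberately not included.
    sub STORABLE_freeze {
        my ($self, $cloning) = @_;
        return ('CachingSketch-v1', $self->{cache});
    }

    # Called by Storable on a fresh, empty object of the same class.
    sub STORABLE_thaw {
        my ($self, $cloning, $serialized, $cache) = @_;
        $self->{cache}  = $cache;
        $self->{handle} = undef;    # reopen where it is actually needed
        return;
    }
}

my $plugin = CachingSketch->new;
$plugin->{cache}{ENST00000000001} = 'cached value';    # hypothetical key

# Round trip through Storable, as would happen when crossing a fork boundary.
my $copy = thaw(freeze($plugin));
```

Without the two hooks, freeze() would refuse to serialise the object because of the glob reference in handle; with them, the cached data survives the round trip and the handle is simply dropped.
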
>>>>  If you still have no luck, send me the code and an input file that
>>>> recreates the problem.
>>>>
>>>>  Regards
>>>>
>>>>  Will
>>>>
>>>>
>>>>  On 13 May 2013 13:18, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>>   Hello,
>>>>>
>>>>> I've recently started having problems with the VEP script while using
>>>>> different plugins (most of them my own).
>>>>>
>>>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>>>> 2013-05-13 13:59:44 - Starting...
>>>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>>>> [===============================================]  [ 100% ]
>>>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>>>> 2013-05-13 14:02:38 - Calculating consequences
>>>>> [===================================>           ]   [ 78% ]
>>>>> ERROR: Forked process failed
>>>>>
>>>>>
>>>>>
>>>>> I'm not getting any other error message, so I cannot debug properly. I
>>>>> thought my plugins were OK but it seems they aren't. I think the problem
>>>>> occurs when I use the "aa_grantham_distance" plugin together with
>>>>> "flanking_sequence". I've no idea what could be causing this.
>>>>>
>>>>> I'm running VEP in verbose mode but I can't get any useful
>>>>> information. How could I debug that?
>>>>>
>>>>> Best regards,
>>>>> Guillermo.
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>