[ensembl-dev] VEP error: Forked process failed.
Stuart Meacham
sm766 at cam.ac.uk
Tue May 14 09:55:30 BST 2013
Hi,
I certainly don't want to hijack this thread but it seemed daft to start
another. I am also getting forking errors. I don't use any custom
plugins and am using a validated VCF as input (with about 600,000
variants). Trying to fork more than 4 threads is unstable even on my
machine which has 64 cores and half a TB of RAM.
I haven't found anything reproducible; if I do, I'll report back
to the list.
Thanks
Stuart
On 14/05/2013 09:42, Will McLaren wrote:
> Hello,
>
> Your aa_grantham_distance plugin is somewhat inefficient - it
> retrieves the peptide alleles from the HGVS annotation, which itself
> requires some database fetching and processing to produce. This is why
> it is slow.
>
> You can get the peptides from the transcript variation object:
>
> my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>
> This will give you single-letter AA codes, but you could either modify
> your hash or use BioPerl to convert:
>
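> # Set the alphabet explicitly: seq3() only accepts protein sequences.
> my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa,
>                                   -alphabet => 'protein');
> my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);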
>
> You should also declare your distances hash in the new() sub and store
> it on $self; this will also marginally speed up your plugin.
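
A minimal sketch of the two suggestions above (split the peptide allele string, and build the distances hash once in new()). The package name GranthamSketch and the distance() helper are hypothetical, and the class does not inherit from the VEP plugin base class so that it runs on its own. The two matrix entries are illustrative only and should be checked against the plugin's own matrix.

```perl
use strict;
use warnings;

package GranthamSketch;

# In a real VEP plugin this class would inherit from
# Bio::EnsEMBL::Variation::Utils::BaseVepPlugin and new() would begin with
# $class->SUPER::new(@_); that is omitted here (assumption for the sketch).
sub new {
    my ($class) = @_;
    my $self = bless {}, $class;

    # Built once per plugin instance rather than on every call.
    # Keys are single-letter codes, so no three-letter conversion is needed.
    # Illustrative entries only; the full matrix has one per residue pair.
    $self->{distances} = {
        'L' => { 'I' => 5 },
        'C' => { 'W' => 215 },
    };
    return $self;
}

# Takes a peptide allele string such as "L/I" (the format split in the
# message above) and returns the distance, or undef if it is not known.
sub distance {
    my ($self, $pep_allele_string) = @_;
    return undef unless defined $pep_allele_string;

    my ($ref, $alt) = split m{/}, $pep_allele_string;
    return undef unless defined $ref && defined $alt;   # no "/" present

    # The matrix is symmetric, so try the pair in both orders.
    for my $pair ([$ref, $alt], [$alt, $ref]) {
        my ($from, $to) = @$pair;
        my $row = $self->{distances}{$from} or next;
        return $row->{$to} if exists $row->{$to};
    }
    return undef;
}

package main;

my $plugin     = GranthamSketch->new;
my $leu_ile    = $plugin->distance('L/I');
my $trp_cys    = $plugin->distance('W/C');
my $single_pep = $plugin->distance('L');
```

In an actual plugin, the string passed to distance() would come from $tva->transcript_variation->pep_allele_string() inside run().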
>
> Regarding the forking issues, we are working on improving stability
> under forking.
>
> Thanks for your patience
>
> Will
>
>
> On 14 May 2013 07:37, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello,
>
> I'm not really sure which one of those plugins is causing the fork
> error. I cannot recreate it now running each one of them separately.
>
> Here are both:
>
> https://github.com/guillermomarco/vep_plugins_71
>
> They also slow down the consequence calculation a lot.
> aa_grantham_distance.pm is just a hardcoded plugin from one of the
> biologists at my work. It was a pure copy-paste, adapted to make it
> work as a VEP plugin. Maybe the problem is that the matrix is defined
> every time the subroutine is called. I'm not running out of memory or
> CPU. I'm currently using it with 2 threads and a buffer size of 500
> for a 5000-variant VCF file.
>
> In my honest opinion, one (or even both) of those plugins slows the
> calculation so much that sometimes the fork just dies, like a timeout
> caused by heavy network traffic. So when you use them together with
> a lot of other plugins like Condel, Consequence, etc., they may be
> causing the process to hang and die.
>
> Best regards,
> Guillermo.
>
>
> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>> I also get this error... it is so prevalent and so difficult to
>> pinpoint what is causing it that I have given up on forking my
>> annotation process.
>>
>> I do think it is related to the number of forks. It seems to
>> crash less often if you use a low number of forks... anything
>> above 5 will undoubtedly crash the script at least in my experience.
>>
>> Cheers
>>
>> Duarte
>>
>> =========================
>> Duarte Miguel Paulo Molha
>> http://about.me/duarte
>> =========================
>>
>>
>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk>
>> wrote:
>>
>> Hi Guillermo,
>>
>> Test each plugin individually until you find the one that
>> causes the error. It is highly unlikely that a particular
>> combination of plugins is causing the crash.
>>
>> Check that there are no "print" statements (to STDOUT or STDERR)
>> in your plugin - forking assumes that plugin code remains silent;
>> otherwise it will throw errors like this.
>>
>> Also, check what, if anything, is cached between runs of your
>> plugin. If you are caching things (for example to avoid
>> re-querying a database), you may need to write storable hooks
>> to ensure the data is getting cached between forks - see
>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm
>> for an example.
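
For readers unfamiliar with Storable hooks, here is a rough, self-contained sketch of the mechanism. It is not the ProteinSeqs.pm code: the class name CachingSketch, its fields, and the stand-in fetcher are all hypothetical. It also assumes that objects cross the fork boundary via Storable, which is what "storable hooks" suggests. STORABLE_freeze and STORABLE_thaw themselves are the standard hook names that the Storable module looks for.

```perl
use strict;
use warnings;

package CachingSketch;

sub new {
    my ($class) = @_;
    my $self = bless { cache => {}, fetches => 0 }, $class;
    $self->_init_fetcher;
    return $self;
}

# Stand-in for an expensive database query (hypothetical). Code refs, like
# live database handles, cannot be serialised by Storable by default.
sub _init_fetcher {
    my ($self) = @_;
    $self->{fetcher} = sub { my ($id) = @_; return "value-for-$id" };
    return;
}

# Returns the cached value, fetching it only on a cache miss.
sub lookup {
    my ($self, $id) = @_;
    unless (exists $self->{cache}{$id}) {
        $self->{fetches}++;
        $self->{cache}{$id} = $self->{fetcher}->($id);
    }
    return $self->{cache}{$id};
}

# Called by Storable when the object is frozen: return a string followed
# by references to the parts worth keeping (here only the cache hash).
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ($self->{fetches}, $self->{cache});
}

# Called by Storable on an empty, already-blessed object when thawing:
# restore the kept parts and rebuild whatever could not be serialised.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{fetches} = $serialized;
    $self->{cache}   = $cache;
    $self->_init_fetcher;
    return;
}

package main;
use Storable qw(freeze thaw);

my $plugin = CachingSketch->new;
$plugin->lookup('ENSP_example');          # one real fetch

my $copy  = thaw(freeze($plugin));        # round trip through Storable
my $value = $copy->lookup('ENSP_example');  # served from the restored cache
```

Without the hooks, freeze() would fail on the code ref; with them, the cache survives the round trip and the fetcher is recreated on the other side.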
>>
>> If you still have no luck, send me the code and an input file
>> that recreates the problem.
>>
>> Regards
>>
>> Will
>>
>>
>> On 13 May 2013 13:18, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello,
>>
>> I've recently started having problems with the VEP script
>> while using different plugins (most of them my own).
>>
>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>> 2013-05-13 13:59:44 - Starting...
>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>> [===============================================] [ 100% ]
>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>> 2013-05-13 14:02:38 - Calculating consequences
>> [===================================> ] [ 78% ]
>> ERROR: Forked process failed
>>
>>
>> I'm not getting any other error message, so I cannot
>> debug properly. I thought my plugins were OK but it
>> seems they aren't. I think the problem occurs when I use
>> the "aa_grantham_distance" plugin together with
>> "flanking_sequence". I've no idea what could be causing this.
>>
>> I'm running VEP in verbose mode but I can't get any
>> useful information. How could I debug that?
>>
>> Best regards,
>> Guillermo.
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>