[ensembl-dev] VEP error: Forked process failed.

Will McLaren wm2 at ebi.ac.uk
Tue May 14 16:26:46 BST 2013


Hi Duarte,

You are correct, it was the --individual flag causing the problem.

I have committed a fix to the 71 branch (does not include forking fixes)
and to head (does include forking fixes, download from
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.94&root=ensembl
).

Regards

Will


On 14 May 2013 15:17, Duarte Molha <duartemolha at gmail.com> wrote:

> However, I believe it must be related to the --individual
> [all|ind list] flag.
> It does not seem to be related to the --fork argument, since it also
> throws that error when running single-threaded.
>
> In single-threaded mode it simply displays the error and keeps going,
> whereas with multiple forks the process dies entirely.
>
> My VCF contains genotypes for 3 samples and I have included the
> --individual all option.
>
> Best regards
> Duarte
>
>
> =========================
>      Duarte Miguel Paulo Molha
>          http://about.me/duarte
> =========================
>
>
> On Tue, May 14, 2013 at 2:00 PM, Duarte Molha <duartemolha at gmail.com> wrote:
>
>> Hi Will
>>
>> Unfortunately the script does not report which input line it fails on, and I
>> cannot provide you with the entire file since it is private data.
>> Is there a way of reporting the line?
>>
>> Thanks
>>
>> Duarte
>>
>>
>> On Tue, May 14, 2013 at 1:05 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> Hi Duarte,
>>>
>>> Do you have some input that causes this error?
>>>
>>> Thanks
>>>
>>> Will
>>>
>>>
>>> On 14 May 2013 12:57, Duarte Molha <duartemolha at gmail.com> wrote:
>>>
>>>> Another bug with the updated version: now, with the --check_alleles
>>>> and --check_existing options, the script dies at line 4759:
>>>>
>>>> Use of uninitialized value in string ne at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4759.
>>>>
>>>>
>>>> Best regards
>>>>
>>>> Duarte
>>>>
>>>>
>>>> On Tue, May 14, 2013 at 11:22 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>
>>>>> Thanks - try
>>>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.93&root=ensembl
>>>>>
>>>>> Will
>>>>>
>>>>>
>>>>> On 14 May 2013 11:09, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>
>>>>>> It seems the problems are still there.
>>>>>>
>>>>>> Here is my output:
>>>>>>
>>>>>>  perl variant_effect_predictor.pl --config vep_human.ini -i INPUT.vcf --fork 16
>>>>>>
>>>>>> 2013-05-14 10:33:38 - Read configuration from vep_human.ini
>>>>>> #----------------------------------#
>>>>>> # ENSEMBL VARIANT EFFECT PREDICTOR #
>>>>>> #----------------------------------#
>>>>>>
>>>>>> version 71
>>>>>>
>>>>>> By Will McLaren (wm2 at ebi.ac.uk)
>>>>>>
>>>>>> Configuration options:
>>>>>>
>>>>>> ###
>>>>>> allow_non_variant    1
>>>>>> buffer_size          500000
>>>>>> cache                1
>>>>>> canonical            1
>>>>>> ccds                 1
>>>>>> check_alleles        1
>>>>>> check_existing       1
>>>>>> config               vep_human.ini
>>>>>> core_type            core
>>>>>> custom               /ReferenceData/vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
>>>>>>                      /ReferenceData/vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
>>>>>> dir                  /ReferenceData/vep_cache/
>>>>>> domains              1
>>>>>> force_overwrite      1
>>>>>> fork                 16
>>>>>> gmaf                 1
>>>>>> hgnc                 1
>>>>>> host                 ensembldb.ensembl.org
>>>>>> individual           all
>>>>>> input_file           INPUT.vcf
>>>>>> numbers              1
>>>>>> plugin               Blosum62
>>>>>>                      Condel,/ReferenceData/vep_cache/Plugins/config/Condel/config,b
>>>>>>                      Carol
>>>>>> polyphen             b
>>>>>> port                 5306
>>>>>> protein              1
>>>>>> regulatory           1
>>>>>> sift                 b
>>>>>> species              homo_sapiens
>>>>>> stats                HASH(0x35a8000)
>>>>>> terms                SO
>>>>>> toplevel_dir         /ReferenceData/vep_cache/
>>>>>> verbose              1
>>>>>> xref_refseq          1
>>>>>>
>>>>>> --------------------
>>>>>>
>>>>>> Will only load v71 databases
>>>>>> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_71_37'
>>>>>> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_71_37'
>>>>>> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_71_37'
>>>>>> Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_71_37'
>>>>>> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_71_37'
>>>>>> homo_sapiens_variation_71_37 loaded
>>>>>> homo_sapiens_funcgen_71_37 loaded
>>>>>> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_71
>>>>>> ensembl_ancestral_71 loaded
>>>>>> ensembl_ontology_71 loaded
>>>>>> 2013-05-14 10:33:39 - Connected to core version 71 database and variation version 71 database
>>>>>> 2013-05-14 10:33:39 - Read existing cache info
>>>>>> 2013-05-14 10:33:39 - Loaded plugin: Blosum62
>>>>>> 2013-05-14 10:33:39 - Loaded plugin: Condel
>>>>>> 2013-05-14 10:33:39 - Loaded plugin: Carol
>>>>>> 2013-05-14 10:33:40 - Starting...
>>>>>> 2013-05-14 10:33:40 - Detected format of input file as vcf
>>>>>> 2013-05-14 10:33:46 - Read 195789 variants into buffer
>>>>>> 2013-05-14 10:33:46 - Skipping 67552 non-variant loci
>>>>>> 2013-05-14 10:33:46 - Reading transcript data from cache and/or database
>>>>>> [======================================================================================================]  [ 100% ]
>>>>>> 2013-05-14 10:40:19 - Retrieved 189344 transcripts (0 mem, 202901 cached, 0 DB, 13557 duplicates)
>>>>>> 2013-05-14 10:40:19 - Reading regulatory data from cache and/or database
>>>>>> [======================================================================================================]  [ 100% ]
>>>>>> 2013-05-14 10:50:09 - Retrieved 872092 regulatory features (0 mem, 872351 cached, 0 DB, 259 duplicates)
>>>>>> 2013-05-14 10:50:12 - Calculating consequences
>>>>>> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1022.
>>>>>> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1030.
>>>>>> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1037.
>>>>>> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1089.
>>>>>> Use of uninitialized value $_ in concatenation (.) or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1097.
>>>>>>
>>>>>> ERROR: Forked process failed
>>>>>>
>>>>>>
>>>>>> On Tue, May 14, 2013 at 10:42 AM, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Running an annotation using 16 forks... let's see how it handles it :)
>>>>>>> I'll report back any issues.
>>>>>>>
>>>>>>> Thanks for the update
>>>>>>>
>>>>>>> Duarte
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>>
>>>>>>>> Stuart, Guillermo, Duarte,
>>>>>>>>
>>>>>>>> I'm currently working on some code as I stated above to improve
>>>>>>>> stability and performance under forking.
>>>>>>>>
>>>>>>>> I've committed some code to the HEAD of our CVS tree which should
>>>>>>>> help the problems you are encountering. You'd all be welcome to test this
>>>>>>>> out, with the obvious proviso that this is development code and may contain
>>>>>>>> bugs!
>>>>>>>>
>>>>>>>> To use this, you should download the copy of VEP.pm from:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl
>>>>>>>>
>>>>>>>> and replace the VEP.pm under
>>>>>>>> ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
>>>>>>>> Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
>>>>>>>>
>>>>>>>> This code will appear in production in the next proper release of
>>>>>>>> Ensembl.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Will
>>>>>>>>
>>>>>>>>
>>>>>>>> On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>>>>>>>>
>>>>>>>>>  Hi,
>>>>>>>>>
>>>>>>>>> I certainly don't want to hijack this thread but it seemed daft to
>>>>>>>>> start another. I am also getting forking errors. I don't use any custom
>>>>>>>>> plugins and am using a validated VCF as input (with about 600,000
>>>>>>>>> variants). Trying to fork more than 4 threads is unstable even on my
>>>>>>>>> machine which has 64 cores and half a TB of RAM.
>>>>>>>>>
>>>>>>>>> I haven't found anything reproducible, however if I do I'll report
>>>>>>>>> back to the list.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Stuart
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 14/05/2013 09:42, Will McLaren wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>>  Your aa_grantham_distance plugin is somewhat inefficient - it
>>>>>>>>> retrieves the peptide alleles from the HGVS annotation, which itself
>>>>>>>>> requires some database fetching and processing to produce. This is why it
>>>>>>>>> is slow.
>>>>>>>>>
>>>>>>>>>  You can get the peptides from the transcript variation object:
>>>>>>>>>
>>>>>>>>>  my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>>>>>>>>>
>>>>>>>>>  This will give you single-letter AA codes, but you could either
>>>>>>>>> modify your hash or use BioPerl to convert:
>>>>>>>>>
>>>>>>>>>  use Bio::PrimarySeq;
>>>>>>>>>  use Bio::SeqUtils;
>>>>>>>>>  my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa);
>>>>>>>>>  my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);
>>>>>>>>>
>>>>>>>>>  You should also declare your distances hash in the new() sub and
>>>>>>>>> store it on $self; this will also marginally speed up your plugin.
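
The two suggestions above might look roughly like the sketch below. It is
plain Perl so that it runs without the Ensembl API. The package name
GranthamSketch, the distance_for method and the three table entries are
illustrative assumptions, not taken from the real aa_grantham_distance
plugin. A real plugin would inherit from the VEP plugin base class and get
the string from $tva->transcript_variation->pep_allele_string(). The table
is keyed by single-letter codes (the "modify your hash" option), so no
BioPerl conversion is needed.

```perl
package GranthamSketch;    # hypothetical name, for illustration only

use strict;
use warnings;

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;

    # Build the lookup table once per object rather than on every call.
    # Illustrative subset only; check values against the published matrix.
    $self->{distances} = {
        'A/V' => 64,
        'I/L' => 5,
        'C/W' => 215,
    };

    return $self;
}

# Takes a string such as "A/V" (the format pep_allele_string() returns)
# and returns the stored distance, or undef if it cannot be looked up.
sub distance_for {
    my ($self, $pep_allele_string) = @_;

    return unless defined $pep_allele_string;

    my @peps = split "/", $pep_allele_string;

    # A synonymous change gives a single peptide, so there is no pair to score.
    return unless @peps == 2;

    # Sort so that "V/A" and "A/V" hit the same key.
    my $key = join "/", sort @peps;

    return $self->{distances}{$key};
}

package main;

my $plugin = GranthamSketch->new();
for my $string ("A/V", "V/A", "L") {
    my $distance = $plugin->distance_for($string);
    printf "%s => %s\n", $string, defined $distance ? $distance : "n/a";
}
```

Keeping the table on $self means it is built once in new() and only read
afterwards, which is the marginal speed-up mentioned above.
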
>>>>>>>>>
>>>>>>>>>  Regarding the forking issues, we are working on improving
>>>>>>>>> stability under forking.
>>>>>>>>>
>>>>>>>>>  Thanks for your patience
>>>>>>>>>
>>>>>>>>>  Will
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On 14 May 2013 07:37, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Hello,
>>>>>>>>>>
>>>>>>>>>> I'm not really sure which one of those plugins is causing the
>>>>>>>>>> fork error. I cannot recreate it now running each one of them separately.
>>>>>>>>>>
>>>>>>>>>> Here are both:
>>>>>>>>>>
>>>>>>>>>> https://github.com/guillermomarco/vep_plugins_71
>>>>>>>>>>
>>>>>>>>>> They also slow the consequence calculation down a lot.
>>>>>>>>>> aa_grantham_distance.pm is just a hardcoded plugin from one of
>>>>>>>>>> the biologists at my work. It was a straight copy-paste, adapted to
>>>>>>>>>> make it work as a VEP plugin. Maybe the problem is that the matrix is
>>>>>>>>>> defined every time the subroutine is called. I'm not running out of
>>>>>>>>>> memory or CPU. I'm currently using it with 2 threads and a buffer size
>>>>>>>>>> of 500 for a 5000-variant VCF file.
>>>>>>>>>>
>>>>>>>>>> In my honest opinion, one (or even both) of those plugins slows the
>>>>>>>>>> calculation so much that sometimes the fork just dies, like a timeout
>>>>>>>>>> due to heavy network traffic. So when you use them together with a lot
>>>>>>>>>> of other plugins like Condel, Consequence, etc., they may be causing
>>>>>>>>>> the process to hang and die.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Guillermo.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>>>>>>>>>
>>>>>>>>>> I also get this error... it is so prevalent and so difficult to
>>>>>>>>>> pinpoint what is causing it that I have given up on forking my annotation
>>>>>>>>>> process.
>>>>>>>>>>
>>>>>>>>>>  I do think it is related to the number of forks. It seems to
>>>>>>>>>> crash less often if you use a low number of forks... anything above 5
>>>>>>>>>> will undoubtedly crash the script at least in my experience.
>>>>>>>>>>
>>>>>>>>>>  Cheers
>>>>>>>>>>
>>>>>>>>>> Duarte
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Guillermo,
>>>>>>>>>>>
>>>>>>>>>>>  Test each plugin individually until you find the one that
>>>>>>>>>>> causes the error. It is highly unlikely that a particular combination of
>>>>>>>>>>> plugins is causing the crash.
>>>>>>>>>>>
>>>>>>>>>>>  Check that there are no "print" (to STDOUT or STDERR)
>>>>>>>>>>> statements in your plugin - forking assumes that code remains silent
>>>>>>>>>>> otherwise it will throw errors like this.
>>>>>>>>>>>
>>>>>>>>>>>  Also, check what, if anything, is cached between runs of your
>>>>>>>>>>> plugin. If you are caching things (for example to avoid re-querying a
>>>>>>>>>>> database), you may need to write storable hooks to ensure the data is
>>>>>>>>>>> getting cached between forks - see
>>>>>>>>>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm for an example.
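
For readers unfamiliar with Storable hooks, here is a minimal generic
sketch using only the core Storable module. How VEP passes plugin objects
between forks is not shown in this thread, so treat this as an illustration
of the hook mechanism rather than of VEP itself; the linked ProteinSeqs.pm
is the example to follow. The package name CachingSketch, its fields and
the in-memory filehandle are assumptions made up for the example.

```perl
package CachingSketch;    # hypothetical name, for illustration only

use strict;
use warnings;
use Storable ();

sub new {
    my ($class, $source_text) = @_;
    my $self = bless { source => $source_text, cache => {} }, $class;
    $self->_open;
    return $self;
}

sub _open {
    my ($self) = @_;
    # An in-memory filehandle stands in for a real data file.
    # Storable cannot serialize filehandles by default.
    open($self->{fh}, '<', \$self->{source}) or die "Cannot open: $!";
    return;
}

# Called by Storable instead of its default serialization.
# Returns a plain string plus references for Storable to serialize;
# the filehandle is deliberately left out.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ($self->{source}, $self->{cache});
}

# Called on a fresh, empty object of the same class when thawing.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{source} = $serialized;
    $self->{cache}  = $cache;
    $self->_open;    # re-create what could not be serialized
    return;
}

package main;

my $original = CachingSketch->new("line one\n");
$original->{cache}{rs123} = 'already looked up';

# Round trip through Storable, as happens when objects cross processes.
my $copy = Storable::thaw(Storable::freeze($original));

print "cache survived: $copy->{cache}{rs123}\n";
print "handle reopened, first line: ", scalar readline($copy->{fh});
```

The point of the hooks is that the cached data travels with the object
while anything that cannot be serialized is rebuilt on the other side.
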
>>>>>>>>>>>
>>>>>>>>>>>  If you still have no luck, send me the code and an input file
>>>>>>>>>>> that recreates the problem.
>>>>>>>>>>>
>>>>>>>>>>>  Regards
>>>>>>>>>>>
>>>>>>>>>>>  Will
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On 13 May 2013 13:18, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>   Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I've recently started having problems with the VEP script while
>>>>>>>>>>>> using different plugins (most of them my own).
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>>>>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>>>>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>>>>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>>>>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>>>>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>>>>>>>>>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>>>>>>>>>>> 2013-05-13 13:59:44 - Starting...
>>>>>>>>>>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>>>>>>>>>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>>>>>>>>>>> [===============================================]  [ 100% ]
>>>>>>>>>>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>>>>>>>>>>> 2013-05-13 14:02:38 - Calculating consequences
>>>>>>>>>>>> [===================================>           ]   [ 78% ]
>>>>>>>>>>>> ERROR: Forked process failed
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not getting any other error message, so I cannot debug
>>>>>>>>>>>> properly. I thought my plugins were OK but it seems they are not. I think
>>>>>>>>>>>> the problem occurs when I use the "aa_grantham_distance" plugin together with
>>>>>>>>>>>> "flanking_sequence". I have no idea what could be causing this.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm running VEP in verbose mode but I can't get any useful
>>>>>>>>>>>> information. How could I debug that?
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>>>>>>
>>>>>>>>>>>>