[ensembl-dev] VEP error: Forked process failed.

Duarte Molha duartemolha at gmail.com
Tue May 14 11:09:38 BST 2013


It seems the problems are still there:

Here is my output:

perl variant_effect_predictor.pl --config vep_human.ini -i INPUT.vcf --fork 16

2013-05-14 10:33:38 - Read configuration from vep_human.ini
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

version 71

By Will McLaren (wm2 at ebi.ac.uk)

Configuration options:

###
allow_non_variant    1
buffer_size          500000
cache                1
canonical            1
ccds                 1
check_alleles        1
check_existing       1
config               vep_human.ini
core_type            core
custom               /ReferenceData/vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
                     /ReferenceData/vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
dir                  /ReferenceData/vep_cache/
domains              1
force_overwrite      1
fork                 16
gmaf                 1
hgnc                 1
host                 ensembldb.ensembl.org
individual           all
input_file           INPUT.vcf
numbers              1
plugin               Blosum62
                     Condel,/ReferenceData/vep_cache/Plugins/config/Condel/config,b
                     Carol
polyphen             b
port                 5306
protein              1
regulatory           1
sift                 b
species              homo_sapiens
stats                HASH(0x35a8000)
terms                SO
toplevel_dir         /ReferenceData/vep_cache/
verbose              1
xref_refseq          1

--------------------

Will only load v71 databases
Species 'homo_sapiens' loaded from database 'homo_sapiens_core_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_71_37'
homo_sapiens_variation_71_37 loaded
homo_sapiens_funcgen_71_37 loaded
Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_71
ensembl_ancestral_71 loaded
ensembl_ontology_71 loaded
2013-05-14 10:33:39 - Connected to core version 71 database and variation version 71 database
2013-05-14 10:33:39 - Read existing cache info
2013-05-14 10:33:39 - Loaded plugin: Blosum62
2013-05-14 10:33:39 - Loaded plugin: Condel
2013-05-14 10:33:39 - Loaded plugin: Carol
2013-05-14 10:33:40 - Starting...
2013-05-14 10:33:40 - Detected format of input file as vcf
2013-05-14 10:33:46 - Read 195789 variants into buffer
2013-05-14 10:33:46 - Skipping 67552 non-variant loci
2013-05-14 10:33:46 - Reading transcript data from cache and/or database
[======================================================================================================]  [ 100% ]
2013-05-14 10:40:19 - Retrieved 189344 transcripts (0 mem, 202901 cached, 0 DB, 13557 duplicates)
2013-05-14 10:40:19 - Reading regulatory data from cache and/or database
[======================================================================================================]  [ 100% ]
2013-05-14 10:50:09 - Retrieved 872092 regulatory features (0 mem, 872351 cached, 0 DB, 259 duplicates)
2013-05-14 10:50:12 - Calculating consequences
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1022.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1030.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1037.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1089.
Use of uninitialized value $_ in concatenation (.) or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1097.

ERROR: Forked process failed




=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Tue, May 14, 2013 at 10:42 AM, Duarte Molha <duartemolha at gmail.com> wrote:

> Thanks
>
> Running an annotation using 16 forks... let's see how it handles it :)
> I'll report back any issues.
>
> Thanks for the update
>
> Duarte
>
>
>
>
> On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Stuart, Guillermo, Duarte,
>>
>> As I stated above, I'm currently working on some code to improve stability
>> and performance under forking.
>>
>> I've committed some code to the HEAD of our CVS tree which should help with
>> the problems you are encountering. You'd all be welcome to test this out,
>> with the obvious proviso that this is development code and may contain bugs!
>>
>> To use this, you should download the copy of VEP.pm from:
>>
>>
>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl
>>
>> and replace the VEP.pm under
>> ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
>> Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
>>
>> This code will appear in production in the next proper release of Ensembl.
>>
>> Regards
>>
>> Will
>>
>>
>> On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>>
>>>  Hi,
>>>
>>> I certainly don't want to hijack this thread but it seemed daft to start
>>> another. I am also getting forking errors. I don't use any custom plugins
>>> and am using a validated VCF as input (with about 600,000 variants). Trying
>>> to fork more than 4 threads is unstable even on my machine which has 64
>>> cores and half a TB of RAM.
>>>
>>> I haven't found anything reproducible, however if I do I'll report back
>>> to the list.
>>>
>>> Thanks
>>>
>>> Stuart
>>>
>>>
>>> On 14/05/2013 09:42, Will McLaren wrote:
>>>
>>> Hello,
>>>
>>>  Your aa_grantham_distance plugin is somewhat inefficient - it
>>> retrieves the peptide alleles from the HGVS annotation, which itself
>>> requires some database fetching and processing to produce. This is why it
>>> is slow.
>>>
>>>  You can get the peptides from the transcript variation object:
>>>
>>>  my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>>>
>>>  This will give you single-letter AA codes, but you could either modify
>>> your hash or use BioPerl to convert:
>>>
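>>>  use Bio::PrimarySeq;
>>> use Bio::SeqUtils;
>>>
>>> # declare the alphabet so a short peptide is not guessed to be DNA
>>> my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa, -alphabet => 'protein');
>>> my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);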
>>>
>>>  You should also declare your distances hash in the new() sub and store
>>> it on $self; this will also marginally speed up your plugin.
>>>
>>>  Regarding the forking issues, we are working on improving stability
>>> under forking.
>>>
>>>  Thanks for your patience
>>>
>>>  Will
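Will's two suggestions above (take the peptides from pep_allele_string() instead of the HGVS notation, and build the distance table once in new() and keep it on $self) can be illustrated outside VEP with a small standalone Perl sketch. GranthamSketch and its distance() method are hypothetical names, not part of the VEP API; in a real plugin the class would inherit from Bio::EnsEMBL::Variation::Utils::BaseVepPlugin and the string would come from $tva->transcript_variation->pep_allele_string(). Only two entries of Grantham's (1974) matrix are included (Ile/Leu = 5, the minimum, and Cys/Trp = 215, the maximum); the remaining pairs would have to be filled in.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a plugin class: the distance table is built
# once in new() and stored on $self, not redefined on every call.
package GranthamSketch;

sub new {
    my ($class) = @_;
    # Illustrative subset of Grantham's (1974) table only:
    # Ile/Leu = 5 (the minimum) and Cys/Trp = 215 (the maximum).
    # Keys are the two one-letter codes, sorted and joined with ':'.
    my %distances = (
        'I:L' => 5,
        'C:W' => 215,
    );
    return bless { distances => \%distances }, $class;
}

# Accepts a "REF/ALT" string of one-letter codes, in the form returned by
# pep_allele_string(), e.g. "L/I". Returns undef when no distance applies:
# a single residue (e.g. a synonymous change) or a pair missing from the table.
sub distance {
    my ($self, $pep_allele_string) = @_;
    return unless defined $pep_allele_string;
    my @peps = split "/", $pep_allele_string;
    return unless @peps == 2;
    # sort so that "L/I" and "I/L" map to the same key
    my $key = join ':', sort @peps;
    return $self->{distances}{$key};
}

package main;

my $grantham = GranthamSketch->new;    # table built once here
my $d = $grantham->distance('L/I');
print "L/I: $d\n";                     # prints "L/I: 5"
```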
>>>
>>>
>>>  On 14 May 2013 07:37, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Hello,
>>>>
>>>> I'm not really sure which one of those plugins is causing the fork
>>>> error. I cannot recreate it now running each one of them separately.
>>>>
>>>> Here are both:
>>>>
>>>> https://github.com/guillermomarco/vep_plugins_71
>>>>
>>>> They also slow down the "Calculating consequences" step a lot.
>>>> aa_grantham_distance.pm is just a hard-coded plugin from one of the
>>>> biologists at my work. It was a straight copy-and-paste, adapted to make
>>>> it work as a VEP plugin. Maybe the problem is that the matrix is redefined
>>>> every time the subroutine is called. I'm not running out of memory or
>>>> CPU. I'm currently using it with 2 threads and a buffer size of 500 for a
>>>> 5000-variant VCF file.
>>>>
>>>> In my honest opinion, one (or even both) of those plugins slows the
>>>> calculation down so much that sometimes the fork just dies, like when you
>>>> get a timeout due to heavy network traffic. So when you use them together
>>>> with lots of other plugins like Condel, Consequence, etc., they may be
>>>> causing the process to hang and die.
>>>>
>>>> Best regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>>>
>>>> I also get this error... it is so prevalent and so difficult to
>>>> pinpoint what is causing it that I have given up on forking my annotation
>>>> process.
>>>>
>>>>  I do think it is related to the number of forks. It seems to crash
>>>> less often if you use a low number of forks; anything above 5
>>>> will undoubtedly crash the script, at least in my experience.
>>>>
>>>>  Cheers
>>>>
>>>> Duarte
>>>>
>>>>
>>>>
>>>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>
>>>>> Hi Guillermo,
>>>>>
>>>>>  Test each plugin individually until you find the one that causes the
>>>>> error. It is highly unlikely that a particular combination of plugins is
>>>>> causing the crash.
>>>>>
>>>>>  Check that there are no "print" statements (to STDOUT or STDERR) in
>>>>> your plugin; forking assumes that the code remains silent, otherwise it
>>>>> will throw errors like this.
>>>>>
>>>>>  Also, check what, if anything, is cached between runs of your
>>>>> plugin. If you are caching things (for example to avoid re-querying a
>>>>> database), you may need to write storable hooks to ensure the data is
>>>>> getting cached between forks - see
>>>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm
>>>>> for an example.
>>>>>
>>>>>  If you still have no luck, send me the code and an input file that
>>>>> recreates the problem.
>>>>>
>>>>>  Regards
>>>>>
>>>>>  Will
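The storable-hooks advice above can be sketched with Perl's core Storable module, which lets a class define STORABLE_freeze and STORABLE_thaw to control what is serialised. This sketch assumes (the thread does not confirm it) that plugin objects pass through Storable when data moves between forked processes; the class CachingPluginSketch and its cache/handle fields are hypothetical, and this is not the code from ProteinSeqs.pm. The idea: keep the cache, drop anything that cannot or should not be serialised (here a filehandle standing in for a database handle).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical plugin-like class: holds a lookup cache worth keeping,
# plus a resource (a filehandle here, perhaps a DB handle in practice)
# that Storable cannot serialise.
package CachingPluginSketch;

sub new {
    my ($class, %args) = @_;
    return bless {
        cache  => {},             # e.g. results of earlier database queries
        handle => $args{handle},  # GLOB / live connection: not serialisable
    }, $class;
}

# Called by Storable when the object is frozen: return a (here empty)
# string plus a reference to the data to keep. The handle is left out.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('', $self->{cache});
}

# Called by Storable on the freshly blessed, empty hash when thawing:
# restore the cache and leave the handle to be re-opened on demand.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{cache}  = $cache;
    $self->{handle} = undef;
    return;
}

package main;

my $plugin = CachingPluginSketch->new(handle => \*STDERR);
$plugin->{cache}{'example_key'} = 'example_value';

# Without the hooks, Storable would refuse to serialise the GLOB reference.
my $copy = thaw(freeze($plugin));
print $copy->{cache}{'example_key'}, "\n";    # prints "example_value"
```

The hooks only decide what crosses the process boundary; any handle dropped in STORABLE_freeze has to be re-created lazily by the code that needs it.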
>>>>>
>>>>>
>>>>>  On 13 May 2013 13:18, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>>   Hello,
>>>>>>
>>>>>> I've recently started having problems with the VEP script while using
>>>>>> different plugins (most of them my own).
>>>>>>
>>>>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>>>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>>>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>>>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>>>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>>>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>>>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>>>>> 2013-05-13 13:59:44 - Starting...
>>>>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>>>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>>>>> [===============================================]  [ 100% ]
>>>>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>>>>> 2013-05-13 14:02:38 - Calculating consequences
>>>>>> [===================================>           ]   [ 78% ]
>>>>>> ERROR: Forked process failed
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not getting any other error message, so I cannot debug properly.
>>>>>> I thought my plugins were OK, but it seems they aren't. I think the problem
>>>>>> occurs when I use the "aa_grantham_distance" plugin together with
>>>>>> "flanking_sequence". I have no idea what could be causing this.
>>>>>>
>>>>>> I'm running VEP in verbose mode but I can't get any useful
>>>>>> information. How could I debug this?
>>>>>>
>>>>>> Best regards,
>>>>>> Guillermo.
>>>>>>
>>>>>>
>>>>>>  _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>
>>>
>>
>