[ensembl-dev] VEP error: Forked process failed.
Duarte Molha
duartemolha at gmail.com
Tue May 14 12:57:13 BST 2013
Another bug with the updated version: when using the --check_alleles and
--check_existing options, the script now dies at line 4759:
Use of uninitialized value in string ne at
/NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4759.
Best regards
Duarte
=========================
Duarte Miguel Paulo Molha
http://about.me/duarte
=========================
On Tue, May 14, 2013 at 11:22 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Thanks - try
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.93&root=ensembl
>
> Will
>
>
> On 14 May 2013 11:09, Duarte Molha <duartemolha at gmail.com> wrote:
>
>> It seems the problems are still there :
>>
>> Here is my output:
>>
>> perl variant_effect_predictor.pl --config vep_human.ini -i INPUT.vcf
>> --fork 16
>>
>> 2013-05-14 10:33:38 - Read configuration from vep_human.ini
>> #----------------------------------#
>> # ENSEMBL VARIANT EFFECT PREDICTOR #
>> #----------------------------------#
>>
>> version 71
>>
>> By Will McLaren (wm2 at ebi.ac.uk)
>>
>> Configuration options:
>>
>> ###
>> allow_non_variant 1
>> buffer_size 500000
>> cache 1
>> canonical 1
>> ccds 1
>> check_alleles 1
>> check_existing 1
>> config vep_human.ini
>> core_type core
>> custom
>> /ReferenceData/vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
>> /ReferenceData/vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
>> dir /ReferenceData/vep_cache/
>> domains 1
>> force_overwrite 1
>> fork 16
>> gmaf 1
>> hgnc 1
>> host ensembldb.ensembl.org
>> individual all
>> input_file INPUT.vcf
>> numbers 1
>> plugin Blosum62
>> Condel,/ReferenceData/vep_cache/Plugins/config/Condel/config,b Carol
>> polyphen b
>> port 5306
>> protein 1
>> regulatory 1
>> sift b
>> species homo_sapiens
>> stats HASH(0x35a8000)
>> terms SO
>> toplevel_dir /ReferenceData/vep_cache/
>> verbose 1
>> xref_refseq 1
>>
>> --------------------
>>
>> Will only load v71 databases
>> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_71_37'
>> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_71_37'
>> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_71_37'
>> Species 'homo_sapiens' loaded from database
>> 'homo_sapiens_otherfeatures_71_37'
>> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_71_37'
>> homo_sapiens_variation_71_37 loaded
>> homo_sapiens_funcgen_71_37 loaded
>> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following
>> compara databases will be ignored: ensembl_compara_71
>> ensembl_ancestral_71 loaded
>> ensembl_ontology_71 loaded
>> 2013-05-14 10:33:39 - Connected to core version 71 database and variation
>> version 71 database
>> 2013-05-14 10:33:39 - Read existing cache info
>> 2013-05-14 10:33:39 - Loaded plugin: Blosum62
>> 2013-05-14 10:33:39 - Loaded plugin: Condel
>> 2013-05-14 10:33:39 - Loaded plugin: Carol
>> 2013-05-14 10:33:40 - Starting...
>> 2013-05-14 10:33:40 - Detected format of input file as vcf
>> 2013-05-14 10:33:46 - Read 195789 variants into buffer
>> 2013-05-14 10:33:46 - Skipping 67552 non-variant loci
>> 2013-05-14 10:33:46 - Reading transcript data from cache and/or database
>> [======================================================================================================]
>> [ 100% ]
>> 2013-05-14 10:40:19 - Retrieved 189344 transcripts (0 mem, 202901 cached,
>> 0 DB, 13557 duplicates)
>> 2013-05-14 10:40:19 - Reading regulatory data from cache and/or database
>> [======================================================================================================]
>> [ 100% ]
>> 2013-05-14 10:50:09 - Retrieved 872092 regulatory features (0 mem, 872351
>> cached, 0 DB, 259 duplicates)
>> 2013-05-14 10:50:12 - Calculating consequences
>> Use of uninitialized value $_ in pattern match (m//) at
>> /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1022.
>> Use of uninitialized value $_ in pattern match (m//) at
>> /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1030.
>> Use of uninitialized value $_ in pattern match (m//) at
>> /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1037.
>> Use of uninitialized value $_ in pattern match (m//) at
>> /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1089.
>> Use of uninitialized value $_ in concatenation (.) or string at
>> /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1097.
>>
>> ERROR: Forked process failed
>>
>> On Tue, May 14, 2013 at 10:42 AM, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>>> Thanks
>>>
>>> Running an annotation using 16 forks... let's see how it handles it :)
>>> I'll report back any issues.
>>>
>>> Thanks for the update
>>>
>>> Duarte
>>>
>>> On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>>> Stuart, Guillermo, Duarte,
>>>>
>>>> I'm currently working on some code as I stated above to improve
>>>> stability and performance under forking.
>>>>
>>>> I've committed some code to the HEAD of our CVS tree which should help
>>>> the problems you are encountering. You'd all be welcome to test this out,
>>>> with the obvious proviso that this is development code and may contain bugs!
>>>>
>>>> To use this, you should download the copy of VEP.pm from:
>>>>
>>>>
>>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl
>>>>
>>>> and replace the VEP.pm under
>>>> ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
>>>> Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
>>>>
>>>> This code will appear in production in the next proper release of
>>>> Ensembl.
>>>>
>>>> Regards
>>>>
>>>> Will
>>>>
>>>>
>>>> On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I certainly don't want to hijack this thread but it seemed daft to
>>>>> start another. I am also getting forking errors. I don't use any custom
>>>>> plugins and am using a validated VCF as input (with about 600,000
>>>>> variants). Trying to fork more than 4 threads is unstable even on my
>>>>> machine which has 64 cores and half a TB of RAM.
>>>>>
>>>>> I haven't found anything reproducible, however if I do I'll report
>>>>> back to the list.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Stuart
>>>>>
>>>>>
>>>>> On 14/05/2013 09:42, Will McLaren wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Your aa_grantham_distance plugin is somewhat inefficient - it
>>>>> retrieves the peptide alleles from the HGVS annotation, which itself
>>>>> requires some database fetching and processing to produce. This is why it
>>>>> is slow.
>>>>>
>>>>> You can get the peptides from the transcript variation object:
>>>>>
>>>>> my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>>>>>
>>>>> This will give you single-letter AA codes, but you could either
>>>>> modify your hash or use BioPerl to convert:
>>>>>
>>>>> use Bio::PrimarySeq;
>>>>> use Bio::SeqUtils;
>>>>> my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa);
>>>>> my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);
>>>>>
>>>>> You should also declare your distances hash in the new() sub and
>>>>> store it on $self; this will also marginally speed up your plugin.
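A minimal sketch of the two suggestions above (build the lookup hash once in new(), and split the peptide allele string), written as a standalone Perl class so that it runs without the Ensembl API. The package name GranthamSketch, the method distance_for and the handful of table entries are assumptions for illustration only; a real plugin would inherit from Bio::EnsEMBL::Variation::Utils::BaseVepPlugin, take the string from $tva->transcript_variation->pep_allele_string(), and use the full published Grantham matrix.

```perl
use strict;
use warnings;

package GranthamSketch;

# Hypothetical stand-in for a VEP plugin class (assumption: a real plugin
# would inherit from Bio::EnsEMBL::Variation::Utils::BaseVepPlugin).
sub new {
    my ($class) = @_;
    my $self = bless {}, $class;

    # Build the lookup table once, in the constructor, and keep it on $self
    # instead of rebuilding it on every call. Only a few pairs are listed;
    # check them against the published matrix before relying on them.
    # Keys are single-letter amino acid codes in sorted order.
    $self->{distances} = {
        'I/L' => 5,
        'K/R' => 26,
        'A/V' => 64,
        'C/W' => 215,
    };
    return $self;
}

# Takes a peptide allele string such as "A/V" and returns the stored
# distance, or undef if there is no amino acid change or no table entry.
sub distance_for {
    my ($self, $pep_allele_string) = @_;
    return undef unless defined $pep_allele_string;

    my @peps = split m{/}, $pep_allele_string;
    return undef unless @peps == 2;

    # Sort so that "V/A" and "A/V" hit the same key.
    my $key = join '/', sort @peps;
    return $self->{distances}{$key};
}

package main;

my $plugin = GranthamSketch->new;
my $d      = $plugin->distance_for('V/A');
print defined $d ? "$d\n" : "no distance\n";
```

Keying the table on single-letter codes avoids the BioPerl conversion step entirely, which is one of the two options mentioned above.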
>>>>>
>>>>> Regarding the forking issues, we are working on improving stability
>>>>> under forking.
>>>>>
>>>>> Thanks for your patience
>>>>>
>>>>> Will
>>>>>
>>>>>
>>>>> On 14 May 2013 07:37, Guillermo Marco Puche <
>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm not really sure which one of those plugins is causing the fork
>>>>>> error. I cannot recreate it now running each one of them separately.
>>>>>>
>>>>>> Here are both:
>>>>>>
>>>>>> https://github.com/guillermomarco/vep_plugins_71
>>>>>>
>>>>>> They also slow down the consequence calculation a lot.
>>>>>> aa_grantham_distance.pm is just a hardcoded plugin from one of the
>>>>>> biologists at my work. It was a pure copy-paste, adapted to make
>>>>>> it work as a VEP plugin. Maybe the problem is that the matrix is defined
>>>>>> every time the subroutine is called. I'm not running out of memory or
>>>>>> CPU. I'm currently using it with 2 threads and a buffer size of 500 for a
>>>>>> 5000-variant VCF file.
>>>>>>
>>>>>> In my honest opinion, one (or even both) of those plugins
>>>>>> slows the calculation down so much that sometimes the fork just
>>>>>> dies, like a timeout under heavy network traffic. So when
>>>>>> you use them together with a lot of other plugins like Condel, Consequence,
>>>>>> etc., they may be causing the process to hang and die.
>>>>>>
>>>>>> Best regards,
>>>>>> Guillermo.
>>>>>>
>>>>>>
>>>>>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>>>>>
>>>>>> I also get this error... it is so prevalent and so difficult to
>>>>>> pinpoint what is causing it that I have given up on forking my annotation
>>>>>> process.
>>>>>>
>>>>>> I do think it is related to the number of forks. It seems to crash
>>>>>> less often if you use a low number of forks... anything above 5
>>>>>> will undoubtedly crash the script at least in my experience.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Duarte
>>>>>>
>>>>>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>
>>>>>>> Hi Guillermo,
>>>>>>>
>>>>>>> Test each plugin individually until you find the one that causes
>>>>>>> the error. It is highly unlikely that a particular combination of plugins
>>>>>>> is causing the crash.
>>>>>>>
>>>>>>> Check that there are no "print" (to STDOUT or STDERR) statements
>>>>>>> in your plugin - forking assumes that code remains silent otherwise it will
>>>>>>> throw errors like this.
>>>>>>>
>>>>>>> Also, check what, if anything, is cached between runs of your
>>>>>>> plugin. If you are caching things (for example to avoid re-querying a
>>>>>>> database), you may need to write storable hooks to ensure the data is
>>>>>>> getting cached between forks - see
>>>>>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm for an example.
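As a rough illustration of what "storable hooks" means here: Perl's Storable module calls STORABLE_freeze and STORABLE_thaw on an object, if they are defined, when it serialises and restores it. The sketch below is not taken from ProteinSeqs.pm; the class name HookSketch, its fields and the key example_key are hypothetical. It assumes the plugin holds an open filehandle (which Storable cannot serialise) plus a plain hash of cached data that should survive the round trip.

```perl
use strict;
use warnings;
use Storable ();
use File::Temp qw(tempfile);

package HookSketch;

sub new {
    my ($class, $path) = @_;
    my $self = bless { path => $path, seen => {} }, $class;
    $self->_open;
    return $self;
}

# (Re)open the output file and keep the handle on the object.
sub _open {
    my ($self) = @_;
    open my $fh, '>>', $self->{path}
        or die "Cannot open $self->{path}: $!";
    $self->{fh} = $fh;
    return;
}

# Called by Storable when the object is serialised. Return a string plus
# any references Storable should serialise for us; the filehandle is
# deliberately left out.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ($self->{path}, $self->{seen});
}

# Called by Storable on the freshly created object when it is restored.
# Receives the string and references returned by STORABLE_freeze.
sub STORABLE_thaw {
    my ($self, $cloning, $path, $seen) = @_;
    $self->{path} = $path;
    $self->{seen} = $seen;
    $self->_open;    # give the restored copy its own filehandle
    return;
}

package main;

my (undef, $path) = tempfile(UNLINK => 1);
my $plugin = HookSketch->new($path);
$plugin->{seen}{example_key} = 1;

# Round trip through Storable, standing in for a hand-over between processes.
my $copy = Storable::thaw(Storable::freeze($plugin));
print "restored keys: ", join(',', sort keys %{ $copy->{seen} }), "\n";
```

Without the two hooks, freezing this object would fail because of the filehandle; with them, the cached hash is carried across and the handle is reopened on the other side.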
>>>>>>>
>>>>>>> If you still have no luck, send me the code and an input file that
>>>>>>> recreates the problem.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Will
>>>>>>>
>>>>>>>
>>>>>>> On 13 May 2013 13:18, Guillermo Marco Puche <
>>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've recently started having problems with the VEP script while
>>>>>>>> using different plugins (most of them my own).
>>>>>>>>
>>>>>>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>>>>>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>>>>>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>>>>>>> 2013-05-13 13:59:44 - Starting...
>>>>>>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>>>>>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>>>>>>> [===============================================] [ 100% ]
>>>>>>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>>>>>>> 2013-05-13 14:02:38 - Calculating consequences
>>>>>>>> [===================================> ] [ 78% ]
>>>>>>>> ERROR: Forked process failed
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not getting any other error message, so I cannot debug
>>>>>>>> properly. I thought my plugins were OK but it seems they aren't. I think
>>>>>>>> the problem occurs when I use the "aa_grantham_distance" plugin together with
>>>>>>>> "flanking_sequence". I've no idea what could be causing this.
>>>>>>>>
>>>>>>>> I'm running VEP in verbose mode but I can't get any useful
>>>>>>>> information. How could I debug this?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Guillermo.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dev mailing list Dev at ensembl.org
>>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>>
>>>>>>>>