[ensembl-dev] VEP error: Forked process failed.

Duarte Molha Duarte.Molha at ogt.com
Fri May 17 16:06:32 BST 2013


I believe you have to use the version in HEAD from the CVS repository.

Please correct me if I am wrong, Will.

Best regards

Duarte


From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Guillermo Marco Puche
Sent: 16 May 2013 12:30
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP error: Forked process failed.

Hello,

I got lost in this topic.
So, if I understand correctly, to fix the VEP forking issue I must re-download the VEP script from the Ensembl website?

Thank you !

Best regards,
Guillermo.

On 05/15/2013 01:35 PM, Will McLaren wrote:
This is now fixed on branch and head too.

Will

On 14 May 2013 17:30, Duarte Molha <duartemolha at gmail.com> wrote:
Sorry Will... today you have loads of people bugging you :-)

I just updated to the latest VEP you pointed me to... and now the failure is on line 1307:

Use of uninitialized value $line in join or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1307.

Cheers

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 4:26 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
Hi Duarte,

You are correct, it was the --individual flag causing the problem.

I have committed a fix to the 71 branch (does not include forking fixes) and to head (does include forking fixes, download from http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.94&root=ensembl).

Regards

Will

On 14 May 2013 15:17, Duarte Molha <duartemolha at gmail.com> wrote:
However, I believe it must be something related to the --individual [all|ind list] flag.
It does not seem to be related to the --fork argument, since it also throws that error when running single-threaded.

However, in single-threaded mode it simply displays the error and keeps going, whereas with multiple forks the process dies entirely.

My VCF contains genotypes for 3 samples and I have included the --individual all option.

Best regards
Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 2:00 PM, Duarte Molha <duartemolha at gmail.com> wrote:
Hi Will

Unfortunately the script does not report which input line it fails on, and I cannot provide you with the entire file since it is private data.
Is there a way of reporting the line?

Thanks

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 1:05 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
Hi Duarte,

Do you have some input that causes this error?

Thanks

Will

On 14 May 2013 12:57, Duarte Molha <duartemolha at gmail.com> wrote:
Another bug using the updated version... now, when using the --check_alleles and --check_existing options, the script dies at line 4759:

Use of uninitialized value in string ne at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4759.


Best regards

Duarte

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 11:22 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
Thanks - try http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.93&root=ensembl

Will

On 14 May 2013 11:09, Duarte Molha <duartemolha at gmail.com> wrote:
It seems the problems are still there:

Here is my output:

perl variant_effect_predictor.pl --config vep_human.ini -i INPUT.vcf --fork 16

2013-05-14 10:33:38 - Read configuration from vep_human.ini
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

version 71

By Will McLaren (wm2 at ebi.ac.uk)

Configuration options:

###
allow_non_variant    1
buffer_size          500000
cache                1
canonical            1
ccds                 1
check_alleles        1
check_existing       1
config               vep_human.ini
core_type            core
custom               /ReferenceData/vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
                     /ReferenceData/vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
dir                  /ReferenceData/vep_cache/
domains              1
force_overwrite      1
fork                 16
gmaf                 1
hgnc                 1
host                 ensembldb.ensembl.org
individual           all
input_file           INPUT.vcf
numbers              1
plugin               Blosum62   Condel,/ReferenceData/vep_cache/Plugins/config/Condel/config,b  Carol
polyphen             b
port                 5306
protein              1
regulatory           1
sift                 b
species              homo_sapiens
stats                HASH(0x35a8000)
terms                SO
toplevel_dir         /ReferenceData/vep_cache/
verbose              1
xref_refseq          1

--------------------

Will only load v71 databases
Species 'homo_sapiens' loaded from database 'homo_sapiens_core_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_71_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_71_37'
homo_sapiens_variation_71_37 loaded
homo_sapiens_funcgen_71_37 loaded
Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_71
ensembl_ancestral_71 loaded
ensembl_ontology_71 loaded
2013-05-14 10:33:39 - Connected to core version 71 database and variation version 71 database
2013-05-14 10:33:39 - Read existing cache info
2013-05-14 10:33:39 - Loaded plugin: Blosum62
2013-05-14 10:33:39 - Loaded plugin: Condel
2013-05-14 10:33:39 - Loaded plugin: Carol
2013-05-14 10:33:40 - Starting...
2013-05-14 10:33:40 - Detected format of input file as vcf
2013-05-14 10:33:46 - Read 195789 variants into buffer
2013-05-14 10:33:46 - Skipping 67552 non-variant loci
2013-05-14 10:33:46 - Reading transcript data from cache and/or database
[======================================================================================================]  [ 100% ]
2013-05-14 10:40:19 - Retrieved 189344 transcripts (0 mem, 202901 cached, 0 DB, 13557 duplicates)
2013-05-14 10:40:19 - Reading regulatory data from cache and/or database
[======================================================================================================]  [ 100% ]
2013-05-14 10:50:09 - Retrieved 872092 regulatory features (0 mem, 872351 cached, 0 DB, 259 duplicates)
2013-05-14 10:50:12 - Calculating consequences
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1022.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1030.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1037.
Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1089.
Use of uninitialized value $_ in concatenation (.) or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1097.

ERROR: Forked process failed




=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 10:42 AM, Duarte Molha <duartemolha at gmail.com> wrote:
Thanks

Running an annotation using 16 forks... let's see how it handles :)
I'll report back any issues.

Thanks for the update

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
Stuart, Guillermo, Duarte,

I'm currently working on some code as I stated above to improve stability and performance under forking.

I've committed some code to the HEAD of our CVS tree which should help the problems you are encountering. You'd all be welcome to test this out, with the obvious proviso that this is development code and may contain bugs!

To use this, you should download the copy of VEP.pm from:

http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl

and replace the VEP.pm under ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just Bio/EnsEMBL/Variation/Utils if you use INSTALL.pl) with this one.
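The replacement step could be scripted roughly as follows. This is only a sketch: the helper name `replace_module` and the paths in the example call are hypothetical, and it keeps a `.bak` copy of the installed module so the release version can be restored if the development code misbehaves.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: back up the installed module, then overwrite it
# with the downloaded copy. Returns the path of the backup file.
sub replace_module {
    my ($downloaded, $installed) = @_;
    die "Downloaded file not found: $downloaded\n" unless -f $downloaded;
    die "Installed file not found: $installed\n"   unless -f $installed;

    my $backup = "$installed.bak";
    copy($installed, $backup)     or die "Backup failed: $!\n";
    copy($downloaded, $installed) or die "Replace failed: $!\n";
    return $backup;
}

# Example call (paths are assumptions; adjust to your installation):
# replace_module('VEP.pm', 'Bio/EnsEMBL/Variation/Utils/VEP.pm');
```

To go back to the release version, copy the `.bak` file over the module again.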

This code will appear in production in the next proper release of Ensembl.

Regards

Will

On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
Hi,

I certainly don't want to hijack this thread but it seemed daft to start another. I am also getting forking errors. I don't use any custom plugins and am using a validated VCF as input (with about 600,000 variants). Trying to fork more than 4 threads is unstable even on my machine which has 64 cores and half a TB of RAM.

I haven't found anything reproducible, however if I do I'll report back to the list.

Thanks

Stuart


On 14/05/2013 09:42, Will McLaren wrote:
Hello,

Your aa_grantham_distance plugin is somewhat inefficient - it retrieves the peptide alleles from the HGVS annotation, which itself requires some database fetching and processing to produce. This is why it is slow.

You can get the peptides from the transcript variation object:

my @peps = split "/", $tva->transcript_variation->pep_allele_string();

This will give you single-letter AA codes, but you could either modify your hash or use BioPerl to convert:

use Bio::PrimarySeq;
use Bio::SeqUtils;
my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa);
my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);

You should also declare your distances hash in the new() sub and store it on $self; this will also marginally speed up your plugin.
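The two suggestions above (split the peptide allele string, and build the distances hash once in new()) can be sketched without the Ensembl API as below. This is an illustration only, not the plugin's code: the package name `GranthamSketch` and the `distance` method are hypothetical, a plain hash stands in for the BioPerl conversion, and only two Grantham values are included where a real plugin would load the full matrix.

```perl
package GranthamSketch;
use strict;
use warnings;

# One-letter to three-letter amino acid codes (standard mapping).
my %THREE_LETTER = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
);

sub new {
    my ($class) = @_;
    # Build the lookup table once and keep it on $self, rather than
    # rebuilding it on every call. Only two entries are shown here.
    my $self = bless {
        distances => { 'Leu/Ile' => 5, 'Cys/Trp' => 215 },
    }, $class;
    return $self;
}

# Takes a string such as "L/I", the format of pep_allele_string().
# Returns the distance, or undef if the change is not in the table.
sub distance {
    my ($self, $pep_allele_string) = @_;
    return undef unless defined $pep_allele_string;
    my @peps = split "/", $pep_allele_string;
    return undef unless @peps == 2;    # e.g. synonymous changes
    my ($ref, $alt) = map { $THREE_LETTER{$_} } @peps;
    return undef unless defined $ref && defined $alt;
    # The matrix is symmetric, so try both orientations.
    return $self->{distances}{"$ref/$alt"} // $self->{distances}{"$alt/$ref"};
}

package main;

my $sketch = GranthamSketch->new;
my $d = $sketch->distance('L/I');
print "Grantham distance for L/I: $d\n";
```

In an actual plugin the string passed to `distance` would come from `$tva->transcript_variation->pep_allele_string()`, as shown above.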

Regarding the forking issues, we are working on improving stability under forking.

Thanks for your patience

Will

On 14 May 2013 07:37, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello,

I'm not really sure which one of those plugins is causing the fork error. I cannot recreate it now running each one of them separately.

Here are both:

https://github.com/guillermomarco/vep_plugins_71

They also slow down the "Calculating consequences" step a lot. aa_grantham_distance.pm is just a hardcoded plugin from one of the biologists at my work. It was a pure copy-paste, adapted to make it work as a VEP plugin. Maybe the problem is that the matrix is defined every time the subroutine is called. I'm not running out of memory or CPU. I'm currently using it with 2 threads and a buffer size of 500 for a 5000-variant VCF file.

In my honest opinion, I think one (or even both) of those plugins is slowing down the calculation so much that sometimes the fork just dies, like when you get a timeout due to heavy network traffic. So when you use them together with a lot of other plugins like Condel, Consequence, etc., they may be causing the process to hang and die.

Best regards,
Guillermo.


On 05/13/2013 03:55 PM, Duarte Molha wrote:
I also get this error... it is so prevalent and so difficult to pinpoint what is causing it that I have given up on forking my annotation process.

I do think it is related to the number of forks. It seems to crash less often if you use a low number of forks... anything above 5 will undoubtedly crash the script at least in my experience.

Cheers

Duarte

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk<mailto:wm2 at ebi.ac.uk>> wrote:
Hi Guillermo,

Test each plugin individually until you find the one that causes the error. It is highly unlikely that a particular combination of plugins is causing the crash.

Check that there are no "print" statements (to STDOUT or STDERR) in your plugin. Forking assumes that the code remains silent; otherwise it will throw errors like this.
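One way to keep debugging output while staying silent on STDOUT and STDERR is to append messages to a file. The sketch below is only a suggestion: the helper name `debug_log` and the log path in the example call are hypothetical and not part of VEP.

```perl
use strict;
use warnings;

# Hypothetical debug helper: appends a timestamped message, tagged with
# the process ID, to a log file instead of printing to STDOUT or STDERR.
# Returns 1 on success and 0 if the file could not be opened.
sub debug_log {
    my ($log_path, $message) = @_;
    open my $fh, '>>', $log_path or return 0;    # stay silent on failure too
    print {$fh} scalar(localtime), " [$$] $message\n";
    close $fh;
    return 1;
}

# Example call; the path is an assumption.
# debug_log('/tmp/my_plugin_debug.log', 'entered run()');
```

The process ID in each line makes it possible to tell the forked processes apart when reading the log afterwards.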

Also, check what, if anything, is cached between runs of your plugin. If you are caching things (for example to avoid re-querying a database), you may need to write storable hooks to ensure the data is getting cached between forks - see https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm for an example.
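A minimal sketch of what Storable hooks look like is shown below. It assumes plugin objects are serialized with Perl's Storable module when they are passed between processes, which the reference to "storable hooks" suggests. It is not the ProteinSeqs.pm code: the package name, the hash keys and the cache entry are all made up for illustration.

```perl
use strict;
use warnings;

package CachingPluginSketch;

sub new {
    my ($class) = @_;
    # An in-memory filehandle stands in for an open output file.
    open my $fh, '>', \my $buffer or die "open failed: $!";
    return bless { cache => {}, out_fh => $fh }, $class;
}

# Called by Storable when the object is serialized. Only the cache is
# passed on; the filehandle is left out because Storable cannot store it.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('cache-only', $self->{cache});
}

# Called by Storable on the receiving side with an empty object of the
# same class, the string returned by the freeze hook, and the extra refs.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{cache} = $cache;
    return;
}

package main;
use Storable qw(freeze thaw);

my $plugin = CachingPluginSketch->new;
$plugin->{cache}{ENST00000000001} = 'cached value';    # made-up entry

# Round-trip through Storable, as a stand-in for crossing a fork.
my $copy = thaw(freeze($plugin));
print "cache survived: $copy->{cache}{ENST00000000001}\n";
```

Without the hooks, freezing this object would fail because of the filehandle; with them, the cache travels and the handle can be reopened on the other side if needed.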

If you still have no luck, send me the code and an input file that recreates the problem.

Regards

Will

On 13 May 2013 13:18, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello,

I've recently started having problems with the VEP script while using different plugins (most of them my own plugins).

2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
2013-05-13 13:59:44 - Loaded plugin: vcf_input
2013-05-13 13:59:44 - Loaded plugin: biobase
2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
2013-05-13 13:59:44 - Loaded plugin: Condel
2013-05-13 13:59:44 - Output fields redefined (37 defined)
2013-05-13 13:59:44 - Starting...
2013-05-13 13:59:45 - Read 3888 variants into buffer
2013-05-13 13:59:54 - Reading transcript data from cache and/or database
[===============================================]  [ 100% ]
2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
2013-05-13 14:02:38 - Calculating consequences
[===================================>           ]   [ 78% ]

ERROR: Forked process failed




I'm not getting any other error message, so I cannot debug properly. I thought my plugins were OK, but it seems they aren't. I think the problem occurs when I use the "aa_grantham_distance" plugin together with "flanking_sequence". I've no idea what could be causing this.

I'm running VEP in verbose mode but I can't get any useful information. How could I debug that?

Best regards,
Guillermo.

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/