[ensembl-dev] VEP error: Forked process failed.
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Thu May 16 12:29:45 BST 2013
Hello,
I got lost in this thread.
So, if I understand correctly, to fix the VEP forking issue I just need to
re-download the VEP script from the Ensembl website?
Thank you!
Best regards,
Guillermo.
On 05/15/2013 01:35 PM, Will McLaren wrote:
> This is now fixed on the branch and on head too.
>
> Will
>
>
> On 14 May 2013 17:30, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Sorry Will... today you have loads of people bugging you :-)
>
> I just updated to the latest VEP you pointed me to... and now it
> fails on line 1307:
>
> Use of uninitialized value $line in join or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1307.
>
> Cheers
>
> Duarte
>
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
>
> On Tue, May 14, 2013 at 4:26 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Hi Duarte,
>
> You are correct, it was the --individual flag causing the problem.
>
> I have committed a fix to the 71 branch (does not include
> forking fixes) and to head (does include forking fixes,
> download from
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.94&root=ensembl).
>
> Regards
>
> Will
>
>
> On 14 May 2013 15:17, Duarte Molha <duartemolha at gmail.com> wrote:
>
> However, I believe it must be something related to the
> --individual [all|ind list] flag, and it does not seem to be related
> to the --fork argument, since it also throws that error when running
> single-threaded.
>
> The difference is that in single-threaded mode it simply displays the
> error and keeps going, whereas with multiple forks the process dies
> entirely.
> My VCF contains genotypes for 3 samples, and I have included the
> --individual all option.
>
> Best regards
> Duarte
>
>
>
>
> On Tue, May 14, 2013 at 2:00 PM, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Hi Will
>
> Unfortunately the script does not report which line of the input it
> fails on, and I cannot provide you with the entire file since it is
> private data.
> Is there a way of reporting the line?
>
> Thanks
>
> Duarte
>
>
>
>
> On Tue, May 14, 2013 at 1:05 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Hi Duarte,
>
> Do you have some input that causes this error?
>
> Thanks
>
> Will
>
>
> On 14 May 2013 12:57, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Another bug with the updated version... now, when using the
> --check_alleles and --check_existing options, the script dies at
> line 4759:
>
> Use of uninitialized value in string ne at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4759.
>
>
> Best regards
>
> Duarte
>
>
>
> On Tue, May 14, 2013 at 11:22 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Thanks - try
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.93&root=ensembl
>
> Will
>
>
> On 14 May 2013 11:09, Duarte Molha <duartemolha at gmail.com> wrote:
>
> It seems the problems are still there.
>
> Here is my output:
>
> perl variant_effect_predictor.pl --config vep_human.ini -i INPUT.vcf --fork 16
>
> 2013-05-14 10:33:38 - Read configuration from vep_human.ini
> #----------------------------------#
> # ENSEMBL VARIANT EFFECT PREDICTOR #
> #----------------------------------#
>
> version 71
>
> By Will McLaren (wm2 at ebi.ac.uk)
>
> Configuration options:
>
> ###
> allow_non_variant 1
> buffer_size 500000
> cache 1
> canonical 1
> ccds 1
> check_alleles 1
> check_existing 1
> config vep_human.ini
> core_type core
> custom /ReferenceData/vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
>        /ReferenceData/vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
> dir /ReferenceData/vep_cache/
> domains 1
> force_overwrite 1
> fork 16
> gmaf 1
> hgnc 1
> host ensembldb.ensembl.org
> individual all
> input_file INPUT.vcf
> numbers 1
> plugin Blosum62 Condel,/ReferenceData/vep_cache/Plugins/config/Condel/config,b Carol
> polyphen b
> port 5306
> protein 1
> regulatory 1
> sift b
> species homo_sapiens
> stats HASH(0x35a8000)
> terms SO
> toplevel_dir /ReferenceData/vep_cache/
> verbose 1
> xref_refseq 1
>
> --------------------
>
> Will only load v71 databases
> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_71_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_71_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_71_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_71_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_71_37'
> homo_sapiens_variation_71_37 loaded
> homo_sapiens_funcgen_71_37 loaded
> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_71
> ensembl_ancestral_71 loaded
> ensembl_ontology_71 loaded
> 2013-05-14 10:33:39 - Connected to core version 71 database and variation version 71 database
> 2013-05-14 10:33:39 - Read existing cache info
> 2013-05-14 10:33:39 - Loaded plugin: Blosum62
> 2013-05-14 10:33:39 - Loaded plugin: Condel
> 2013-05-14 10:33:39 - Loaded plugin: Carol
> 2013-05-14 10:33:40 - Starting...
> 2013-05-14 10:33:40 - Detected format of input file as vcf
> 2013-05-14 10:33:46 - Read 195789 variants into buffer
> 2013-05-14 10:33:46 - Skipping 67552 non-variant loci
> 2013-05-14 10:33:46 - Reading transcript data from cache and/or database
> [======================================================================================================]
> [ 100% ]
> 2013-05-14 10:40:19 - Retrieved 189344 transcripts (0 mem, 202901 cached, 0 DB, 13557 duplicates)
> 2013-05-14 10:40:19 - Reading regulatory data from cache and/or database
> [======================================================================================================]
> [ 100% ]
> 2013-05-14 10:50:09 - Retrieved 872092 regulatory features (0 mem, 872351 cached, 0 DB, 259 duplicates)
> 2013-05-14 10:50:12 - Calculating consequences
> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1022.
> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1030.
> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1037.
> Use of uninitialized value $_ in pattern match (m//) at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1089.
> Use of uninitialized value $_ in concatenation (.) or string at /NGS_Test/duarte/vep_71/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1097.
>
> ERROR: Forked process failed
>
>
>
>
>
>
> On Tue, May 14, 2013 at 10:42 AM, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Thanks
>
> Running an annotation using 16 forks... let's see how it handles it :)
> I'll report back any issues.
>
> Thanks for the update
>
> Duarte
>
>
>
>
> On Tue, May 14, 2013 at 10:16 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Stuart, Guillermo, Duarte,
>
> I'm currently working on some code, as I stated above, to improve
> stability and performance under forking.
>
> I've committed some code to the HEAD of our CVS tree which should help
> with the problems you are encountering. You'd all be welcome to test
> this out, with the obvious proviso that this is development code and
> may contain bugs!
>
> To use this, download the copy of VEP.pm from:
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?revision=1.92&root=ensembl
>
> and replace the VEP.pm under
> ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils (or just
> Bio/EnsEMBL/Variation/Utils if you used INSTALL.pl) with this one.
>
> This code will appear in production in the next proper release of
> Ensembl.
>
> Regards
>
> Will
>
>
> On 14 May 2013 09:55, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>
> Hi,
>
> I certainly don't want to hijack this thread, but it seemed daft to
> start another. I am also getting forking errors. I don't use any
> custom plugins and am using a validated VCF as input (with about
> 600,000 variants). Trying to run with more than 4 forks is unstable,
> even on my machine, which has 64 cores and half a TB of RAM.
>
> I haven't found anything reproducible; however, if I do, I'll report
> back to the list.
>
> Thanks
>
> Stuart
>
>
> On 14/05/2013 09:42, Will McLaren wrote:
>> Hello,
>>
>> Your aa_grantham_distance plugin is somewhat inefficient: it retrieves
>> the peptide alleles from the HGVS annotation, which itself requires
>> some database fetching and processing to produce. This is why it is
>> slow.
>>
>> You can get the peptides from the transcript variation object:
>>
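>> my @peps = split "/", $tva->transcript_variation->pep_allele_string();
>>
>> This will give you single-letter AA codes, but you could either modify
>> your hash or use BioPerl to convert:
>>
>> use Bio::PrimarySeq;
>> use Bio::SeqUtils;
>>
>> # seq3() only accepts protein sequences, and a short sequence such as
>> # "A" may otherwise be guessed as DNA, so set the alphabet explicitly
>> my $seqobj = Bio::PrimarySeq->new(-seq => $single_letter_aa, -alphabet => 'protein');
>> my $three_letter_aa = Bio::SeqUtils->seq3($seqobj);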
>>
>>
>> You should also declare your distances hash in the new() sub and
>> store it on $self; this will also marginally speed up your plugin.
>>
>> Regarding the forking issues, we are working on improving stability
>> under forking.
>>
>> Thanks for your patience
>>
>> Will
>>
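A minimal sketch of Will's suggestion, written in Perl like the snippets above: the distance table is built once in new() and stored on $self, and each lookup splits a peptide allele string of the "L/I" form that pep_allele_string() returns. The package name GranthamSketch and its distance() method are hypothetical; the sketch does not inherit from the VEP plugin base class, and the table holds only two Grantham (1974) values as an excerpt, so a real plugin would load the full matrix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the aa_grantham_distance plugin. A real VEP plugin
# would inherit from the VEP plugin base class and obtain the peptide allele
# string from $tva->transcript_variation->pep_allele_string() in run().
package GranthamSketch;

sub new {
    my ($class) = @_;
    my $self = bless { distances => {} }, $class;

    # Build the distance table once, when the plugin object is created,
    # instead of on every call. Excerpt only: two Grantham (1974) distances
    # keyed by single-letter amino acid codes; load the full matrix in real use.
    my %pairs = (
        IL => 5,
        CW => 215,
    );

    # Store both orientations so the lookup does not depend on allele order.
    for my $pair (keys %pairs) {
        my ($x, $y) = split //, $pair;
        $self->{distances}{$x}{$y} = $pairs{$pair};
        $self->{distances}{$y}{$x} = $pairs{$pair};
    }
    return $self;
}

# Takes a peptide allele string such as "L/I" and returns the distance, or
# undef (in scalar context) if the string is missing, has no "/", or the pair
# is not in the table.
sub distance {
    my ($self, $pep_allele_string) = @_;
    return unless defined $pep_allele_string;
    my ($ref, $alt) = split /\//, $pep_allele_string;
    return unless defined $ref && defined $alt;
    return unless exists $self->{distances}{$ref};
    return $self->{distances}{$ref}{$alt};
}

package main;

my $plugin = GranthamSketch->new();
print $plugin->distance('L/I'), "\n";
```

Because the table lives on $self, it is built once per plugin object rather than once per variant, which is the marginal speed-up Will describes.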
>>
>> On 14 May 2013 07:37, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello,
>>
>> I'm not really sure which one of those plugins is causing the fork
>> error. I cannot reproduce it now when running each of them
>> separately.
>>
>> Here are both:
>>
>> https://github.com/guillermomarco/vep_plugins_71
>>
>> They also slow down the "Calculating consequences" step a lot.
>> aa_grantham_distance.pm is just a hard-coded plugin from one of the
>> biologists at my work; it was a straight copy-paste, adapted to make
>> it work as a VEP plugin. Maybe the problem is that the matrix is
>> defined every time the subroutine is called. I'm not running out of
>> memory or CPU. I'm currently using it with 2 threads and a buffer
>> size of 500 for a 5000-variant VCF file.
>>
>> In my honest opinion, one (or even both) of those plugins are slowing
>> the calculation down so much that sometimes the fork just dies, like
>> when you get a timeout due to heavy network traffic. So when you use
>> them together with lots of other plugins like Condel, Consequence,
>> etc., they may be causing the process to hang and die.
>>
>> Best regards,
>> Guillermo.
>>
>>
>> On 05/13/2013 03:55 PM, Duarte Molha wrote:
>>> I also get this error... it is so prevalent and so difficult to
>>> pinpoint what is causing it that I have given up on forking my
>>> annotation process.
>>>
>>> I do think it is related to the number of forks. It seems to crash
>>> less often if you use a low number of forks... anything above 5 will
>>> undoubtedly crash the script, at least in my experience.
>>>
>>> Cheers
>>>
>>> Duarte
>>>
>>>
>>>
>>> On Mon, May 13, 2013 at 2:50 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>> Hi Guillermo,
>>>
>>> Test each plugin individually until you find the one that causes the
>>> error. It is highly unlikely that a particular combination of plugins
>>> is causing the crash.
>>>
>>> Check that there are no "print" statements (to STDOUT or STDERR) in
>>> your plugin: forking assumes that the code remains silent; otherwise
>>> it will throw errors like this.
>>>
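If you still want debugging output from a plugin while running with --fork, one option is to write messages to a file instead of STDOUT or STDERR. This is a sketch, not something VEP itself provides: the helper debug_log() and the plugin_debug.<pid>.log file name are hypothetical, and the file is named after the current process ID ($$) so that each forked worker writes to its own file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical debug helper for plugin code. Instead of printing to STDOUT or
# STDERR, append the message to a log file named after the current process
# ID ($$), so each forked worker writes to its own file and nothing reaches
# the terminal.
sub debug_log {
    my ($dir, $message) = @_;
    my $file = File::Spec->catfile($dir, sprintf('plugin_debug.%d.log', $$));

    # If the log cannot be opened, give up quietly rather than warn.
    open(my $fh, '>>', $file) or return;
    printf {$fh} "%s pid=%d %s\n", scalar(localtime), $$, $message;
    close($fh);
    return $file;
}

# Example call; the message text is illustrative only.
my $log = debug_log(File::Spec->tmpdir(), 'entering run() for a variant');
```

Opening and closing the file on every call is slow but keeps the sketch simple and avoids sharing one filehandle between parent and children after a fork.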
>>> Also, check what, if anything, is cached between runs of your plugin.
>>> If you are caching things (for example, to avoid re-querying a
>>> database), you may need to write Storable hooks to ensure the data is
>>> getting cached between forks; see
>>> https://github.com/ensembl-variation/VEP_plugins/blob/master/ProteinSeqs.pm
>>> for an example.
>>>
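The sketch below shows the general shape of Storable hooks, assuming (as the advice above implies) that the plugin object is serialised with Storable when data passes between forks. CachingPlugin, lookup() and the code-ref "fetcher" (standing in for something Storable cannot serialise, such as a database handle) are all hypothetical; for the pattern VEP's own plugins use, see the ProteinSeqs.pm plugin linked above. STORABLE_freeze keeps only the cache, and STORABLE_thaw restores it on the empty object Storable hands back, leaving the handle to be re-created on first use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical plugin-like class: caches lookups in $self->{cache} and holds
# a handle Storable cannot serialise (a code ref here, standing in for
# something like a database connection).
package CachingPlugin;

sub new {
    my ($class) = @_;
    my $self = bless { cache => {} }, $class;
    $self->_connect();
    return $self;
}

# Stand-in for opening a connection; Storable refuses to freeze code refs.
sub _connect {
    my ($self) = @_;
    $self->{fetcher} = sub { my ($key) = @_; return "value_for_$key" };
    return;
}

# Return a cached value, fetching (and reconnecting lazily) on a miss.
sub lookup {
    my ($self, $key) = @_;
    $self->_connect() unless $self->{fetcher};
    $self->{cache}{$key} //= $self->{fetcher}->($key);
    return $self->{cache}{$key};
}

# Storable hook: serialise only the cache; the handle is deliberately dropped.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('', $self->{cache});
}

# Storable hook: $self arrives as an empty blessed hash; restore the cache.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized, $cache) = @_;
    $self->{cache} = $cache;
    return;
}

package main;

my $plugin = CachingPlugin->new();
$plugin->lookup('ENST00000000001');   # hypothetical transcript ID as a cache key
my $copy = thaw(freeze($plugin));
print join(',', sort keys %{ $copy->{cache} }), "\n";
```

Without the hooks, freeze() would die on the code ref; with them, the cached data survives the round trip and the unserialisable part is simply rebuilt when needed.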
>>> If you still have no luck, send me the code and an input file that
>>> recreates the problem.
>>>
>>> Regards
>>>
>>> Will
>>>
>>>
>>> On 13 May 2013 13:18, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I've recently started having problems with the VEP script while using
>>> different plugins (most of them my own).
>>>
>>> 2013-05-13 13:59:44 - Connected to core version 71 database and variation version 71 database
>>> 2013-05-13 13:59:44 - Loaded plugin: vcf_input
>>> 2013-05-13 13:59:44 - Loaded plugin: biobase
>>> 2013-05-13 13:59:44 - Loaded plugin: aa_grantham_distance
>>> 2013-05-13 13:59:44 - Loaded plugin: flanking_sequence
>>> 2013-05-13 13:59:44 - Loaded plugin: Condel
>>> 2013-05-13 13:59:44 - Output fields redefined (37 defined)
>>> 2013-05-13 13:59:44 - Starting...
>>> 2013-05-13 13:59:45 - Read 3888 variants into buffer
>>> 2013-05-13 13:59:54 - Reading transcript data from cache and/or database
>>> [===============================================] [ 100% ]
>>> 2013-05-13 14:02:38 - Retrieved 6463 transcripts (0 mem, 0 cached, 13743 DB, 7280 duplicates)
>>> 2013-05-13 14:02:38 - Calculating consequences
>>> [===================================> ] [ 78% ]
>>> ERROR: Forked process failed
>>>
>>>
>>> I'm not getting any other error message, so I cannot debug it
>>> properly. I thought my plugins were OK, but it seems they aren't. I
>>> think the problem occurs when I use the "aa_grantham_distance" plugin
>>> together with "flanking_sequence". I've no idea what could be causing
>>> this.
>>>
>>> I'm running VEP in verbose mode, but I can't get any useful
>>> information. How could I debug this?
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>