[ensembl-dev] Problem running gene build pipeline

Thibaut Hourlier th3 at sanger.ac.uk
Tue May 14 12:33:32 BST 2013


Hi Marc
On 14 May 2013, at 11:26, Marc Hoeppner <mphoeppner at gmail.com> wrote:

> Hi!
> 
> I have been trying to complete a successful run of the gene build pipeline, but I am still stuck on the raw computes. I have tried running it in 'Local' mode as well as with LSF (more on that below). In both cases, the rulemanager throws a number of errors I can't pin down. The pipeline is currently set up on a 48-core AMD machine, and I am using the latest checkout of branch 71. The local files for the pipeline configuration live in the sub-folder configs/pipeline-configs/modules. All relevant folders are in my PERL5LIB. I made what I believe are the relevant modifications to BatchQueue.pm, Databases.pm and so on.
> 
> -> I have tested the individual modules I want to run using the test_RunnableDB script and all of them worked.
> 
> If I run the rulemanager on the full thing, though, or even on a subset with just one module, it acts up. Here is my syntax:
> 
> perl /opt/bioinformatics/ensembl/ensembl-pipeline/scripts/rulemanager.pl -dbhost localhost -dbuser username -dbpass password -analysis RepeatMasker -submission_limit 10 -submission_number 10 -once -unlock
> 
The option -submission_limit does not take a value; it only says that you want to limit the number of jobs you submit. You then give the limit itself with -submission_number (as you did).

Be careful when using -unlock: it removes the lock in the database, which can allow several rulemanager instances to run at the same time, and that can be dangerous.
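As a minimal sketch (not an official wrapper), the same call can be built as a Perl argument list with -submission_limit as a bare switch, the limit passed through -submission_number, and -unlock left out. The script path, host, user and password are the placeholders from your command. The script only prints the command line; passing @cmd to system() would launch it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholders (path, host, user, password) are copied from the original
# command; adjust them to your own setup.
my @cmd = (
    'perl', '/opt/bioinformatics/ensembl/ensembl-pipeline/scripts/rulemanager.pl',
    '-dbhost',   'localhost',
    '-dbuser',   'username',
    '-dbpass',   'password',
    '-analysis', 'RepeatMasker',
    '-submission_limit',          # bare switch, takes no value
    '-submission_number', 10,     # the actual job limit
    '-once',
    # '-unlock' is left out on purpose: it clears the lock in the database
    # and lets several rulemanager instances run at the same time.
);

# Dry run: print the command line. To launch it, call system(@cmd) instead.
print join(' ', @cmd), "\n";
```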

> This throws a bunch of errors right up front, starting with
> 
> 
> 1)
> 'Use of uninitialized value $max_retry in numeric le (<=) at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 1029.'
> 
> And that appears for every job (i.e. thousands for a full genome). Apparently it doesn't like me limiting the run to a smaller subset? Is that normal?
Seeing your BatchQueue.pm file would help, but my guess is that you haven't set DEFAULT_RETRIES in it. In BatchQueue.pm, all the values (DEFAULT_*, JOB_*,…) that come "before" the QUEUE_CONFIG array need to be present in the file.
You can also set it per analysis, in your analysis hash: retries => 3,
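For reference, a sketch of the part of BatchQueue.pm involved, assuming the usual layout in which one %Config hash holds the DEFAULT_* and JOB_* settings followed by the per-analysis QUEUE_CONFIG array. Only DEFAULT_RETRIES, QUEUE_CONFIG and retries come from this thread; the logic_name key and the value 3 are illustrative, and the remaining keys should be copied from the example BatchQueue configuration that comes with ensembl-pipeline.

```perl
# Excerpt of BatchQueue.pm (sketch): the package declaration and the other
# DEFAULT_* / JOB_* keys are omitted.
our %Config = (
    # Global default, used by any analysis that does not set its own value.
    DEFAULT_RETRIES => 3,

    # ... remaining DEFAULT_* and JOB_* settings, all before QUEUE_CONFIG ...

    QUEUE_CONFIG => [
        {
            logic_name => 'RepeatMasker',   # assumed key naming the analysis
            retries    => 3,                # per-analysis override
        },
    ],
);
```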


> 
> 2) Every submission (locally as well as with LSF) then comes with the following errors:
> 
> ---------------------------------------------------
> Use of uninitialized value in string eq at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 1047.
> Use of uninitialized value in string eq at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 1047.
> Job: Null submission ID for the following, but continuing: 878
> Use of uninitialized value in numeric ge (>=) at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 541.
> Use of uninitialized value $this_runner in -x at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 335.
> 

Probably the same problem as before: the BatchQueue file is missing some parameters.
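To see which settings are missing, a small check can help. The helper below, missing_config_keys, is hypothetical and not part of ensembl-pipeline: it takes a %Config-style hash and a list of required keys and returns those that are absent or undefined. Only DEFAULT_RETRIES is named in this thread, so the required list is an assumption that you would extend with the other DEFAULT_* and JOB_* keys from the example BatchQueue.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the required top-level keys that are absent
# or undefined in a BatchQueue-style config hash.
sub missing_config_keys {
    my ($config, @required) = @_;
    return grep { !defined $config->{$_} } @required;
}

# Example config that forgot DEFAULT_RETRIES (logic_name is an assumed key).
my %Config = (
    QUEUE_CONFIG => [ { logic_name => 'RepeatMasker' } ],
);

# Assumed list of required keys; extend it with your other DEFAULT_*/JOB_* keys.
my @missing = missing_config_keys(\%Config, qw(DEFAULT_RETRIES QUEUE_CONFIG));
print "Missing from BatchQueue.pm: @missing\n" if @missing;
```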

> 3) The individual error messages for each RepeatMasker run then are:
> 
> -------------------- EXCEPTION --------------------
> MSG: Problems creating runnable RepeatMasker for contig:vcelegans_test:chrIV_124:1:50000:1 [Can't locate RepeatMasker.pm in @INC (@INC contains: /opt/bioinformatics/ensembl/ensembl/modules /opt/bioinformatics/ensembl/ensembl-compara/modules /opt/bioinformatics/ensembl/ensembl-variation/modules /opt/bioinformatics/ensembl/ensembl-functgenomics/modules /opt/bioinformatics/ensembl/ensembl-analysis/modules /opt/bioinformatics/ensembl/ensembl-pipeline/scripts /opt/bioinformatics/ensembl/ensembl-killllist/modules /data2/ensembl-test/test2/configs/pipeline-configs/modules /usr/bin/tRNAscan_SE /usr/bin/tRNAscan-SE/ /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .) at /data2/ensembl-test/test2/configs/pipeline-configs/modules/Bio/EnsEMBL/Pipeline/Job.pm line 641.
> ]
> 
> This last part is what bugs me the most, since RepeatMasker.pm definitely is in /opt/bioinformatics/ensembl-71/ensembl-analysis/modules/Bio/EnsEMBL/Analysis/RunnableDB/RepeatMasker.pm

Here your path is /opt/bioinformatics/ensembl-71, whereas it is /opt/bioinformatics/ensembl in your PERL5LIB (unless you have a typo in your email).
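A quick way to check this is to look for the module file in each PERL5LIB directory. The sketch below uses a hypothetical helper, find_module, built only on core Perl: it maps Bio::EnsEMBL::Analysis::RunnableDB::RepeatMasker to its relative file path and reports every PERL5LIB entry that contains it. If it finds no hit, or only a directory you did not expect, PERL5LIB is pointing at the wrong checkout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every directory in a colon-separated path list that contains the
# file for the given module name.
sub find_module {
    my ($module, $path) = @_;
    # Bio::EnsEMBL::Foo -> Bio/EnsEMBL/Foo.pm
    my $relative = File::Spec->catfile(split /::/, $module) . '.pm';
    return grep { -f File::Spec->catfile($_, $relative) }
           grep { length } split /:/, ($path // '');
}

my $module = 'Bio::EnsEMBL::Analysis::RunnableDB::RepeatMasker';
my @hits   = find_module($module, $ENV{PERL5LIB});
if (@hits) {
    print "$module found under: $_\n" for @hits;
} else {
    print "$module not found in any PERL5LIB directory\n";
}
```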

Hope this will help

Thibaut

> 
> So I am pretty much at a loss here.
> 
> Any sort of helpful advice would be greatly appreciated!
> 
> Cheers,
> Marc
> 
> P.S.: Regarding LSF, although probably unrelated to my issues, I should say that I am actually using openlava, which is an open source fork of the original Platform LSF and should provide the same functionality and binaries/commands. I haven't tried Grid Engine yet, though.
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




