[ensembl-dev] Gene build pipeline SQL deadlock

Tue May 14 17:20:09 BST 2013

Dear list,

a pipeline problem again, sorry ;)
So I fixed my earlier issue (turned out to be related to the logic_name 
and case sensitivity...), but when running a minimal raw compute stage 
(RepeatMasker and tRNAscan), the pipeline deadlocks after a while:

---------------------------------------------------
DBD::mysql::st execute failed: Deadlock found when trying to get lock; 
try restarting transaction at 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/DBSQL/JobAdaptor.pm 
line 652.

-------------------- EXCEPTION --------------------
MSG: ERROR running job 2044 RepeatMasker 
/data2/ensembl-test/test2/output/RepeatMasker/6/contig:vcelegans_test:chrIII_80:1:50000:1.RepeatMasker.0.err 
[
-------------------- EXCEPTION --------------------
MSG: Error setting status to SUBMITTED
STACK Bio::EnsEMBL::Pipeline::DBSQL::JobAdaptor::set_status 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/DBSQL/JobAdaptor.pm:677
STACK Bio::EnsEMBL::Pipeline::Job::set_status 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Job.pm:771
STACK Bio::EnsEMBL::Pipeline::Job::flush_runs 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Job.pm:505
STACK Bio::EnsEMBL::Pipeline::Job::batch_runRemote 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Job.pm:541
STACK (eval) 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/RuleManager.pm:694
STACK Bio::EnsEMBL::Pipeline::RuleManager::can_job_run 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/RuleManager.pm:692
STACK toplevel 
/opt/bioinformatics/ensembl-70/ensembl-pipeline/scripts/rulemanager.pl:336
Date (localtime)    = Tue May 14 18:14:34 2013
Ensembl API version = 70

Apparently, two processes are trying to update the same entry. How can 
this be avoided? Any suggestions at the level of the pipeline config, or 
using another SQL engine...?
My setup is a 48-core single-node Opteron LSF cluster (for testing 
purposes) and the contig-level jobs finish quite rapidly, of course - 
but I am surprised that there seem to be no checks for this sort of thing.
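
To illustrate the kind of check I had in mind: MySQL reports a deadlock as 
error 1213 and suggests restarting the transaction, so I would have 
expected the failing update to be retried a few times before giving up. 
Below is a minimal sketch of that idea in Python. It is not the 
ensembl-pipeline code; DatabaseError, run_with_deadlock_retry and the 
retry parameters are hypothetical names of my own, and a real fix would 
presumably have to wrap the status update in JobAdaptor.pm.

```python
import random
import time

# MySQL error code for "Deadlock found when trying to get lock"
MYSQL_ER_LOCK_DEADLOCK = 1213


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries a MySQL error code."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=5, base_delay=0.05,
                            sleep=time.sleep):
    """Call transaction() and retry it if it fails with a deadlock error.

    Errors other than a deadlock are re-raised immediately. A deadlock on
    the final attempt is re-raised as well, so the caller still sees it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.errno != MYSQL_ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Exponential backoff with random jitter, so that two colliding
            # processes are unlikely to retry at exactly the same moment.
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)


if __name__ == "__main__":
    attempts = []

    def set_status_submitted():
        # Simulated status update (assumption): deadlocks twice, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise DatabaseError(MYSQL_ER_LOCK_DEADLOCK, "Deadlock found")
        return "SUBMITTED"

    print(run_with_deadlock_retry(set_status_submitted))
```

The backoff is only there to keep the two colliding processes from 
retrying in lockstep; whether a retry is actually safe for the status 
update in the pipeline is something I cannot judge.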

Cheers,

Marc