[ensembl-dev] ensembl-funcgen SubmitPeaks error

Tue May 20 15:46:44 BST 2014

Hi Lel

In this version of the pipeline, replicates were defined by subdirectory naming. In the experiment directory, you would need to create numbered directories, each containing the relevant fastq file, e.g.

	experiment_dir/1/replicate_1.fastq.gz
	experiment_dir/2/replicate_2.fastq.gz

This might be your problem?
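
If your fastq files currently sit directly in the experiment directory, something like the following could arrange them into that layout. This is only a rough sketch: arrange_replicates is a hypothetical helper name, not part of ensembl-funcgen, and it assumes one gzipped fastq file per replicate, numbered in the order the shell glob returns them.

```shell
#!/bin/sh
# Sketch only: arrange_replicates is a hypothetical helper, not an
# ensembl-funcgen script. It assumes each *.fastq.gz file found directly
# in the experiment directory is exactly one replicate.
arrange_replicates() {
    experiment_dir=$1
    n=0
    for fastq in "$experiment_dir"/*.fastq.gz; do
        [ -e "$fastq" ] || continue      # glob matched nothing, skip
        n=$((n + 1))
        mkdir -p "$experiment_dir/$n"    # numbered replicate directory
        mv "$fastq" "$experiment_dir/$n/"
    done
    echo "$n"                            # number of replicates arranged
}

# Example call (path is illustrative):
#   arrange_replicates /path/to/experiment_dir
```

It is meant to be run once on a flat directory. It does not check for numbered directories that already exist, so check the result by hand before re-running the pipeline.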

We have since moved away from this in favour of a tracking DB, which is much richer in metadata. We are currently finishing development of this in parallel with an entirely new analysis pipeline, which will be used for release 76. (I know you know this, Lel, but for others out there.)

Nathan Johnson

Ensembl Regulation
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

On 20 May 2014, at 15:03, Lel Eory <lel.eory at roslin.ed.ac.uk> wrote:

> Dear All,
> 
> I am trying to run some ChIP-seq analyses with the ensembl-functgenomics pipeline (version 72) using ensembl-hive (version lg4_pre_rel72_20130423).
> In the efg sequencing environment I have added the peak datasets with AddPeakDataSets and tried to run beekeeper.pl to set up the peak analysis pipelines.
> However, the setup_pipeline step fails with the following error (the full beekeeper.pl output is at the end of this e-mail):
> 
> beekeeper.pl -url $DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102 -hive_log_dir hive_log_dir -run
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Storing new InputSet:   piPSC_H3K27me3_Xiao
> DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
> 
> job 1 : died in status 'RUN' for the following reason: DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Where should I specify the number of replicates (in this case it is just 1) for setup_pipeline within the peak analysis step, or where does setup_pipeline get this value from?
> 
> Thank you.
> 
> Kind regards,
> Lel
> 
> 
> 
> beekeeper.pl -url $DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102 -hive_log_dir hive_log_dir -run
> 
> ~~~~~~~~~~~~~ beekeeper.pl output ~~~~~~~~~~~~~
> 
>       ======= beekeeper loop ** 1 **==========
>      GarbageCollector:       Checking for lost Workers...
>      GarbageCollector:       [Queen:] we have 0 Workers alive.
>      setup_pipeline             ( 1)     LOADING jobs(Sem:0, Rdy:3, InProg:0, Done+Pass:0, Fail:0)=3 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:1  a.cap:-  (sync'd 1400493652 sec ago)
>      run_peaks                  ( 2)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
>      run_peaks_wide             ( 3)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
>      run_macs                   ( 4)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
> 
>        ===== Stats of live Workers according to the Queen: ======
>               ======= TOTAL ======= : 0 workers
> 
>      setup_pipeline             ( 1)       READY jobs(Sem:0, Rdy:3, InProg:0, Done+Pass:0, Fail:0)=3 Ave_msec:0, workers(Running:0, Reqired:3)   h.cap:1  a.cap:-  (sync'd 0 sec ago)
>      Before checking the Valley for pending jobs, Scheduler allocated 1 x LOCAL:default extra workers for 'setup_pipeline' [0.0000 hive_load remaining]
>      Scheduler is going to submit 1 x LOCAL:default workers
>      Submitting 1 workers (rc_name=default) to LOCAL/ris-lx10
>      SUBMITTING_CMD:         runWorker.pl -url '$DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102' -rc_name default &
>      hive 0.000% complete (< 0.000 CPU_hrs) (3 todo + 0 done + 0 failed = 3 total)
>      The Beekeeper has stopped because the number of loops was limited by 1 and this limit expired
>      dbc 0 disconnect cycles
>      Queen picked analysis with dbID=1 for the worker
>      Worker: meadow=LOCAL/ris-lx10, process=35314 at ris-lx10.roslin.ed.ac.uk, resource_class_id=1, last_check_in=2014-05-19 11:00:52, analysis=setup_pipeline(1)
>              batch_size = 1
>              life_span  = 3600
>              worker_log_dir = STDOUT/STDERR
>      Setting name at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl/modules/Bio/EnsEMBL/Utils/ConfigRegistry.pm line 344.
>      :: Auto-selecting build 102 core DB as: anonymous at sus_scrofa_core_75_102:ensembldb.ensembl.org:5306
>      ParamWarning: value for param('set_name') is used before having been initialized!
>      ParamWarning: value for param('group') is used before having been initialized!
>      ParamWarning: value for param('input_dir') is used before having been initialized!
>      ParamWarning: value for param('data_file') is used before having been initialized!
> 
>      ------------------ DEPRECATED ---------------------
>      Deprecated method call in file /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/RunnableDB/SetupPeaksPipeline.pm line 41.
>      Method Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor::fetch_group_details is deprecated.
>      Please use ExperimentalGroupAdaptor
>      Ensembl API version = 72
>      ---------------------------------------------------
>      Preprocess cmd: gzip -dc /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz | grep -vE '^[^[:space:]]+[[:blank:]][^[:space:]]+[[:blank:]][^[:space:]]+:[^[:space:]]+:MT:' | grep -v '^MT' | grep -v '^chrM' | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools view -uSh -t /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/sam_header/sus_scrofa/sus_scrofa_male_Sscrofa10.2_unmasked.fasta.fai -F 4 - | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools sort - /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp ; /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools rmdup -s /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp.bam - | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools view -h - | gzip -c > /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/output/lel_sus_scrofa_funcgen_72_102/peaks/results/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz ; rm -f /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp.bam
> 
>      Storing new InputSet:   piPSC_H3K27me3_Xiao
>      DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
> 
>      job 1 : died in status 'RUN' for the following reason: DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
> 