[ensembl-dev] ensembl-funcgen SubmitPeaks error

Tue May 20 16:37:48 BST 2014

Hi Nathan,

In this case I do have numbered directories (.../1/...), as in the
example you gave, e.g.:

$WD/fastq/sus_scrofa/Xiao/piPSC_H2AZ/1/SRR414964.fastq.gz

The only difference is that there is no '_1' appended to the names of the
fastq files. For example, my file is named SRR414964.fastq.gz instead of
SRR414964_1.fastq.gz. Do I need to append '_1'?
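To check the layout before rerunning, here is a minimal Python sketch. It assumes, as Nathan describes, that the replicate number comes only from the numbered parent directory. `replicate_from_path` and `check_experiment_dir` are hypothetical helpers, not ensembl-funcgen functions. The sketch ignores the file name, so it cannot tell whether the real pipeline also requires the '_1' suffix.

```python
from pathlib import Path

# Hypothetical helpers, NOT part of ensembl-funcgen. They assume the rule
# described in this thread: a fastq file's replicate number is the name of
# the numbered subdirectory that contains it.

def replicate_from_path(fastq_path):
    """Return the replicate number implied by the parent directory, or None."""
    parent = Path(fastq_path).parent.name
    # Only a purely numeric directory name (e.g. "1", "2") is treated as a
    # replicate; anything else yields None, i.e. no replicate could be inferred.
    return int(parent) if parent.isdecimal() else None


def check_experiment_dir(experiment_dir):
    """Map every *.fastq.gz under experiment_dir to its inferred replicate."""
    return {
        str(path): replicate_from_path(path)
        for path in sorted(Path(experiment_dir).rglob("*.fastq.gz"))
    }


if __name__ == "__main__":
    # Example path from this thread: the numbered directory "1" gives replicate 1,
    # whether or not the file name carries a '_1' suffix (under this assumption).
    print(replicate_from_path("fastq/sus_scrofa/Xiao/piPSC_H2AZ/1/SRR414964.fastq.gz"))
```

Any file that maps to None under this assumption would be a candidate for the "Column 'replicate' cannot be null" failure.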

After I run the raw read alignment steps (AddAlignmentDataSets and
SubmitAlignments), the alignments are stored by default in the alignments
folder, but without the replicate numbers in the folder names, e.g.
$WD/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_H2AZ_Xiao.samse.gz. Is
this correct? Does the code fetch the replicate numbers from the
$WD/fastq folder names or from the $WD/alignments folder names?
(I assume the numbers are missing from $WD/alignments because the
mapped datasets have already been merged there?)

(Looking forward to trying the new funcgen pipeline with release 76!)

Thanks,
Lel

On 05/20/2014 03:46 PM, njohnson wrote:
> Hi Lel
>
> In this version of the pipeline, replicates were defined by subdirectory naming. In the experiment directory, you would need to create numbered directories, each containing the relevant fastq file, e.g.
>
> 	experiment_dir/1/replicate_1.fastq.gz
> 	experiment_dir/2/replicate_2.fastq.gz
>
> This might be your problem?
>
> We have since moved away from this in favour of a tracking DB, which is much richer in metadata. We are currently finishing development of this in parallel with an entirely new analysis pipeline, which will be used for release 76. (I know you know this, Lel, but this is for others out there.)
>
>
> Nathan Johnson
>
> Ensembl Regulation
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> United Kingdom
>
> http://www.ensembl.info/
> http://twitter.com/#!/ensembl
> https://www.facebook.com/Ensembl.org
>
> On 20 May 2014, at 15:03, Lel Eory <lel.eory at roslin.ed.ac.uk> wrote:
>
>> Dear All,
>>
>> I am trying to run some ChIP-seq analyses with the ensembl-functgenomics pipeline (version 72) using ensembl-hive (version lg4_pre_rel72_20130423).
>> In the efg sequencing environment I have added the peak datasets with AddPeakDataSets and am trying to run beekeeper.pl to set up the peak analysis pipelines.
>> However, the setup_pipeline step fails with the following error (the full beekeeper.pl output is at the end of this e-mail):
>>
>> beekeeper.pl -url $DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102 -hive_log_dir hive_log_dir -run
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Storing new InputSet:   piPSC_H3K27me3_Xiao
>> DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
>>
>> job 1 : died in status 'RUN' for the following reason: DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Where should I specify the number of replicates (in this case just 1) for setup_pipeline in the peak analysis step, or where does setup_pipeline get this value from?
>>
>> Thank you.
>>
>> Kind regards,
>> Lel
>>
>>
>>
>> beekeeper.pl -url $DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102 -hive_log_dir hive_log_dir -run
>>
>> ~~~~~~~~~~~~~ beekeeper.pl output ~~~~~~~~~~~~~
>>
>>        ======= beekeeper loop ** 1 **==========
>>       GarbageCollector:       Checking for lost Workers...
>>       GarbageCollector:       [Queen:] we have 0 Workers alive.
>>       setup_pipeline             ( 1)     LOADING jobs(Sem:0, Rdy:3, InProg:0, Done+Pass:0, Fail:0)=3 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:1  a.cap:-  (sync'd 1400493652 sec ago)
>>       run_peaks                  ( 2)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
>>       run_peaks_wide             ( 3)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
>>       run_macs                   ( 4)       EMPTY jobs(Sem:0, Rdy:0, InProg:0, Done+Pass:0, Fail:0)=0 Ave_msec:0, workers(Running:0, Reqired:0)   h.cap:10  a.cap:-  (sync'd 1400493652 sec ago)
>>
>>         ===== Stats of live Workers according to the Queen: ======
>>                ======= TOTAL ======= : 0 workers
>>
>>       setup_pipeline             ( 1)       READY jobs(Sem:0, Rdy:3, InProg:0, Done+Pass:0, Fail:0)=3 Ave_msec:0, workers(Running:0, Reqired:3)   h.cap:1  a.cap:-  (sync'd 0 sec ago)
>>       Before checking the Valley for pending jobs, Scheduler allocated 1 x LOCAL:default extra workers for 'setup_pipeline' [0.0000 hive_load remaining]
>>       Scheduler is going to submit 1 x LOCAL:default workers
>>       Submitting 1 workers (rc_name=default) to LOCAL/ris-lx10
>>       SUBMITTING_CMD:         runWorker.pl -url '$DBURL/leory2_peaks_lel_sus_scrofa_funcgen_72_102' -rc_name default &
>>       hive 0.000% complete (< 0.000 CPU_hrs) (3 todo + 0 done + 0 failed = 3 total)
>>       The Beekeeper has stopped because the number of loops was limited by 1 and this limit expired
>>       dbc 0 disconnect cycles
>>       Queen picked analysis with dbID=1 for the worker
>>       Worker: meadow=LOCAL/ris-lx10, process=35314 at ris-lx10.roslin.ed.ac.uk, resource_class_id=1, last_check_in=2014-05-19 11:00:52, analysis=setup_pipeline(1)
>>               batch_size = 1
>>               life_span  = 3600
>>               worker_log_dir = STDOUT/STDERR
>>       Setting name at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl/modules/Bio/EnsEMBL/Utils/ConfigRegistry.pm line 344.
>>       :: Auto-selecting build 102 core DB as: anonymous at sus_scrofa_core_75_102:ensembldb.ensembl.org:5306
>>       ParamWarning: value for param('set_name') is used before having been initialized!
>>       ParamWarning: value for param('group') is used before having been initialized!
>>       ParamWarning: value for param('input_dir') is used before having been initialized!
>>       ParamWarning: value for param('data_file') is used before having been initialized!
>>
>>       ------------------ DEPRECATED ---------------------
>>       Deprecated method call in file /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/RunnableDB/SetupPeaksPipeline.pm line 41.
>>       Method Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor::fetch_group_details is deprecated.
>>       Please use ExperimentalGroupAdaptor
>>       Ensembl API version = 72
>>       ---------------------------------------------------
>>       Preprocess cmd: gzip -dc /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz | grep -vE '^[^[:space:]]+[[:blank:]][^[:space:]]+[[:blank:]][^[:space:]]+:[^[:space:]]+:MT:' | grep -v '^MT' | grep -v '^chrM' | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools view -uSh -t /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/sam_header/sus_scrofa/sus_scrofa_male_Sscrofa10.2_unmasked.fasta.fai -F 4 - | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools sort - /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp ; /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools rmdup -s /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp.bam - | /groups2/avian_genomes/software/bin/ensembl-funcgen/samtools view -h - | gzip -c > /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/output/lel_sus_scrofa_funcgen_72_102/peaks/results/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz ; rm -f /groups2/pig_project/ensembl_funcgen/xiao_chip_seq/alignments/sus_scrofa/Sscrofa10.2/Xiao/piPSC_Pig-IgG_Xiao.samse.sam.gz_tmp.bam
>>       Storing new InputSet:   piPSC_H3K27me3_Xiao
>>       DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
>>
>>       job 1 : died in status 'RUN' for the following reason: DBD::mysql::st execute failed: Column 'replicate' cannot be null at /groups2/avian_genomes/software/src/ensembl/ens72/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/InputSetAdaptor.pm line 380.
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/