[ensembl-dev] dust-ing

Lel Eory lel.eory at ed.ac.uk
Fri Aug 30 10:23:03 BST 2013

Hi Bronwen,

>> The cvs_checkout/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Config/BatchQueue.pm.example file does not contain a configuration section for logic_name 'dust'. Could someone give me the configuration for dust? (Grid engine parameters would be a bonus.)
> I have added this to the example file:
>     {
>      # this example uses the new 'memory' option, which is an alternative to specifying memory
>      # in the resource requirements. Each time a job is retried, the next element in the memory array is used
>        logic_name     => 'dust',
>        batch_size     => 500, # calculate as approx. num toplevel slice / 20
>        memory   => ['700MB', '1500MB'],
>        retry_batch_size     => 1, # assuming there are only a few, e.g. fewer than 10 jobs
>        retries         => 3,
>      },
Thanks for the example, I managed to set up the pipeline based on this.
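
For anyone adapting the numbers above, here is a minimal sketch in Python
(not pipeline code) of the two rules stated in the comments: batch_size as
roughly the number of toplevel slices divided by 20, and the memory array
being stepped through on each retry. The function names are hypothetical,
and what the pipeline does once the memory array is exhausted is not stated
in this thread; the sketch simply assumes the last element is reused.

```python
def batch_size_for(num_toplevel_slices, divisor=20):
    """Approximate batch_size as num toplevel slices / 20, never below 1."""
    return max(1, num_toplevel_slices // divisor)


def memory_for_attempt(memory, attempt):
    """Pick the memory setting for a given attempt (0 = first run).

    ASSUMPTION: once the array is exhausted, the last element is reused.
    The thread does not say what the pipeline actually does in that case.
    """
    index = min(attempt, len(memory) - 1)
    return memory[index]


if __name__ == "__main__":
    memory = ["700MB", "1500MB"]
    print(batch_size_for(10000))
    for attempt in range(4):  # first run plus retries => 3
        print(attempt, memory_for_attempt(memory, attempt))
```

With 10000 toplevel slices this gives the batch_size of 500 used in the
example configuration.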
>> If not, could someone print the detailed help from tcdust (if it has one) and e-mail it to me, so I can understand the parameters the program accepts?
>> From Dust.pm I assume the output goes to STDOUT in the format START..END, where START and END are the start and end coordinates of the low-complexity region. Is this correct?
> Yes, that sounds right, after looking here
> ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Runnable/Dust.pm
> in the parse_results method.
Dustmasker from the BLAST+ package (v2.2.28) ran OK once I changed
the parsing from the START..END format used by tcdust to the START - END
format returned by dustmasker.
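
To illustrate the change, here is a rough sketch of a parser that accepts
both interval formats. It is written in Python purely as an illustration and
is not the Perl in the parse_results method of Dust.pm; the function name
and the sample lines are made up. It also assumes that any other line in the
output (for example a ">" sequence header) can simply be skipped.

```python
import re

# Matches "START..END" (tcdust, as described above) and
# "START - END" (dustmasker, as described above).
INTERVAL_RE = re.compile(r"^\s*(\d+)\s*(?:\.\.|-)\s*(\d+)\s*$")


def parse_intervals(lines):
    """Return (start, end) integer pairs; lines that are not intervals are skipped."""
    intervals = []
    for line in lines:
        match = INTERVAL_RE.match(line)
        if match:
            intervals.append((int(match.group(1)), int(match.group(2))))
    return intervals


if __name__ == "__main__":
    # Hypothetical sample lines, not real program output.
    sample = [">chr1", "12..40", "100 - 250", ""]
    print(parse_intervals(sample))
```

The sketch does not adjust coordinates; whether the two programs use the
same coordinate convention should be checked before storing the features.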

