[ensembl-dev] dust-ing

Fri Aug 30 10:23:03 BST 2013

Hi Bronwen,

>> The cvs_checkout/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Config/BatchQueue.pm.example file does not contain a configuration section for logic_name 'dust'. Can someone possibly give me the configuration corresponding to dust? (Grid engine parameters would be a bonus.)
> I have added this to the example file:
>
>     {
>      # this example uses the new 'memory' option, which is an alternative to specifying memory
>      # in the resource requirements. Each time a job is retried, the next element in the memory array will be used
>        logic_name     => 'dust',
>        batch_size     => 500, # calculate as approx. num toplevel slice / 20
>        memory   => ['700MB', '1500MB'],
>        retry_batch_size     => 1, # assuming there are only a few, e.g. fewer than 10 jobs
>        retries         => 3,
>      },
Thanks for the example, I managed to set up the pipeline based on this.
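
For anyone adapting these values, the two rules stated in the comments above (batch size of roughly the number of toplevel slices divided by 20, and the next memory value on each retry) can be sketched as below. This is an illustrative Python sketch, not code from the pipeline: the function names are hypothetical, and reusing the last memory value once the list is exhausted is an assumption, since the example only says that the next element is used on each retry.

```python
def batch_size_for(num_toplevel_slices, divisor=20):
    """Approximate batch size: number of toplevel slices / 20 (at least 1)."""
    return max(1, num_toplevel_slices // divisor)


def memory_for_attempt(memory, attempt):
    """Pick the memory setting for a given attempt (0 = first run).

    ASSUMPTION: once the list runs out, the last value is reused.
    The real pipeline may behave differently.
    """
    index = min(attempt, len(memory) - 1)
    return memory[index]


if __name__ == "__main__":
    print(batch_size_for(10000))
    for attempt in range(4):
        print(attempt, memory_for_attempt(["700MB", "1500MB"], attempt))
```

With 10000 toplevel slices this gives a batch size of 500, matching the value in the example.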
>> If not, can someone possibly print out the detailed help from tcdust, if such help exists, and e-mail it back, so that I can understand the various parameters the program accepts?
>>
>> From Dust.pm I assume the output goes to STDOUT and has the format START..END, where START and END are the start and end coordinates of the low-complexity region. Is this correct?
> Yes, that sounds right, after looking here
> ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Runnable/Dust.pm
> in the parse_results method.
Dustmasker from the BLAST+ package (v2.2.28) ran OK once I changed
the parsing from the START..END format used by tcdust to the
START - END format returned by dustmasker.
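
To illustrate the change, here is a minimal sketch of a parser that accepts both interval styles. It is written in Python for illustration only and is not the Perl code in the parse_results method of Dust.pm; the function name and the regular expression are assumptions based solely on the two formats described above. It does not adjust coordinates, so it is worth checking whether your tool reports 0-based or 1-based positions.

```python
import re

# "START..END" (tcdust) or "START - END" (dustmasker), as described above.
# The pattern is an assumption inferred from those two descriptions.
INTERVAL = re.compile(r"^\s*(\d+)\s*(?:\.\.|-)\s*(\d+)\s*$")


def parse_intervals(lines):
    """Return (start, end) tuples for each line that looks like an interval."""
    intervals = []
    for line in lines:
        match = INTERVAL.match(line)
        if match:  # anything else, e.g. a ">seq1" header line, is skipped
            intervals.append((int(match.group(1)), int(match.group(2))))
    return intervals


if __name__ == "__main__":
    print(parse_intervals(["10..25"]))
    print(parse_intervals([">seq1", "10 - 25", "40 - 61"]))
```

Accepting both separators in one pattern means the same parser works whichever of the two programs is configured.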

Cheers,
Lel
