[ensembl-dev] dust-ing

Bronwen Aken ba1 at sanger.ac.uk
Thu Aug 29 17:23:16 BST 2013


Hi Lel,


On 2 Aug 2013, at 09:50, Lel Eory <lel.eory at ed.ac.uk> wrote:

> Dear Developers,
> 
> I am trying to identify low-complexity regions for some species by running the Ensembl 'dust' analysis pipeline.
> 
Yes, we run Dust for all species.

> The cvs_checkout/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Config/BatchQueue.pm.example file does not contain a configuration section for logic_name 'dust'. Could someone give me the configuration corresponding to dust? (Grid Engine parameters would be a bonus.)

I have added this to the example file:

   {
     # This example uses the new 'memory' option, which is an alternative to specifying memory
     # in the resource requirements. Each time a job is retried, the next element in the memory array is used.
     logic_name       => 'dust',
     batch_size       => 500,   # calculate as approx. number of toplevel slices / 20
     memory           => ['700MB', '1500MB'],
     retry_batch_size => 1,     # assuming there are only a few, e.g. fewer than 10 jobs
     retries          => 3,
   },
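To make the two rules of thumb in the comments above concrete, here is a minimal Python sketch (the pipeline itself is Perl). It shows which 'memory' entry a job would request on each attempt, and the "toplevel slices / 20" batch_size estimate. The function names are hypothetical. The thread does not say what happens once the retries outnumber the entries in the memory array, so the sketch simply assumes the last entry is reused.

```python
# Hypothetical illustration only; not code from ensembl-pipeline.

def memory_for_attempt(memory, retry_count):
    """Return the memory request for a job on a given attempt.

    retry_count 0 is the first run; each retry moves to the next element.
    Assumption (not stated in the thread): once the list is exhausted,
    the last element is reused.
    """
    if not memory:
        raise ValueError("memory list must not be empty")
    index = min(retry_count, len(memory) - 1)
    return memory[index]


def suggested_batch_size(num_toplevel_slices, target_batches=20):
    """Rule of thumb from the config comment: batch_size ~ slices / 20."""
    return max(1, round(num_toplevel_slices / target_batches))


if __name__ == "__main__":
    memory = ["700MB", "1500MB"]
    retries = 3
    for attempt in range(retries + 1):
        print(attempt, memory_for_attempt(memory, attempt))
    print(suggested_batch_size(10000))
```

With retries => 3 and a two-element memory array, the first run asks for 700MB and every retry for 1500MB (under the clamping assumption). About 10,000 toplevel slices would give a batch_size of 500.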

> 
> The cvs_checkout/ensembl-doc/pipeline_docs/the_raw_computes.txt file says that the source for dust is the NCBI BLAST suite (see the module description, line 245), and the "Analysis conf" section names the program as tcdust (same file, line 573), which is consistent with the analysis tables in the databases. Is tcdust available to download from somewhere? (The NCBI BLAST+ package only has dustmasker, not tcdust.)

We used a modified version of dust called 'tcdust'. I will contact you privately once I have a way to transfer it to you.

> 
> If not, could someone print out the detailed help for tcdust (if such help exists) and e-mail it back, so that I can understand the various parameters the program accepts?
> 
> From Dust.pm I assume the output goes to STDOUT in the format START..END, where START and END are the start and end coordinates of the low-complexity region. Is this correct?

Yes, that sounds right, judging from the parse_results method in
ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Runnable/Dust.pm.
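As a rough illustration of reading that output, here is a minimal Python sketch. It is not the Perl parse_results method itself. It assumes one region per line in the form START..END and skips any other line (for example a FASTA header). It leaves the coordinates unchanged, because the thread does not say whether tcdust reports 0-based or 1-based positions. The function name parse_dust_output is hypothetical.

```python
import re

# Assumed line format (from the thread): "START..END", one region per line.
REGION_RE = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+)\s*$")


def parse_dust_output(lines):
    """Collect (start, end) tuples from START..END lines; skip anything else.

    Coordinates are returned exactly as printed; no 0/1-based conversion
    is attempted, since the convention is not stated in the thread.
    """
    regions = []
    for line in lines:
        match = REGION_RE.match(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


if __name__ == "__main__":
    sample = [">contig1", "10..45", "200..230"]
    print(parse_dust_output(sample))
```

On the sample input this prints [(10, 45), (200, 230)]. In practice the lines would come from the captured STDOUT of the dust run.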

Cheers,
Bronwen


> Many thanks,
> Lel
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
