[ensembl-dev] too many jobs for my PairAlignement...

Zhang Di aureliano.jz at gmail.com
Mon Oct 31 06:29:43 GMT 2011


Hi,
  As described previously, I'm trying to run the low-coverage annotation
pipeline for our Illumina GAII-sequenced fish genome (~800 Mb).
  The doc low_coverage_gene_build.txt tells me to prepare my own compara db,
so I turned to ensembl-compara.
  For my fish genome, I have ~75k scaffolds (length >= 200 bp, N50 ~1 Mb),
among which 2600 scaffolds are longer than 1000 bp. My ref genome is
stickleback, and I followed the README-pairaligner doc.
  As the ref genome has ~2000 chunks (size 1 Mb), there will be 2000 x 75000
= 150M pairaligner jobs, which is too many to run at my institute.
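
  To make the arithmetic concrete, here is a minimal Python sketch. It
assumes one job per (reference chunk, scaffold) pair, which is only my
reading of the setup; the function name is made up for illustration, and
the numbers are the approximate figures quoted above.

```python
def pairaligner_jobs(ref_chunks: int, query_scaffolds: int) -> int:
    """Job count, assuming every reference chunk is paired with every scaffold."""
    return ref_chunks * query_scaffolds


REF_CHUNKS = 2000        # ~2000 stickleback chunks of 1 Mb each (approximate)
ALL_SCAFFOLDS = 75_000   # all scaffolds >= 200 bp (approximate)
LONG_SCAFFOLDS = 2600    # scaffolds longer than 1000 bp

all_jobs = pairaligner_jobs(REF_CHUNKS, ALL_SCAFFOLDS)
long_jobs = pairaligner_jobs(REF_CHUNKS, LONG_SCAFFOLDS)

print(f"all scaffolds:       {all_jobs:,} jobs")
print(f"scaffolds > 1000 bp: {long_jobs:,} jobs")
print(f"reduction factor:    {all_jobs / long_jobs:.1f}x")
```

  Under that assumption, restricting the run to the longer scaffolds would
cut the job count from about 150M to about 5.2M.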
  Here are my questions:
  1. Should I use only the scaffolds longer than 1000 bp?
  2. Am I following the right doc? Which doc should I read to produce an
alignment in which 'each bp in the target genome should be represented
  at most once' (quoted from low_coverage_gene_build.txt)? I don't quite
understand README-2xalignment and README-low-coverage-genome-aligner.

Thank you

Best regards

-- 
Zhang Di