[ensembl-dev] Setting up an 'EnsEMBL node'
Marc Hoeppner
mphoeppner at gmail.com
Tue Apr 23 13:32:04 BST 2013
Dear List (again),
I figured that this is probably the best place to ask this - so here goes:
I am currently in the process of establishing an annotation resource for
genome projects in Sweden and am very much determined to set this up
using EnsEMBL code/infrastructure (gene build pipeline, databases,
possibly website). I have seen a couple of such 'associated' non-EnsEMBL
projects (Neanderthal, Gramene, etc) and it just makes sense to me to go
this route. However, since this is obviously a somewhat ambitious
endeavor, it would be amazingly helpful to pick someone's brain
regarding some of the technical details (cluster architecture, personnel
requirements, additional information on the gene build pipeline and how
to sensibly adapt it to certain organismal groups). My feeling is that
this would not be best discussed on the mailing list, however.
I am considering applying for the 'Geek for a week' program, but it
would perhaps be better to get some of the questions out of the way via
Email first (also not sure if such 'infrastructural' applications would
have a good chance to get supported). So if there is anyone around that
could spare a bit of time, I would be very grateful!
Cheers,
Marc
PS: Some questions include
a) Could Lava (openlava.org) be used to emulate the LSF architecture
used at Sanger, or would it be advisable to write a new adapter to make
the pipeline work on other systems?
b) What is a sensible number of CPUs to set aside for the annotation of
a vertebrate-size genome (assuming we don't want to wait forever for it
to finish)? 200?
c) Merits of different cluster setups (many smaller nodes vs few large
nodes etc.)
d) Are there any 'internal' best practices regarding the annotation of
non-vertebrates (e.g. plants)? Do the Gramene guys use the EnsEMBL gene
build?
e) What would be a minimum group size to be able to analyze 2-3 animal
genomes in parallel using the EnsEMBL infrastructure (setting up the
analysis, running the code, QA etc.)?
f) Any legal issues I need to be aware of? We are currently considering
charging some money for the service to cover some of our costs (part of
the salaries, basically), but this would not be on a commercial scale or
profit-oriented at all.