[ensembl-dev] Setting up an 'EnsEMBL node'

Marc Hoeppner mphoeppner at gmail.com
Tue Apr 23 13:32:04 BST 2013


Dear List (again),

I figured that this is probably the best place to ask this - so here goes:

I am currently in the process of establishing an annotation resource for 
genome projects in Sweden and am very much determined to set this up 
using EnsEMBL code/infrastructure (gene build pipeline, databases, 
possibly website). I have seen a couple of such 'associated' non-EnsEMBL 
projects (Neanderthal, Gramene, etc) and it just makes sense to me to go 
this route. However, since this is obviously a somewhat ambitious 
endeavor, it would be amazingly helpful to pick someone's brain 
regarding some of the technical details (cluster architecture, personnel 
requirements, additional information on the gene build pipeline and how 
to sensibly adapt it to certain organismal groups). My feeling is that 
this would not be best discussed on the mailing list, however.

I am considering applying for the 'Geek for a week' program, but it 
would perhaps be better to get some of the questions out of the way via 
email first (I am also not sure whether such 'infrastructural' applications 
would have a good chance of being supported). So if there is anyone around 
who could spare a bit of time, I would be very grateful!

Cheers,

Marc

PS: Some questions include

a) Could OpenLava (openlava.org) be used to emulate the LSF architecture 
used at Sanger, or would it be advisable to write a new adapter to make 
the pipeline work on other systems?
b) What is a sensible number of CPUs to set aside for the annotation of 
a vertebrate-size genome (assuming we don't want to wait forever for it 
to finish)? 200?
c) Merits of different cluster setups (many smaller nodes vs. few large 
nodes, etc.)
d) Are there any 'internal' best practices regarding the annotation of 
non-vertebrates (e.g. plants)? Do the Gramene guys use the EnsEMBL gene 
build?
e) What would be a minimum group size to be able to analyze 2-3 animal 
genomes in parallel using the EnsEMBL infrastructure (setting up the 
analysis, running the code, QA, etc.)?
f) Are there any legal issues I need to be aware of? We are currently 
considering charging a fee for the service to cover some of our costs 
(part of the salaries, basically), but this would not be on a commercial 
scale or profit-oriented at all.




