[ensembl-dev] Building on Ensembl

Reece Hart reece at harts.net
Wed Mar 9 04:28:45 GMT 2011


Greetings Ensembl devs-

Our small San Francisco-area startup is planning to use Ensembl as a basis
for genome variation analysis tools. I'd appreciate some advice from the
community about how to build on top of Ensembl core and variation in a way
that enables future updates.

We anticipate extending Ensembl core and variation (at least) in two
ways: adding *content of the same types* as those already in Ensembl, such
as new variation_annotation or phenotype records, as well as *adding new
types of content* (new tables), such as internal data for genotype-phenotype
associations. That is, we will make both DML and DDL changes to an Ensembl
release and we'd like to make it as easy as possible to transfer these
changes to new releases.

Two specific questions:
*1) Do you have any advice about "layering" our content (DML changes) to
facilitate Ensembl updates?*
I'm open to any idea here. We considered many approaches, but the two best
are:

   - Approach 1: For each Ensembl table, create a companion table with the
   same name in another schema. The companion table contains a foreign key to
   its Ensembl table and a hash of the Ensembl table's row in some canonical
   format. On upgrade, we classify each key as present only in the new Ensembl
   table (added), present only in the companion table (removed), or shared, in
   which case we compare hashes to detect changed rows.
   - Approach 2: Create a parallel schema with empty tables that will
   contain in-house data. Then, unify the Ensembl and in-house data in a third
   schema that contains UNION ALL views. For example, "CREATE VIEW
   merged.variation AS SELECT * FROM homo_sapiens_variation_61_37f.variation
   UNION ALL SELECT * FROM inhouse.variation".
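
To make the upgrade comparison in Approach 1 concrete, here is a minimal
sketch in Python using SQLite in place of MySQL. The toy two-column variation
table, the helper names row_hashes and diff_release, and the canonical row
format (values in column order, NULL written as \N, joined by a separator
byte) are all assumptions for illustration, not anything Ensembl provides. In
practice the "old" hashes would be read from the companion table rather than
recomputed.

```python
import hashlib
import sqlite3


def row_hashes(conn, table, key_column):
    """Map each primary key to a SHA-1 of the row in a canonical text form."""
    # Table and column names are interpolated directly; fine for a sketch
    # with trusted names, not for untrusted input.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    columns = [d[0] for d in cursor.description]
    key_index = columns.index(key_column)
    hashes = {}
    for row in cursor:
        # Assumed canonical form: values in column order, NULL as \N,
        # joined by the ASCII unit separator.
        canonical = "\x1f".join("\\N" if v is None else str(v) for v in row)
        hashes[row[key_index]] = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return hashes


def diff_release(old_hashes, new_hashes):
    """Classify keys as removed, added, or changed between two releases."""
    old_keys, new_keys = set(old_hashes), set(new_hashes)
    shared = old_keys & new_keys
    return {
        "removed": sorted(old_keys - new_keys),
        "added": sorted(new_keys - old_keys),
        "changed": sorted(k for k in shared if old_hashes[k] != new_hashes[k]),
    }


# Toy stand-ins for two releases of one table (hypothetical data).
old = sqlite3.connect(":memory:")
new = sqlite3.connect(":memory:")
for conn in (old, new):
    conn.execute("CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT)")
old.executemany("INSERT INTO variation VALUES (?, ?)", [(1, "rs1"), (2, "rs2"), (3, "rs3")])
new.executemany("INSERT INTO variation VALUES (?, ?)", [(1, "rs1"), (2, "rs2b"), (4, "rs4")])

result = diff_release(
    row_hashes(old, "variation", "variation_id"),
    row_hashes(new, "variation", "variation_id"),
)
print(result)  # {'removed': [3], 'added': [4], 'changed': [2]}
```

The comparison is only meaningful if the key itself identifies the same
record in both releases, which is what question 2 asks.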

*2) Which primary keys, if any, are stable across Ensembl releases?*
Our DDL changes will involve only new tables (not dropped tables). Those
tables will contain foreign keys to primary keys within Ensembl. If Ensembl
primary keys are stable across releases, then adding tables and transferring
them to new releases is mostly straightforward. If the primary keys are not
stable, I need to go back to the drawing board.
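
One empirical way to probe key stability, sketched here under stated
assumptions: load the same table from two releases and check whether each
shared primary key still maps to the same externally meaningful identifier.
The function name unstable_keys, the toy data, and the choice of the name
column as that identifier are hypothetical. SQLite stands in for MySQL.

```python
import sqlite3


def unstable_keys(old_conn, new_conn, table, key_column, natural_column):
    """Return keys present in both releases whose natural identifier differs."""
    query = f"SELECT {key_column}, {natural_column} FROM {table}"
    old_map = dict(old_conn.execute(query))  # key -> identifier, old release
    new_map = dict(new_conn.execute(query))  # key -> identifier, new release
    shared = old_map.keys() & new_map.keys()
    return sorted(k for k in shared if old_map[k] != new_map[k])


# Toy stand-ins for two releases (hypothetical data, not real Ensembl rows).
release_a = sqlite3.connect(":memory:")
release_b = sqlite3.connect(":memory:")
for conn in (release_a, release_b):
    conn.execute("CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT)")
release_a.executemany("INSERT INTO variation VALUES (?, ?)",
                      [(1, "rs1"), (2, "rs2"), (3, "rs3")])
release_b.executemany("INSERT INTO variation VALUES (?, ?)",
                      [(1, "rs1"), (2, "rs3"), (3, "rs2"), (4, "rs4")])

reassigned = unstable_keys(release_a, release_b, "variation", "variation_id", "name")
print(reassigned)  # [2, 3]
```

An empty result over a few past releases would suggest, but not prove, that a
key is safe to reference from in-house tables.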

I've already combed through docs and search results for the above questions.
Apologies if I overlooked existing information. Thanks for comments.

-Reece

-- 
Reece Hart, Ph.D.
Locus Development