[ensembl-dev] Building on Ensembl
Reece Hart
reece at harts.net
Wed Mar 9 04:28:45 GMT 2011
Greetings Ensembl devs-
Our small San Francisco-area startup is planning to use Ensembl as a basis
for genome variation analysis tools. I'd appreciate some advice from the
community about how to build on top of Ensembl core and variation in a way
that enables future updates.
We anticipate extending Ensembl core and variation in at least two ways:
*adding content of the same types* as those already in Ensembl, such as new
variation_annotation or phenotype records, and *adding new types of content*
(new tables), such as internal data for genotype-phenotype associations.
That is, we will make both DML and DDL changes to an Ensembl release, and
we'd like to make it as easy as possible to carry these changes forward to
new releases.
Two specific questions:
*1) Do you have any advice about "layering" our content (DML changes) to
facilitate Ensembl updates?*
I'm open to any idea here. We considered many approaches, but the two best
are:
- Approach 1: For each Ensembl table, create a companion checksum table with
the same name in another schema. Each companion row contains a foreign key to
its Ensembl row and a hash of that row in some canonical format. On upgrade,
we sort keys into three groups: unique to the Ensembl table, unique to the
checksum table, or shared. For shared keys, we compare hashes to detect rows
that changed.
- Approach 2: Create a parallel schema with empty tables that will
contain in-house data. Then, unify the Ensembl and in-house data in a third
schema that contains UNION ALL views. For example, "CREATE VIEW
merged.variation AS SELECT * FROM homo_sapiens_variation_61_37f.variation
UNION ALL SELECT * FROM inhouse.variation".
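
To make the upgrade comparison in Approach 1 concrete, here is a minimal
sketch in Python. It is only an illustration under assumptions: the function
names (row_hash, classify), the canonical format (string values joined by a
unit separator, with \N standing in for NULL), and the choice of SHA-256 are
all hypothetical and not part of Ensembl. Rows are passed in as plain dicts
rather than read from a database.

```python
import hashlib


def row_hash(row):
    """Hash one row in a canonical format (an assumed format, not Ensembl's).

    Values are rendered as strings and joined with the ASCII unit separator;
    NULL (None) is written as the marker \\N so it differs from "".
    """
    canonical = "\x1f".join("\\N" if v is None else str(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(ensembl_rows, checksums):
    """Compare a new release against stored checksums.

    ensembl_rows: {primary_key: row tuple} from the new Ensembl table.
    checksums:    {primary_key: hash} from the companion checksum table.
    Returns four sets of primary keys.
    """
    ensembl_keys = set(ensembl_rows)
    checksum_keys = set(checksums)
    shared = ensembl_keys & checksum_keys
    # Shared keys whose row content no longer matches the stored hash.
    changed = {k for k in shared if row_hash(ensembl_rows[k]) != checksums[k]}
    return {
        "added": ensembl_keys - checksum_keys,    # only in the Ensembl table
        "removed": checksum_keys - ensembl_keys,  # only in the checksum table
        "changed": changed,
        "unchanged": shared - changed,
    }
```

Note that this comparison is only meaningful if a given primary key refers
to the same record in both releases.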
*2) Which primary keys, if any, are stable across Ensembl releases?*
Our DDL changes will involve only new tables (not dropped tables). Those
tables will contain foreign keys to primary keys within Ensembl. If Ensembl
primary keys are stable across releases, then adding tables and transferring
them to new releases is mostly straightforward. If the primary keys are not
stable, I need to go back to the drawing board.
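
If the primary keys turn out not to be stable, one fallback would be to
re-key our tables through some identifier that is stable. The sketch below
assumes, purely for illustration, that a variation's name can serve as that
identifier; the function name and the dict-based inputs are hypothetical, and
whether any Ensembl column is actually stable across releases is exactly the
open question above.

```python
def remap_foreign_keys(inhouse_rows, old_id_to_name, new_name_to_id):
    """Rewrite variation_id in in-house rows for a new release.

    inhouse_rows:   list of dicts, each with a "variation_id" foreign key.
    old_id_to_name: {old variation_id: name} from the old release.
    new_name_to_id: {name: new variation_id} from the new release.
    Returns (remapped_rows, orphaned_rows); orphans need manual review.
    """
    remapped, orphaned = [], []
    for row in inhouse_rows:
        name = old_id_to_name.get(row["variation_id"])
        new_id = new_name_to_id.get(name) if name is not None else None
        if new_id is None:
            # The referenced record is missing from one of the releases.
            orphaned.append(row)
        else:
            # Copy the row so the caller's input is left untouched.
            remapped.append({**row, "variation_id": new_id})
    return remapped, orphaned
```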
I've already combed through docs and search results for the above questions.
Apologies if I overlooked existing information. Thanks for comments.
-Reece
--
Reece Hart, Ph.D.
Locus Development