[ensembl-dev] Suggestions re: Ensembl core schema

Youens-Clark, Ken kclark at cshl.edu
Tue Oct 2 21:47:41 BST 2012


I'm working to create DBIx::Class modules for the Ensembl core schema, and I'd like to respectfully submit the following suggestions that would make the schema work better with automatic schema generators like DBIx::Class::Schema::Loader.

- The following tables have no primary key definitions:

    alt_allele *
    analysis_description
    assembly
    gene_archive
    gene_attrib
    interpro
    meta_coord
    misc_attrib
    ontology_xref
    operon_transcript_gene
    qtl_feature
    seq_region_attrib
    seq_region_mapping
    stable_id_event
    supporting_feature
    transcript_attrib
    transcript_supporting_feature
    translation_attrib
    unconventional_transcript_association

It would be non-destructive/backwards-compatible to simply add a "${table}_id" primary key to these, e.g., "analysis_description_id."  The table "alt_allele" actually has an "alt_allele_id" defined as "integer(10) unsigned NOT NULL auto_increment" (typical PK def) but doesn't take the extra step to say "PRIMARY KEY (alt_allele_id)." 

- In order to create the DBIx::Class modules, I first had to create an InnoDB version of the core schema which highlighted some problems with the various INTEGER types and sizes used throughout the schema.  MySQL refuses to create foreign keys unless both keys are exactly the same.  For my purposes of generating the classes, I simply changed all integer fields to "integer unsigned."  Here are the variations currently used:

[brie@/usr/local/ensembl-live/ensembl/sql]$ grep -i int table.sql | grep -v @ | grep -i sign | sed "s/,//" | awk '{print $2 " " $3}' | sort | uniq -c
      4 INT UNSIGNED
    183 INT(10) UNSIGNED
      6 INTEGER UNSIGNED
     25 SMALLINT UNSIGNED
     11 SMALLINT(5) UNSIGNED
      1 int(11) unsigned
      1 smallint(6) UNSIGNED

It would be easiest to just move to "integer unsigned" for everything.

--
Ken Youens-Clark
kclark at cshl.edu
Ware Lab/CSHL/USDA-ARS





More information about the Dev mailing list