[ensembl-dev] Suggestions re: Ensembl core schema
Youens-Clark, Ken
kclark at cshl.edu
Tue Oct 2 21:47:41 BST 2012
I'm working to create DBIx::Class modules for the Ensembl core schema, and I'd like to respectfully submit the following suggestions that would make the schema work better with automatic schema generators like DBIx::Class::Schema::Loader.
- The following tables have no primary key definitions:
alt_allele *
analysis_description
assembly
gene_archive
gene_attrib
interpro
meta_coord
misc_attrib
ontology_xref
operon_transcript_gene
qtl_feature
seq_region_attrib
seq_region_mapping
stable_id_event
supporting_feature
transcript_attrib
transcript_supporting_feature
translation_attrib
unconventional_transcript_association
It would be non-destructive/backwards-compatible to simply add a "${table}_id" primary key to these, e.g., "analysis_description_id." The table "alt_allele" actually has an "alt_allele_id" defined as "integer(10) unsigned NOT NULL auto_increment" (typical PK def) but doesn't take the extra step to say "PRIMARY KEY (alt_allele_id)."
- In order to create the DBIx::Class modules, I first had to create an InnoDB version of the core schema which highlighted some problems with the various INTEGER types and sizes used throughout the schema. MySQL refuses to create foreign keys unless both keys are exactly the same. For my purposes of generating the classes, I simply changed all integer fields to "integer unsigned." Here are the variations currently used:
[brie@/usr/local/ensembl-live/ensembl/sql]$ grep -i int table.sql | grep -v @ | grep -i sign | sed "s/,//" | awk '{print $2 " " $3}' | sort | uniq -c
4 INT UNSIGNED
183 INT(10) UNSIGNED
6 INTEGER UNSIGNED
25 SMALLINT UNSIGNED
11 SMALLINT(5) UNSIGNED
1 int(11) unsigned
1 smallint(6) UNSIGNED
It would be easiest to just move to "integer unsigned" for everything.
--
Ken Youens-Clark
kclark at cshl.edu
Ware Lab/CSHL/USDA-ARS
More information about the Dev
mailing list