[ensembl-dev] Ensembl MySQL Database Engine

Andy Yates ayates at ebi.ac.uk
Tue Nov 7 11:30:35 GMT 2017


Hi Chuck

As Anne and Alessandro have eluded to there are a number of operational reasons why we have not adopted InnoDB as our default storage engine. A number of our production pipelines do use InnoDB to ensure data integrity and enable transactional rollbacks. They later convert into MyISAM primarily for the reason Anne cited; the operational ability to quickly move large numbers of tables between servers around the world. 

Using MyISAM also gives us a number of performance optimisations around querying for genomic annotation. This is covered in our recent paper Ruffier et al. (section "Efficient feature searching") [1]. MyISAM allows the storage of annotations in the same order as the covering index (seq_region_id, seq_region_start) without altering primary keys (an auto-increment). This minimises the number of disk seeks required to return contiguous data for a genomic region. InnoDB's primary storage method, to my knowledge, is built around the B-tree data structure. The primary key for a record influences data storage. The describe optimisation would not be possible without large scale primary key identifier re-writes or a re-structuring of how we assign primary keys into InnoDB tables. Both of which are larger pieces of engineering.

Should you convert any Ensembl database into InnoDB with this optimisation applied you should be aware of the potential for performance degradation, which is subject to the resources you have put behind your MySQL server. 

All the best,

Andy

1 - http://europepmc.org/articles/PMC5467575

------------
Andrew Yates - Genomics Technology Infrastructure Team Leader
The European Bioinformatics Institute (EMBL-EBI)
Wellcome Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andy.yates.ebi
http://www.ebi.ac.uk/
http://www.ensembl.org/

> On 7 Nov 2017, at 09:59, Anne Lyle <annelyle at ebi.ac.uk> wrote:
> 
> Hi Chuck
> 
> Just to add some context:
> 
> Historically we’ve been reliant on MyISAM for the website, because we need to copy a massive amount of data to our overseas mirrors and InnoDB does not make that efficient or easy. However technologies change, so as Alessandro says we are looking into it.
> 
> Cheers
> 
> Anne
> 
> 
> 
> Anne Lyle
> Ensembl Web Developer
> EMBL-EBI
> 
> 
> 
>> On 7 Nov 2017, at 09:53, Alessandro Vullo <avullo at ebi.ac.uk> wrote:
>> 
>> Hi Chuck,
>> 
>> yes, we're actually considering switching to InnoDB and are in the process of testing the migration to this engine. Unfortunately, we cannot anticipate when exactly the move will occur, it'll be some time next year.
>> 
>> Hope that helps,
>> 
>> Alessandro
>> 
>> On 06/11/17 21:01, Chuck Lynch wrote:
>>> Hi Dev Team,
>>> I’m working with a client in the Boston area that has run into some problems using the Ensembl MySQL MyISAM database Engine.  Have you considered using the InnoDB engine for Ensembl?  MyISAM is older technology and will likely be replaced by InnoDB in the future.  It would save us a lot of time by obviating the need to convert to the InnoDB engine.
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




More information about the Dev mailing list