[ensembl-dev] Ensembl MySQL Database Engine

Chuck Lynch chuck.lynch at oracle.com
Mon Nov 13 23:15:42 GMT 2017


Thanks Andy.

A response from one of our MySQL experts below --

CL

-----Original Message-----
From: Jesper Wisborg Krogh 
Sent: Monday, November 13, 2017 6:05 PM
To: Chuck Lynch
Subject: Re: [ensembl-dev] Ensembl MySQL Database Engine

Hi Chuck,

> MyISAM allows the storage of annotations in the same order as the covering index (seq_region_id, seq_region_start) without altering primary keys (an auto-increment).

That is correct, MyISAM allows you to change the order the rows are stored whereas InnoDB always store the rows in primary key order. 
However, if they are using a covering index, then I am not sure how much of a difference that does as the underlying row will never be accessed.

And InnoDB also has advantages as it can cache both indexes and data (whereas MyISAM can only cache indexes). For queries against secondary indexes, InnoDB also has the adaptive hash index that can speed up queries.

If they have a test case that demonstrates MyISAM being significantly quicker than InnoDB, then I suspect product management/development would like to know, so they can look at closing that gap.

Cheers,
Jesper
http://mysql.wisborg.dk/ 


On 14/11/2017 7:48 AM, Chuck Lynch wrote:
> Hi Jesper,
>
> Does this seem right to you (see below)?
>
> Thanks in advance.
>
> Chuck

-----Original Message-----
From: Andy Yates [mailto:ayates at ebi.ac.uk] 
Sent: Tuesday, November 7, 2017 6:31 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Ensembl MySQL Database Engine

Hi Chuck

As Anne and Alessandro have eluded to there are a number of operational reasons why we have not adopted InnoDB as our default storage engine. A number of our production pipelines do use InnoDB to ensure data integrity and enable transactional rollbacks. They later convert into MyISAM primarily for the reason Anne cited; the operational ability to quickly move large numbers of tables between servers around the world. 

Using MyISAM also gives us a number of performance optimisations around querying for genomic annotation. This is covered in our recent paper Ruffier et al. (section "Efficient feature searching") [1]. MyISAM allows the storage of annotations in the same order as the covering index (seq_region_id, seq_region_start) without altering primary keys (an auto-increment). This minimises the number of disk seeks required to return contiguous data for a genomic region. InnoDB's primary storage method, to my knowledge, is built around the B-tree data structure. The primary key for a record influences data storage. The describe optimisation would not be possible without large scale primary key identifier re-writes or a re-structuring of how we assign primary keys into InnoDB tables. Both of which are larger pieces of engineering.

Should you convert any Ensembl database into InnoDB with this optimisation applied you should be aware of the potential for performance degradation, which is subject to the resources you have put behind your MySQL server. 

All the best,

Andy

1 - https://urldefense.proofpoint.com/v2/url?u=http-3A__europepmc.org_articles_PMC5467575&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=l-Oeb-EuDyAk_FCHlPIhA3P2IAZnLjb7pRzKYHRDJJ0&e=

------------
Andrew Yates - Genomics Technology Infrastructure Team Leader The European Bioinformatics Institute (EMBL-EBI) Wellcome Genome Campus Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andy.yates.ebi
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ebi.ac.uk_&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=b-nexufp3JGcy0Loafpyt81JtUY1LhBqJW7ZJ6uebPw&e=
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ensembl.org_&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=xUNhkkxv70sj3ymRbBe77Y5cyoJK4SL6DqW8bx2b5eo&e=

> On 7 Nov 2017, at 09:59, Anne Lyle <annelyle at ebi.ac.uk> wrote:
> 
> Hi Chuck
> 
> Just to add some context:
> 
> Historically we’ve been reliant on MyISAM for the website, because we need to copy a massive amount of data to our overseas mirrors and InnoDB does not make that efficient or easy. However technologies change, so as Alessandro says we are looking into it.
> 
> Cheers
> 
> Anne
> 
> 
> 
> Anne Lyle
> Ensembl Web Developer
> EMBL-EBI
> 
> 
> 
>> On 7 Nov 2017, at 09:53, Alessandro Vullo <avullo at ebi.ac.uk> wrote:
>> 
>> Hi Chuck,
>> 
>> yes, we're actually considering switching to InnoDB and are in the process of testing the migration to this engine. Unfortunately, we cannot anticipate when exactly the move will occur, it'll be some time next year.
>> 
>> Hope that helps,
>> 
>> Alessandro
>> 
>> On 06/11/17 21:01, Chuck Lynch wrote:
>>> Hi Dev Team,
>>> I’m working with a client in the Boston area that has run into some problems using the Ensembl MySQL MyISAM database Engine.  Have you considered using the InnoDB engine for Ensembl?  MyISAM is older technology and will likely be replaced by InnoDB in the future.  It would save us a lot of time by obviating the need to convert to the InnoDB engine.
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ensembl.org
>> _mailman_listinfo_dev&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65e
>> apI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDy
>> ik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=rWB1W-BbrMn1kQb0y3g4Kf0PXcW9A99wUqsI
>> Ni-J7VQ&e= Ensembl Blog: 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ensembl.info_
>> &d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drC
>> jzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQF
>> gQWnBZco0&s=JNoZEWowUbW5jXLTKHQw2MYxPPVBU829QfS890FLWFc&e=
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ensembl.org_
> mailman_listinfo_dev&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eap
> I_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5
> BJKg-hKidUcFQNPSQFgQWnBZco0&s=rWB1W-BbrMn1kQb0y3g4Kf0PXcW9A99wUqsINi-J
> 7VQ&e= Ensembl Blog: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ensembl.info_&
> d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjz
> CtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQW
> nBZco0&s=JNoZEWowUbW5jXLTKHQw2MYxPPVBU829QfS890FLWFc&e=

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ensembl.org_mailman_listinfo_dev&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=rWB1W-BbrMn1kQb0y3g4Kf0PXcW9A99wUqsINi-J7VQ&e=
Ensembl Blog: https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ensembl.info_&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YX0kAMk2drCjzCtidXbkoTthZX500NGH8FGbKdexXA4&m=Q1Eers0Q34rDyik5BJKg-hKidUcFQNPSQFgQWnBZco0&s=JNoZEWowUbW5jXLTKHQw2MYxPPVBU829QfS890FLWFc&e=



More information about the Dev mailing list