[ensembl-dev] MySql database setting and configuration.

Thu Sep 10 17:05:58 BST 2020

Hi

Using a cache is the most efficient way to use VEP. However some VEP 
options require a connection to a database e.g. to look up the position 
when using an identifier (e.g. a dbSNP rsid) or to look up reference 
feature locations.

The Ensembl REST VEP endpoint /vep/human/hgvs should work with a core 
database and an empty variation database provided none of the options 
used require  information from the variation database that is not in the 
cache.

The Ensembl REST VEP endpoint /vep/human/id that uses an rsid, does 
require a variation database.

As the number of variants for human have increased loading the MySQL 
database is not trivial. The schema provided uses MyISAM storage engine 
and this is the storage engine on the Ensembl public MySQL Servers.

Did the load using the MyISAM storage engine fail on data import or on 
the index creation?
Is there a particular file/table that caused problems?  I suggest maybe 
trying to load the file in parts.

Regards
Helen

On 08/09/2020 01:19, Sage Hornung wrote:
>
> Hi,
>
> I have asked a few questions recently about setting up an ensemble 
> rest and an ensemble database. I appreciate the all the help.
>
> I was able to get all the Ensembl data imported after a couple weeks 
> but unfortunately its already corrupted one or more of my InnoDB tables.
>
> I previously asked I am not opposed to using the cached data and I 
> ended up adding it but one of the rest endpoints supposedly ignores 
> and bypasses the cache. This one in particular 
> https://grch37.rest.ensembl.org/vep/human/hgvs 
> <https://grch37.rest.ensembl.org/vep/human/hgvs>. We are only using a 
> few endpoints and the whole point of this was to speed some of these 
> requests up for ourselves.
>
> Our MySQL database server has 14GB ram and 4 cores and 2TB of storage. 
> I followed the directions on the Ensembl website for importing the 
> data but the only way I was able to get the larger tables to import 
> was changing the storage engine to InnoDB. Im not sure if this was the 
> best choice because its seems the MyISAM engine is better for read 
> data. But this is the only way I was able to get it to successfully  
> import the data.
>
> I guess my question is was this a good idea? Is there any specific 
> settings I should be using when configuring MySQL? I have used a few 
> tools online and most say my settings are good/ok doe my 
> configuration. The homo_sapiens_variation_100_37 is the only one 
> that’s giving me issues and only with the tables greater than 25GB. 
> Some of these tables are massive I am looking for any tips or 
> suggestions when dealing with them.
>
> Thank You,
>
> Sage Hornung
>
> *Sage Hornung*
>
> /Software Engineer /
>
> *NeoGenomics Laboratories, Inc.*
>
> 2131 Faraday Avenue, Carlsbad, CA 92008
>
> *Phone: *760.516.5114
>
> *Cell: *760.755.3930
>
> sage.hornung at neogenomics.com <mailto:sage.hornung at neogenomics.com>
>
> neogenomics.com <http://neogenomics.com/>
>
> cid:image001.png at 01D27B05.09A580E0 <http://neogenomics.com/>
>
> cid:image002.png at 01D2789A.338CAE00 
> <https://www.linkedin.com/company/neogenomics-laboratories/>cid:image003.png at 01D2789A.338CAE00 
> <https://twitter.com/NeoGenomics>cid:image004.png at 01D2789A.338CAE00 
> <https://www.youtube.com/channel/UCBa0vLyn1ZO7tW-RyJOrxPA>
>
> This communication and its attachments contain confidential 
> information and is intended only for the named addressee. If you are 
> not the named addressee you should not disseminate, distribute or copy 
> this communication. Please notify the sender immediately if you have 
> received this communication by mistake and delete or destroy this 
> communication. Communications cannot be guaranteed to be secured or 
> error-free as information could be intercepted, corrupted, lost, 
> destroyed, arrive late or incomplete, or contain viruses. The sender 
> therefore does not accept liability for any errors or omissions in the 
> contents of this communication which arise as a result of 
> transmission. If verification is required please request a hard-copy 
> version. NeoGenomics Laboratories, 12701 Commonwealth Dr, Fort Myers, 
> FL 33913, http://www.neogenomics.com (2020)
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200910/69d658ec/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2709 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200910/69d658ec/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 508 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200910/69d658ec/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 523 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200910/69d658ec/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.png
Type: image/png
Size: 569 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200910/69d658ec/attachment-0003.png>