[ensembl-dev] Database timeouts for Ensembl API script

Kieron Taylor ktaylor at ebi.ac.uk
Mon Jan 14 11:27:43 GMT 2019


Hi Simon,

Truly long-running scripts that revisit a database can fall afoul of MySQL's 8-hour connection timeout. We cannot remove this constraint on our public database servers for reasons of fair use, but on your own database server a DBA can change it. The Ensembl API offers the Registry method set_reconnect_when_lost() [1], which applies to all connected adaptors, and the per-connection DBConnection method reconnect_when_lost() [2].

In general, for long-running jobs that only make occasional database requests, it is better for your code to know when it can disconnect and to do so explicitly.

Looking at your script, I would say that it is too iterative, and you should consider bulk-fetching the specific data you need so that you don't have to go back to the database so often. This makes the code a bit more complex and uses considerably more RAM, but it usually gives a large increase in performance. As a stopgap, you could also trigger a reconnection to the database between chromosomes to reset your connection timer.

Hopefully one of these approaches will help you either speed up your script or keep it running.

Regards,


Kieron


Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute

[1] - http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Registry.html#ac77cb4f710542fe0d53f1c8f09db5c4d
[2] - http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1DBConnection.html#a131d35ef79d2d8ee1add8084baaeb1e8

> On 8 Jan 2019, at 11:33, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:
> 
> A script we’ve been using for years has recently become flaky, repeatedly losing its connection to the back-end database.
>  
> For example:
>  
> DBD::mysql::st execute failed: Lost connection to MySQL server during query at /home/andrewss/EnsemblAPI/ensembl/modules/Bio/EnsEMBL/DBSQL/DBEntryAdaptor.pm line 109, <STDIN> line 1.
> DBD::mysql::st execute failed: MySQL server has gone away at /home/andrewss/EnsemblAPI/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm line 481, <IN> line 1243757.
>  
> The script runs for a pretty long time (a few hours) and just iterates through every chromosome / gene / transcript / CDS in a target genome. We’ve not seen this type of timeout before.
>  
> The failures are not consistent: I’ve had the same chromosome fail a couple of times and then work on a subsequent attempt. Is there anything we can do on the client side to keep these connections alive (or to reconnect), or are there any known issues at the moment that might be affecting the stability of the database?
>  
> If it helps, the script I’m running can be seen at: 
> https://github.com/s-andrews/SeqMonk/blob/master/Scripts/export_annotated_embl_from_assembly.pl
>  
> Cheers
>  
> Simon.
>  
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered Charity No. 1053902.



