[ensembl-dev] Problem importing huge file into MySQL

Andy Yates ayates at ebi.ac.uk
Wed Apr 11 11:31:48 BST 2012

Hi Toni,

One other possible source of issues is if you have not used the correct syntax for LOAD DATA INFILE. Due to binary fields & other inline control characters (which MySQL could interpret as field/record separators) we always dump our files with escapes. To load this data correctly you should use the following syntax:


I've seen in the past that sometimes indexes fail to be created because of a broken load creating erroneous fields which increase the complexity of the index. You shouldn't need to disable keys explicitly as MySQL will automatically disable any non-unique key when a LOAD DATA INFILE command is issued against an empty table instead waiting to run this in a batch job later on. There is no way to disable the unique key building during loading so you'll have to take that hit.

If this is still causing an issue then increase your myisam_sort_buffer_size as large as it can go. This should be as large as the largest index in the table so long as there is sufficient RAM. How big did you make this buffer?

All the best,


Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 11 Apr 2012, at 10:41, Toni Hermoso Pulido wrote:

> Hello,
> I'm trying to import into a MySQL DB an ENSEMBL file (concretely, homo_sapiens_variation_66_37 allele.txt.gz ~ 20 GB) using MySQL LOAD DATA INFILE.
> However, it always gets stuck when the index file is around 5 GB. At that time, the process status is 'Repair with keycache'.
> After discovering this file:
> ftp://ftp.ensembl.org/pub/ensembl-functgenomics/scripts/DAS/load_bed_source.pl 
> I've been trying different values of myisam_sort_buffer_size and myisam_max_sort_file_size parameters with no luck.
> MySQL version is 5.1.45.
> Has anyone experienced problem like this?
> Thanks in advance,
> -- 
> Toni Hermoso Pulido
> Bioinformatician
> Bioinformatics Core Facility
> http://biocore.crg.cat
> CRG - Centre de Regulació Genòmica (Room 439)
> Parc de Recerca Biomèdica de Barcelona (PRBB)
> Doctor Aiguader, 88
> 08003 Barcelona
> toni.hermoso at crg.cat
> Phone: +34 93 316 02 02
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

More information about the Dev mailing list