[ensembl-dev] Problem importing huge file into MySQL

Dan Sheppard ds23 at sanger.ac.uk
Wed Apr 11 11:17:33 BST 2012


Hello Toni,

I've not seen this with your data size or on that schema, but have seen 
it on a very different, large database.

I think you've come to the right diagnosis: MySQL reverts to "repair 
with keycache" rather than "repair by sorting" when sorting is not 
possible due to insufficient disk space to undertake an external merge 
sort in its temporary filesystem. (In this case "repair" arguably being 
a misnomer for rebuilding indexes following changes to a table).

The difference in performance between the two techniques is so bad that 
if your data is so big that repair by sorting has insufficient disk 
space, repair with keycache is unlikely to ever succeed.

MySQL's can be a bit mysterious about reasons for its decisions in this 
area but it might be worth checking the server logs to see if it has, in 
your case, been forthcoming.

One option includes myisam_max_sort_file_size being too small, as you've 
tried. The files generated during sorting can be many multiples of the 
data size, so it's worth ramping this parameter way-way up (eg ten or a 
hundred times as big as the corresponding myd file). MySQL almost never 
uses that amount of disk, but won't proceed unless it's available: it 
gloomily assumes that all indexed fields will be of their maximum size 
in all cases, so the estimate for a usually short string indexed out to 
255 characters can be very pessimistic.

Another option is that the temporary filesystem which MySQL is planning 
to use is not actually large enough (tmpdir= in my.cnf) to contain the 
temporary files. If your tmpdir is set to /tmp and you're on a system 
which reveals via df that /tmp is a dinky tmpfs type affair it may well 
be worth changing it.

One possible solution if this all gets frustrating (which I've never 
tried, caveat emptor) is to prevent the repair step altogether by 
disabling keys and then using "myisamchk -n" which explicitly forces 
repair by sorting. If you're feeling adventurous, also take a look at 
the man page for the -p option.

Hope this helps,
Dan.

On 11/04/12 10:41, Toni Hermoso Pulido wrote:
> Hello,
>
> I'm trying to import into a MySQL DB an ENSEMBL file (concretely,
> homo_sapiens_variation_66_37 allele.txt.gz ~ 20 GB) using MySQL LOAD
> DATA INFILE.
> However, it always gets stuck when the index file is around 5 GB. At
> that time, the process status is 'Repair with keycache'.
>
> After discovering this file:
> ftp://ftp.ensembl.org/pub/ensembl-functgenomics/scripts/DAS/load_bed_source.pl
>
> I've been trying different values of myisam_sort_buffer_size and
> myisam_max_sort_file_size parameters with no luck.
>
> MySQL version is 5.1.45.
>
> Has anyone experienced problem like this?
>
> Thanks in advance,
>





More information about the Dev mailing list