[ensembl-dev] Problem importing huge file into MySQL
Dan Sheppard
ds23 at sanger.ac.uk
Wed Apr 11 11:17:33 BST 2012
Hello Toni,
I've not seen this with your data size or on that schema, but have seen
it on a very different, large database.
I think you've come to the right diagnosis: MySQL reverts to "repair
with keycache" rather than "repair by sorting" when sorting is not
possible due to insufficient disk space to undertake an external merge
sort in its temporary filesystem. (In this case "repair" arguably being
a misnomer for rebuilding indexes following changes to a table).
The difference in performance between the two techniques is so bad that
if your data is so big that repair by sorting has insufficient disk
space, repair with keycache is unlikely to ever succeed.
MySQL's can be a bit mysterious about reasons for its decisions in this
area but it might be worth checking the server logs to see if it has, in
your case, been forthcoming.
One option includes myisam_max_sort_file_size being too small, as you've
tried. The files generated during sorting can be many multiples of the
data size, so it's worth ramping this parameter way-way up (eg ten or a
hundred times as big as the corresponding myd file). MySQL almost never
uses that amount of disk, but won't proceed unless it's available: it
gloomily assumes that all indexed fields will be of their maximum size
in all cases, so the estimate for a usually short string indexed out to
255 characters can be very pessimistic.
Another option is that the temporary filesystem which MySQL is planning
to use is not actually large enough (tmpdir= in my.cnf) to contain the
temporary files. If your tmpdir is set to /tmp and you're on a system
which reveals via df that /tmp is a dinky tmpfs type affair it may well
be worth changing it.
One possible solution if this all gets frustrating (which I've never
tried, caveat emptor) is to prevent the repair step altogether by
disabling keys and then using "myisamchk -n" which explicitly forces
repair by sorting. If you're feeling adventurous, also take a look at
the man page for the -p option.
Hope this helps,
Dan.
On 11/04/12 10:41, Toni Hermoso Pulido wrote:
> Hello,
>
> I'm trying to import into a MySQL DB an ENSEMBL file (concretely,
> homo_sapiens_variation_66_37 allele.txt.gz ~ 20 GB) using MySQL LOAD
> DATA INFILE.
> However, it always gets stuck when the index file is around 5 GB. At
> that time, the process status is 'Repair with keycache'.
>
> After discovering this file:
> ftp://ftp.ensembl.org/pub/ensembl-functgenomics/scripts/DAS/load_bed_source.pl
>
> I've been trying different values of myisam_sort_buffer_size and
> myisam_max_sort_file_size parameters with no luck.
>
> MySQL version is 5.1.45.
>
> Has anyone experienced problem like this?
>
> Thanks in advance,
>
More information about the Dev
mailing list