[ensembl-dev] MySQL local ENSEMBL installation

Will McLaren wm2 at ebi.ac.uk
Mon May 14 12:17:16 BST 2012


Hi Duarte

Copied from an earlier dev email reply from Dan Sheppard (thanks Dan).
You may need to check your MySQL server's tmp dir:

MySQL falls back to "repair with keycache" rather than "repair by
sorting" when sorting is not possible because there is insufficient
disk space in its temporary filesystem for an external merge sort.
("Repair" is arguably a misnomer here: it really means rebuilding
indexes following changes to a table.)

The difference in performance between the two techniques is so large
that if your data is big enough for repair by sorting to run out of
disk space, repair with keycache is unlikely to ever finish.

MySQL can be a bit mysterious about the reasons for its decisions in
this area, but it might be worth checking the server logs to see if it
has, in your case, been forthcoming.

One possible cause is myisam_max_sort_file_size being too small, which
you've already tried raising. The files generated during sorting can
be many multiples of the data size, so it's worth ramping this
parameter way, way up (e.g. ten or a hundred times as big as the
corresponding .MYD file). MySQL almost never uses that amount of disk,
but won't proceed unless it's available: it gloomily assumes that all
indexed fields will be of their maximum size in all cases, so the
estimate for a usually short string indexed out to 255 characters can
be very pessimistic.
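
To get a feel for how pessimistic that can be, here is a
back-of-envelope sketch in Python. It is not MySQL's actual formula:
the row count, the average key length and the 6-byte row pointer are
all assumptions chosen purely for illustration.

```python
def sort_bytes(rows, key_bytes, pointer_bytes=6):
    """Rough size of a sort file: one key plus one row pointer per row.

    pointer_bytes=6 is an illustrative assumption, not a value read
    from any particular server.
    """
    return rows * (key_bytes + pointer_bytes)


rows = 100_000_000                  # hypothetical row count
worst = sort_bytes(rows, 255)       # every key padded out to 255 bytes
typical = sort_bytes(rows, 12)      # hypothetical average key length

print(f"worst case: {worst / 1e9:.1f} GB, typical: {typical / 1e9:.1f} GB")
```

The gap between the two figures is the headroom that
myisam_max_sort_file_size (and the disk behind it) has to allow for,
even though most of it will probably never be written.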

Another possible cause is that the temporary filesystem MySQL is
planning to use (tmpdir= in my.cnf) is not actually large enough to
hold the temporary files. If your tmpdir is set to /tmp and you're
on a system where df reveals that /tmp is a dinky tmpfs-type
affair, it may well be worth changing it.
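
If you'd rather script the df check, something along these lines
should do. It is only a sketch: the directory used below is Python's
default temp dir, standing in for whatever tmpdir= says in your
my.cnf, and the 50 GB requirement is a made-up figure.

```python
import shutil
import tempfile


def free_bytes(path):
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` has at least needed_bytes free."""
    return free_bytes(path) >= needed_bytes


# Stand-in for the server's tmpdir; substitute the value from my.cnf.
tmpdir = tempfile.gettempdir()
needed = 50 * 1024**3               # hypothetical requirement: 50 GB

print(f"{tmpdir}: {free_bytes(tmpdir) / 1024**3:.1f} GB free")
print("enough room" if has_room(tmpdir, needed) else "too small, consider moving tmpdir")
```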

One possible solution if this all gets frustrating (which I've never
tried, caveat emptor) is to prevent the repair step altogether by
disabling keys and then using "myisamchk -n" which explicitly forces
repair by sorting. If you're feeling adventurous, also take a look at
the man page for the -p option.
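
For what it's worth, the command line might be assembled roughly as
below. This is an untested sketch: the table path and tmpdir are
hypothetical, and pairing -n with -r (recover) and --tmpdir is my
assumption about how you'd want to call it, so check the myisamchk
man page first. The code only builds and prints the command; it does
not run it, and myisamchk should not be run on a table the server
still has open.

```python
import shlex


def myisamchk_sort_command(index_file, tmpdir=None):
    """Build (but do not run) a myisamchk call that forces repair by sorting.

    -r  recover (rebuild the indexes)
    -n  sort-recover: use sorting even if the temporary files are large
    """
    cmd = ["myisamchk", "-r", "-n"]
    if tmpdir is not None:
        # Point the temporary files at a filesystem with enough room.
        cmd.append(f"--tmpdir={tmpdir}")
    cmd.append(index_file)
    return cmd


# Both paths are placeholders, not real locations.
cmd = myisamchk_sort_command("/path/to/datadir/your_db/your_table.MYI",
                             tmpdir="/path/to/big/scratch")
print(shlex.join(cmd))
```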

Hope this helps,
Dan.

On 14 May 2012 12:08, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
> Dear Developers
>
>
>
> In my group we have done local installations of the ensembl database in
> order to improve performance and also because we do not want to tax your
> servers with queries.
>
>
>
> However in the latest releases we have been having difficulties installing
> the tables in our internal database... especially the alleles table.
>
> The latest 66 release we installed took 8 days to process the alleles table
> and I am sure that it should not take this long!
>
>
>
> We have followed the instructions found here:
> http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html
>
>
>
> On this page:
> http://www.ensembl.org/info/docs/variation/database.html#db_load
>
>
>
> You indicated that the alleles table (for version 61) should take about 3
> hours to process. Since the alleles table for version 67 is more than double
> the size, we would expect it to take between 6 and 9 hours, but never 8
> days!
>
>
>
> Can you tell me if we are missing something or give us some workaround in
> order to accelerate the process?
>
>
>
> Best regards
>
>
>
> Duarte Molha
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>



