[ensembl-dev] Affymetrix Probesets

Nathan Johnson njohnson at ebi.ac.uk
Tue Jul 12 09:14:24 BST 2011


Hi Alexander

Great stuff!  All of our tables are routinely optimized and analyzed throughout production cycle. In fact it's part of our standard practise whenever we copy a database.

I can only assume that the problem arises when loading the data from flat files, as the indexes are generated afresh (i.e. a major change).  I would say it would be a good idea to run analyze and optimize over all of the tables, just to be sure.


Glad to be of help.

Nath

On 4 Jul 2011, at 23:09, Alexander Pico wrote:

> Remarkable...
> 
> After running both OPTIMIZE statements, I now get the exact same "explain"
> output as you guys (see below). And the formerly problematic lines now
> breeze by in seconds (rather than minutes):
> 
> $probe_feature_adaptor->fetch_all_by_linked_transcript_Gene($gene);
> -or-
> $probe_feature_adaptor->fetch_all_by_external_name('ENSMUST00000102049');
> 
> This is fantastic.  Should I be running OPTIMIZE on these two tables for
> every species and every release?  Is it common practice to OPTIMIZE all
> tables over a certain size?  A quick Google search seem to suggest that this
> is only needed after making major changes (e.g., deletions), but these
> tables are fresh from your ftp.
> 
> - Alex
> 
> 
> +----+-------------+-------+-------+-------------------+---------+---------+
> -----------------------------------------+------+---------------------------
> --+
> | id | select_type | table | type  | possible_keys     | key     | key_len |
> ref                                     | rows | Extra
> |
> +----+-------------+-------+-------+-------------------+---------+---------+
> -----------------------------------------+------+---------------------------
> --+
> |  1 | SIMPLE      | pf    | range | PRIMARY,probe_idx | PRIMARY | 4       |
> NULL                                    | 2048 | Using where; Using filesort
> | 
> |  1 | SIMPLE      | p     | ref   | PRIMARY           | PRIMARY | 4       |
> mus_musculus_funcgen_62_37o.pf.probe_id |    1 |
> | 
> +----+-------------+-------+-------+-------------------+---------+---------+
> -----------------------------------------+------+---------------------------
> --+
> 
> 
> 
> On 7/4/11 1:46 PM, "Ian Sealy" <is1 at sanger.ac.uk> wrote:
> 
>> Dear Alex,
>> 
>>>> Also, have you tried optimising the tables?
>>> 
>>> You mentioned optimizing the tables? I simply installed databases from
>>> Ensembl, the corresponding API and then started running my scripts.
>>> Is "optimizing" a command I run on certain tables?
>> 
>> Yep, I think Nathan is suggesting you try:
>> 
>> OPTIMIZE TABLE probe_feature;
>> OPTIMIZE TABLE probe;
>> 
>> See http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html for what
>> it does.
>> 
>> Running ANALYZE TABLE might be enough though.
>> 
>> Cheers,
>> Ian
>> 
> 

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl










More information about the Dev mailing list