[ensembl-dev] EFG Array Mapping inconsistent performance

Oliver, Gavin gavin.oliver at almacgroup.com
Thu Oct 7 15:45:26 BST 2010


For anyone following this:

 

The speed problem appears to have been overcome by running an OPTIMISE
command on the database via mysql.

 

Speed is approximately x37 faster.

 

Nervous breakdown narrowly averted.

 

________________________________

From: Oliver, Gavin 
Sent: 06 October 2010 15:02
To: dev at ensembl.org
Subject: Re: [ensembl-dev] EFG Array Mapping inconsistent performance

 

Hi, 

 

With the v60 EFG patches applied I have been looking into these queries
more as they are still very slow.

 

They take around 66 seconds on my desktop and 36 seconds on my server.  

 

As a reminder, an example of the query looks like this:

>SELECT pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start,
pf.seq_region_end, pf.seq_region_strand, pf.probe_id, pf.analysis_id,
pf.mismatches, pf.cigar_line, p.name, p.probe_set_id FROM (probe_feature
pf, probe p, array_chip ac) WHERE ac.array_id IN (56,52,54,55,53,58,57)
and ac.array_chip_id=p.array_chip_id AND pf.seq_region_id = 581 AND
pf.seq_region_start <= 95818078 AND pf.seq_region_end >= 95815747 AND
pf.seq_region_start >= 94655311 AND pf.probe_id = p.probe_id GROUP by
pf.probe_feature_id ORDER BY pf.seq_region_id, >pf.seq_region_start,
pf.probe_feature_id;

 

 

 

Some details from the sloq query log:

# Query_time: 66.536002 Lock_time: 0.000324 Rows_sent: 96 Rows_examined:
13635584

 

Explain shows

# Using where; Using temporary; Using filesort

 

Apparently 'filesort' can mean a sort on disk or in memory but it
indicates that the sort can't be performed from an index.

 

Using the profiling ability in mysql I was able to see that 99% of the
query time is spent "copying to tmp table".

 

I don't know if this is happening in disk or memory. I know that there
are no TEXT/BLOB fields in the tables being queried though. Also, I
should point out there is no real load on the server.

 

Does anyone have any thoughts on this?

 

 

________________________________

From: Nathan Johnson [mailto:njohnson at ebi.ac.uk] 
Sent: 15 September 2010 10:59
To: Oliver, Gavin
Cc: dev at ensembl.org
Subject: [SPAM] - Re: [ensembl-dev] EFG Array Mapping inconsistent
performance - Email found in subject

 

Hi Gavin

 

For v60 we have now set the cigar_line field to a VARCHAR with a
sensible field length of 50. The current largest cigar_line in human is
around 30 characters long.

 

This should remove the issue of tmp tables being created on disk.

 

Nath

 

 

 

On 25 Aug 2010, at 10:12, Oliver, Gavin wrote:

 

Hi all,

 

I've been running some tests on our server, annotating a test batch of
around 50 Affy probesets (25-mer probes, 11 probes per set).

 

Running the same batch of probesets has completed in anywhere from 30
mins to just over 2 hours!

 

The load on the server does not seem to be the issue.  I have run a
number of mysql tuners to optimize things and the only complaint I am
seeing is that 50% of our temporary tables are being created on disk
rather than in memory.  Apparently this is a major cause of inconsistent
mySQL performance.  I have increased all relevant variables but it does
not seem to affect the % of temp tables being created on disk.  I know
that BLOB and TEXT fields cannot be stored in memory and must be created
on disk so I'm beginning to think this may be the problem.  Has anyone
else experienced similar problems or does anyone have any ideas on how
the BLOB and TEXT fields could be redefined to enable in-memory storage?

 

 

Best,

 

Gavin

 

The contents of this message and any attachments to it are confidential
and may be legally privileged. If you have received this message in
error, you should delete it from your system immediately and advise the
sender.

 

Almac Group (UK) Limited, registered no. NI061368.  Almac Sciences
Limited, registered no. NI041550.  Almac Discovery Limited, registered
no. NI046249.  Almac Pharma Services Limited, registered no. NI045055.
Almac Clinical Services Limited, registered no. NI041905.  Almac
Clinical Technologies Limited, registered no. NI061202.  Almac
Diagnostics Limited, registered no. NI043067.  All preceding companies
are registered in Northern Ireland with a registered office address of
Almac House, 20 Seagoe Industrial Estate,Craigavon, BT63 5QD, UK.  

 

Almac Sciences (Scotland) Limited, registered in Scotland no. SC154034.

 

Almac Clinical Services LLC, Almac Clinical Technologies LLC and Almac
Diagnostics LLC are Delaware limited liability companies and Almac Group
Incorporated is a Delaware Corporation.  More information on the Almac
Group can be found on the Almac website:www.almacgroup.com

_______________________________________________
Dev mailing list
Dev at ensembl.org
http://lists.ensembl.org/mailman/listinfo/dev

 

Nathan Johnson

Scientific Programmer

European Bioinformatics Institute

Wellcome Trust Genome Campus

Hinxton

Cambridge CB10 1SD

Email: njohnson at ebi.ac.uk

TelNo: (+44)1223 492629

 

 

 

 


The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender.

Almac Group (UK) Limited, registered no. NI061368.  Almac Sciences Limited, registered no. NI041550.  Almac Discovery Limited, registered no. NI046249.  Almac Pharma Services Limited, registered no. NI045055.  Almac Clinical Services Limited, registered no. NI041905.  Almac Clinical Technologies Limited, registered no. NI061202.  Almac Diagnostics Limited, registered no. NI043067.  All preceding companies are registered in Northern Ireland with a registered office address of Almac House, 20 Seagoe Industrial Estate, Craigavon, BT63 5QD, UK.  

Almac Sciences (Scotland) Limited, registered in Scotland no. SC154034. 

Almac Clinical Services LLC, Almac Clinical Technologies LLC and Almac Diagnostics LLC are Delaware limited liability companies and Almac Group Incorporated is a Delaware Corporation.  More information on the Almac Group can be found on the Almac website: www.almacgroup.com


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20101007/b714739c/attachment.html>


More information about the Dev mailing list