[ensembl-dev] Slow funcgen queries

Alexander Pico apico at gladstone.ucsf.edu
Fri Jan 7 03:02:05 GMT 2011


I'm running through all genes per species and collecting xrefs as well as
associated probes from funcgen. This is all working fine, and most species I
have tried are processed within a day or so, except for human and mouse.

The processing times correlate with the size of the funcgen databases (I have
local copies of them): human is slower than mouse, mouse is slower than rat,
and so on. Human takes over a month to process! I only get through
~1000 genes per day.

I also see MySQL spending a lot of time in this state (copied from
SHOW PROCESSLIST):
  homo_sapiens_funcgen_60_37e | Query | 208 | Copying to tmp table |
  SELECT pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start,
  pf.seq_region_end, pf.seq_region |

This leads me to believe that something in the funcgen API issues a query
that copies to a temporary table, and that this is a major performance sink.

Any tips on ways around this, or optimizations to the tables, queries or
API calls that would help here? I'm currently calling:

# $registry, $species and $gene are set up earlier in the script
my $probe_adaptor = $registry->get_adaptor($species, "funcgen",
    "ProbeFeature");

my $probe_features =
    $probe_adaptor->fetch_all_by_linked_transcript_Gene($gene);

foreach my $pf (@$probe_features) {
    my $probe      = $pf->probe();
    my $array_list = $probe->get_all_Arrays();

    foreach my $array (@$array_list) {
        # ... save $array values
    }
}

Thanks!
 - Alex
