[ensembl-dev] Slow funcgen queries
Alexander Pico
apico at gladstone.ucsf.edu
Tue May 17 01:01:31 BST 2011
I think the key to efficiency here is skipping the Affy Exon arrays. But I'm
guessing that filtering them out AFTER the $probe->get_all_Arrays() call
will not help (see code snippet below), since the slow query has already run
by then. Is there an alternative way to retrieve arrays that excludes a
particular $array->name()?
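For concreteness, the post-hoc filter I have in mind would look roughly like
the sketch below. MockArray is a hypothetical stand-in that only models the
name() accessor, and the array names and the /^HuEx/ pattern are assumptions
chosen for illustration, not values taken from the funcgen database. The
filter only runs once the arrays have already been fetched, so it presumably
does nothing to reduce the query time.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a funcgen Array object; only name() is modelled.
package MockArray;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class; }
sub name { return $_[0]->{name}; }

package main;

# Return a new array ref holding the arrays whose name() does NOT match
# the exclusion pattern. The input list is left unchanged.
sub exclude_arrays {
    my ($arrays, $skip_re) = @_;
    return [ grep { $_->name() !~ $skip_re } @$arrays ];
}

# Illustrative names only; in the real loop this list would come from
# $probe->get_all_Arrays().
my $array_list = [
    map { MockArray->new($_) } qw(HuEx-1_0-st-v2 HG-U133_Plus_2 HuGene-1_0-st-v1)
];

# Assumed pattern for the exon arrays to skip.
my $kept = exclude_arrays($array_list, qr/^HuEx/);

print join(",", map { $_->name() } @$kept), "\n";
```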
- Alex
On 1/6/11 7:02 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:
> I'm running through all genes per species and collecting xrefs as well as
> associated probes from funcgen. This is all working fine, and most species I
> have tried are processed within a day or so, except for human and mouse.
>
> The processing times correlate with the size of the funcgen databases (which
> I have local copies of): human is slower than mouse, which is slower than
> rat, and so on. Human takes over a month to process! I get through
> ~1000 genes per day.
>
> I also observe that mysql spends a lot of time in this state (copied from
> 'show processlist'):
> homo_sapiens_funcgen_60_37e | Query | 208 | Copying to tmp table | SELECT
> pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start,
> pf.seq_region_end, pf.seq_region |
>
> This leads me to believe that something in the funcgen API triggers a copy
> to a tmp table, and that this is a major performance sink.
>
> Any tips on ways around this or optimizations of either the tables, queries
> or API calls that would help here? I'm currently calling:
>
> my $probe_adaptor = $registry->get_adaptor($species, "funcgen", "ProbeFeature");
>
> my $probe_features =
>   $probe_adaptor->fetch_all_by_linked_transcript_Gene($gene);
>
> foreach my $pf (@$probe_features) {
>   my $probe      = $pf->probe();
>   my $array_list = $probe->get_all_Arrays();
>
>   foreach my $array (@$array_list) {
>     # ...save $array values
>   }
> }
>
> Thanks!
> - Alex
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev