[ensembl-dev] Slow funcgen queries

Alexander Pico apico at gladstone.ucsf.edu
Tue May 17 01:07:03 BST 2011


Ah, I just had an idea. I'm using a local copy of the funcgen database, so I
could just go in and delete all Exon Array entries.  Advice on doing this
efficiently and robustly?

1. I could go through all rows in the 'array' table and delete those with a
'name' matching %Ex%.

2. Would it work to just remove them from the 'array_chip' list? That would be
easy.

Other ideas? Patching the API?

 - Alex
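
A rough sketch of what option 1 might look like, written as a small Perl
script that only prints SQL for review and never touches the database. The
table and column names (array.array_id, array_chip.array_id,
probe.array_chip_id, probe_feature.probe_id) are assumptions about the funcgen
layout and should be checked with DESCRIBE on the local copy. The helper name
exon_array_cleanup_sql is made up for this example. Probe rows are removed
first and probe_feature rows are only dropped once no probe row refers to
them, on the assumption that a probe_id can be shared between arrays.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns SQL statements that would remove every array
# whose name matches $pattern, together with its chips, probes and any probe
# features left without a probe. ASSUMPTION: table and column names follow
# the funcgen layout described above; verify them before running anything.
sub exon_array_cleanup_sql {
    my ($pattern) = @_;

    # Double any single quotes so the pattern is safe in a SQL string literal.
    (my $lit = $pattern) =~ s/'/''/g;
    my $where = "WHERE a.name LIKE '" . $lit . "'";

    return (
        # Dry run: list the arrays that the pattern would hit.
        "SELECT a.array_id, a.name FROM `array` a $where",

        # Remove probe rows that sit on the matching arrays.
        "DELETE p FROM probe p "
          . "JOIN array_chip ac ON ac.array_chip_id = p.array_chip_id "
          . "JOIN `array` a ON a.array_id = ac.array_id $where",

        # Remove the chips, then the arrays themselves.
        "DELETE ac FROM array_chip ac "
          . "JOIN `array` a ON a.array_id = ac.array_id $where",
        "DELETE a FROM `array` a $where",

        # Finally drop features whose probe no longer exists on any array.
        "DELETE pf FROM probe_feature pf "
          . "LEFT JOIN probe p ON p.probe_id = pf.probe_id "
          . "WHERE p.probe_id IS NULL",
    );
}

# Print the statements so they can be read before being fed to mysql.
print "$_;\n" for exon_array_cleanup_sql('%Ex%');
```

Running the SELECT on its own first shows exactly which array names %Ex%
catches, since LIKE is case-insensitive under the default MySQL collations and
may match more than the Exon arrays. Any other tables that reference these
probes are left alone by this sketch, so working on a backup copy would be
prudent.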


On 5/16/11 5:01 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:

> I think the key to efficiency here is skipping the Affy Exon arrays. But I'm
> guessing that filtering them out AFTER making the $probe->get_all_Arrays();
> call will not help (see code snippet below). Is there an alternative way to
> retrieve arrays and exclude a particular $array->name() ?
> 
>  - Alex
> 
> 
> On 1/6/11 7:02 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:
> 
>> I'm running through all genes per species and collecting xrefs as well as
>> associated probes from funcgen. This is all working fine, and most species
>> I have tried are processed within a day or so, except for human and mouse.
>> 
>> The processing times correlate with the size of the funcgen databases (which
>> I have local copies of), with human being slower than mouse being slower
>> than rat, and so on. Human takes over a month to process! I get through
>> ~1000 genes per day.
>> 
>> I also observe that mysql spends a lot of time in this state (copied from
>> 'show processlist'):
>> homo_sapiens_funcgen_60_37e | Query   |  208 | Copying to tmp table | SELECT
>> pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start,
>> pf.seq_region_end, pf.seq_region |
>> 
>> This leads me to believe that something in the funcgen API calls for a copy
>> to a tmp table, and that this is a major performance sink.
>> 
>> Any tips on ways around this or optimizations of either the tables, queries
>> or API calls that would help here? I'm currently calling:
>> 
>> $probe_adaptor = $registry->get_adaptor($species, "funcgen",
>>                                         "ProbeFeature");
>>
>> my $probe_features =
>>     $probe_adaptor->fetch_all_by_linked_transcript_Gene($gene);
>>
>> foreach my $pf (@$probe_features) {
>>     my $probe      = $pf->probe();
>>     my $array_list = $probe->get_all_Arrays();
>>
>>     foreach my $array (@$array_list) {
>>         # ... save $array values
>>     }
>> }
>> 
>> Thanks!
>>  - Alex
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev