[ensembl-dev] Best way to collect probesets (excluding all exon arrays)? -> workaround found

Thu May 26 03:37:42 BST 2011

I found a workaround that dramatically improves performance, which others
might find useful. It involves commenting out lines that add an unnecessary
(and slow) table join to the transcript-based Probe and ProbeSet queries.
I had to skip querying ProbeFeature altogether because of the massive table
size in human and mouse. With these minor edits, I can retrieve all probe
and probeset annotations (though not feature details) for every human gene
in a few hours, rather than weeks!

Comment out the following lines:
ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/DBEntryAdaptor.pm
359,360c359,360
<       $from_sql  = 'probe p, ';
<       $where_sql = qq( p.probe_id = oxr.ensembl_id AND );
---
> #     $from_sql  = 'probe p, ';
> #     $where_sql = qq( p.probe_id = oxr.ensembl_id AND );
363,364c363,364
<       $from_sql  = 'probe_set ps, ';
<       $where_sql = qq( ps.probe_set_id = oxr.ensembl_id AND );
---
> #     $from_sql  = 'probe_set ps, ';
> #     $where_sql = qq( ps.probe_set_id = oxr.ensembl_id AND );
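
For anyone wanting to try this, here is a minimal sketch of the per-gene loop I have in mind once ProbeFeature is skipped. It is not a definitive implementation: collect_probesets_for_gene is a hypothetical helper name, and I am assuming that the ProbeSet adaptor in your API version provides fetch_all_by_linked_Transcript and that the returned objects have a dbID. The same pattern should apply to the Probe adaptor.

```perl
use strict;
use warnings;

# Collect the probesets linked to any transcript of one gene, without
# touching ProbeFeature.
#
# $probeset_adaptor is expected to come from something like
#   $registry->get_adaptor($species, 'funcgen', 'ProbeSet');
# ASSUMPTION: that adaptor provides fetch_all_by_linked_Transcript() in
# your API version; check before relying on it.
sub collect_probesets_for_gene {
    my ($probeset_adaptor, $gene) = @_;

    my %seen;         # dbID => count, to drop duplicates across transcripts
    my @probesets;

    foreach my $transcript (@{ $gene->get_all_Transcripts }) {
        my $linked =
            $probeset_adaptor->fetch_all_by_linked_Transcript($transcript);

        foreach my $ps (@{ $linked || [] }) {
            next if $seen{ $ps->dbID }++;    # already collected
            push @probesets, $ps;
        }
    }

    return \@probesets;
}
```

Usage would be along the lines of collect_probesets_for_gene($registry->get_adaptor($species, 'funcgen', 'ProbeSet'), $gene), called once per gene. Passing the adaptor in as an argument keeps the helper easy to exercise with stand-in objects before pointing it at a real database.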

On 5/24/11 10:14 AM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:

> Looks like a known problem. The API code has the following comment:
> 
> Funcgen/DBSQL/ResultFeatureAdaptor.pm: line 1259
>  #Not straight forward without creating tmp table
> 
> In version 60, the comment in this area read:
>  #This join between sr and pf is causing the slow down.  Need to select right join for this.
>  #just do two separate queries for now.
> 
> 
> Indeed, the tmp table triggered by the join is still causing a slowdown.
> Let us know if you come up with any workarounds or solutions to this tmp
> table issue. Thanks!
>  - Alex
> 
> 
> On 5/23/11 6:28 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:
> 
>> Hi,
>> 
>> I'm looking for a better way to get probe features. I'm currently using
>> 'fetch_all_by_linked_transcript_Gene()', but for species with all exon
>> arrays, this can take days...
>> 
>> Other than going in and deleting probesets from the funcgen databases (local
>> copies), how can I get around processing certain arrays, like the all exon
>> arrays, and just collect everything else?
>> 
>> 
>> Here's my current code snippet:
>> 
>> my $probe_adaptor =
>>     $registry->get_adaptor($species, "funcgen", "ProbeFeature");
>>
>> my $probe_features =
>>     $probe_adaptor->fetch_all_by_linked_transcript_Gene($gene);
>>
>> foreach my $pf (@$probe_features) {
>>     # do stuff
>> }
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/