[ensembl-dev] Best way to collect probesets (excluding all exon arrays)? -> workaround found

Fri May 27 23:34:38 BST 2011

Thanks Nath!

> I'm a but confused as to where the slow down is coming from exactly so I will
> run the script on our local DBs to see where the bottle neck is coming from.
> It maybe that we can turn it on it's head and start with an array restricted
> probe feature query.  This will allow us to filter out the affy ST arrays up
> front, with the caveat that there will be probe features with no xrefs.

An array-restricted probe feature query would be nice.

> Can you also send me the full sql which is causing the tmp table to be
> created?

Here is what is listed in 'show processlist', but it's truncated:
| Copying to tmp table | SELECT  pf.probe_feature_id, pf.seq_region_id,
pf.seq_region_start, pf.seq_region_end, pf.seq_region |

I think it's trigger by this query from Funcgen DBEntryAdaptor.pm (line
399):
SELECT oxr.ensembl_id
       FROM probe_feature pf, external_db xdb,  xref x, object_xref oxr,
external_synonym syn
      WHERE pf.probe_feature_id = oxr.ensembl_id AND xdb.db_name LIKE
'homo_sapiens_core_Transcript%' AND xdb.external_db_id = x.external_db_id
AND syn.synonym = ? AND
             x.xref_id = oxr.xref_id AND
             oxr.ensembl_object_type= ? AND
             syn.xref_id = oxr.xref_id

> One thing I also like to point out is that you are fetching ProbeFeature xref
> data, for Affy arrays the associated probe set may actually fail our
> transcript mapping pipeline.  The transcript xrefs are actually stored at the
> Probe or ProbeSet level, not the feature level.

Well, Affy probesets were coming through just fine via ProbeFeatures.  But
since the feature route was inefficient for just getting basic probe info
(don't need feature info), I'm now querying Probes and ProbeSets instead of
ProbeFeatures. This makes things a lot faster (as long as I comment out the
joins with probe and probe_set tables).

 - Alex