[ensembl-dev] Best way to collect probesets (excluding all exon arrays)? -> workaround found
Nathan Johnson
njohnson at ebi.ac.uk
Tue May 31 15:20:19 BST 2011
Hi Alexander
It sounds like we are making a little headway here.
You can get non-truncated output from the MySQL server by running:
show full processlist;
If you don't need probe feature info, then we won't need the
array-restricted probe feature method, but FYI it does already
exist, just in case you need it:
ProbeFeatureAdaptor::fetch_all_by_Slice_Arrays
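The idea behind an array-restricted feature query is to filter on the array at the same time as the slice, so exon/ST arrays never enter the result. The sketch below models that filter in Python. It is not the Ensembl Perl API: the `ProbeFeature` record and the `fetch_by_slice_arrays` helper are hypothetical, and the feature coordinates are toy data. Only the array names are real Affymetrix designs, used for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a probe_feature row joined to its array name;
# this is NOT the Ensembl Perl API, just a model of the filtering logic.
@dataclass(frozen=True)
class ProbeFeature:
    probe_feature_id: int
    seq_region: str
    start: int  # inclusive, 1-based
    end: int    # inclusive, 1-based
    array_name: str


def fetch_by_slice_arrays(features, seq_region, slice_start, slice_end, array_names):
    """Keep features that overlap the slice AND belong to one of the named arrays.

    Filtering on the array up front means exon/ST arrays never enter the
    result, instead of being discarded after the xref lookups.
    """
    wanted = set(array_names)
    return [
        f for f in features
        if f.array_name in wanted
        and f.seq_region == seq_region
        and f.start <= slice_end
        and f.end >= slice_start
    ]


features = [
    ProbeFeature(1, "7", 100, 124, "HG-U133_Plus_2"),
    ProbeFeature(2, "7", 110, 134, "HuEx-1_0-st-v2"),  # exon array: excluded
    ProbeFeature(3, "7", 900, 924, "HG-U133_Plus_2"),  # outside the slice
]
hits = fetch_by_slice_arrays(features, "7", 1, 500, ["HG-U133_Plus_2"])
print([f.probe_feature_id for f in hits])  # [1]
```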
If you are now coming at this from a Probe/ProbeSet perspective, you will
probably want to use:
Array->get_all_Probes/Sets
And then perform the DBEntry queries on each Probe/Set returned. If
you haven't already seen it, there are also some good examples of the
Array/Probe/Set API available here:
ensembl-functgenomics/scripts/examples/microarray_annotation_example.pl
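The array-first pattern (fetch the arrays you want, walk their probe sets, then look up xrefs for each) can be modelled as follows. This is a hedged Python sketch, not the funcgen Perl API. The `Array`/`ProbeSet` classes, the `collect_probeset_xrefs` helper, the explicit `exclude` set and the `ps*`/`tx*` identifiers are all hypothetical placeholders. It also shows that probe sets which failed transcript mapping still come back, just with no xrefs.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeSet:
    name: str
    # Hypothetical: IDs of transcripts this probe set was mapped to
    transcript_xrefs: list = field(default_factory=list)


@dataclass
class Array:
    name: str
    probe_sets: list = field(default_factory=list)


def collect_probeset_xrefs(arrays, exclude):
    """Array-first traversal: skip excluded arrays, then query each probe set.

    Returns {(array_name, probe_set_name): [transcript IDs]}. Probe sets that
    failed transcript mapping still appear, with an empty list.
    """
    result = {}
    for array in arrays:
        if array.name in exclude:
            continue  # e.g. Affy exon/ST arrays
        for ps in array.probe_sets:
            result[(array.name, ps.name)] = list(ps.transcript_xrefs)
    return result


arrays = [
    Array("HG-U133_Plus_2", [ProbeSet("ps1", ["tx1"]), ProbeSet("ps2")]),
    Array("HuEx-1_0-st-v2", [ProbeSet("ps3", ["tx2"])]),
]
print(collect_probeset_xrefs(arrays, exclude={"HuEx-1_0-st-v2"}))
# {('HG-U133_Plus_2', 'ps1'): ['tx1'], ('HG-U133_Plus_2', 'ps2'): []}
```

Passing the excluded arrays explicitly avoids guessing exon arrays from their names.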
I'm currently off tending to a sick child, but will take a closer look at
the underlying methods when I'm back in.
Nath
On 27 May 2011, at 23:34, Alexander Pico wrote:
> Thanks Nath!
>
>> I'm a bit confused as to where the slowdown is coming from exactly, so I
>> will run the script on our local DBs to see where the bottleneck is.
>> It may be that we can turn it on its head and start with an
>> array-restricted probe feature query. This will allow us to filter out the
>> Affy ST arrays up front, with the caveat that there will be probe features
>> with no xrefs.
>
> An array-restricted probe feature query would be nice.
>
>> Can you also send me the full SQL which is causing the tmp table to be
>> created?
>
> Here is what is listed in 'show processlist', but it's truncated:
> | Copying to tmp table | SELECT pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start, pf.seq_region_end, pf.seq_region |
>
> I think it's triggered by this query from Funcgen DBEntryAdaptor.pm
> (line 399):
> SELECT oxr.ensembl_id
> FROM probe_feature pf, external_db xdb, xref x, object_xref oxr,
>      external_synonym syn
> WHERE pf.probe_feature_id = oxr.ensembl_id
>   AND xdb.db_name LIKE 'homo_sapiens_core_Transcript%'
>   AND xdb.external_db_id = x.external_db_id
>   AND syn.synonym = ?
>   AND x.xref_id = oxr.xref_id
>   AND oxr.ensembl_object_type = ?
>   AND syn.xref_id = oxr.xref_id
>
>> One thing I'd also like to point out is that you are fetching ProbeFeature
>> xref data; for Affy arrays, the associated probe set may actually fail our
>> transcript mapping pipeline. The transcript xrefs are stored at the Probe
>> or ProbeSet level, not the feature level.
>
> Well, Affy probesets were coming through just fine via ProbeFeatures. But
> since the feature route was inefficient for just getting basic probe info
> (I don't need feature info), I'm now querying Probes and ProbeSets instead
> of ProbeFeatures. This makes things a lot faster (as long as I comment out
> the joins with the probe and probe_set tables).
>
> - Alex
>
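To see what the DBEntryAdaptor query quoted above actually selects, the Python sketch below runs it against a tiny in-memory SQLite database. The schema is a hypothetical subset containing only the columns the query touches, and the rows are toy data. The query is rewritten with explicit JOINs, which is logically equivalent to the comma-join form. Only feature 1 is returned, because feature 2's xref belongs to an external_db whose name does not match the LIKE pattern.

```python
import sqlite3

# Toy subset of the funcgen tables, with only the columns the query touches.
SCHEMA = """
CREATE TABLE probe_feature (probe_feature_id INTEGER PRIMARY KEY);
CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY, db_name TEXT);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER);
CREATE TABLE object_xref (ensembl_id INTEGER, xref_id INTEGER,
                          ensembl_object_type TEXT);
CREATE TABLE external_synonym (xref_id INTEGER, synonym TEXT);
"""

# Same predicates as the quoted query, written with explicit inner JOINs.
QUERY = """
SELECT oxr.ensembl_id
FROM probe_feature pf
JOIN object_xref oxr      ON pf.probe_feature_id = oxr.ensembl_id
JOIN xref x               ON x.xref_id = oxr.xref_id
JOIN external_db xdb      ON xdb.external_db_id = x.external_db_id
JOIN external_synonym syn ON syn.xref_id = oxr.xref_id
WHERE xdb.db_name LIKE 'homo_sapiens_core_Transcript%'
  AND syn.synonym = ?
  AND oxr.ensembl_object_type = ?
"""


def build_toy_db():
    """Two features; only feature 1 links to a *_core_Transcript* external_db."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO probe_feature VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO external_db VALUES (?, ?)",
                     [(10, "homo_sapiens_core_Transcript"), (11, "other_db")])
    conn.executemany("INSERT INTO xref VALUES (?, ?)", [(100, 10), (101, 11)])
    conn.executemany("INSERT INTO object_xref VALUES (?, ?, ?)",
                     [(1, 100, "ProbeFeature"), (2, 101, "ProbeFeature")])
    conn.executemany("INSERT INTO external_synonym VALUES (?, ?)",
                     [(100, "SYN1"), (101, "SYN1")])
    return conn


def find_ids_by_synonym(conn, synonym, object_type):
    return [row[0] for row in conn.execute(QUERY, (synonym, object_type))]


print(find_ids_by_synonym(build_toy_db(), "SYN1", "ProbeFeature"))  # [1]
```

On the real MySQL database, prefixing the query with EXPLAIN shows the join order chosen and whether a temporary table is used ("Using temporary" in the Extra column). This may help pin down where the tmp table comes from.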