[ensembl-dev] Affymetrix Probesets

Nathan Johnson njohnson at ebi.ac.uk
Tue Jul 5 17:08:07 BST 2011


Hi Alexander

Comments below...

On 4 Jul 2011, at 21:25, Alexander Pico wrote:

> Thanks Nath,
> 
> Despite this long, dragged out issue, you continue to be responsive and
> helpful. Much appreciated!

No problem, I am keen to get this sorted out and to better understand some of the issues people can have with the funcgen DBs.

> 
> I've pasted our mysql specs and explain-query below. The spec numbers seem
> comparable. But the explain results are significantly different. I'm not
> really familiar with this output, but it looks like I'm not using the same
> keys and end up " Using temporary", which is my ultimate downfall...
> 
> Do I need to add indexes beyond what come with the 'core' and 'funcgen'
> databases?  You mentioned optimizing the tables? I simply installed
> databases from Ensembl, the corresponding API and then started running my
> scripts.  Is "optimizing" a command I run on certain tables? Or is it the
> process of identifying new/better indexes?

You shouldn't really need to optimize the tables as this is normally done as part of the release process. The mysql optimize command runs on each table and will defrag the table and sort the indexes. So it's a good starting point if you're having performance problems.

To run this on all tables in your db you could do something like this:

mysql -N -uUSER -pPASS -P3306 -e'show tables' rattus_norvegicus_funcgen_63_34 | while read t; do mysql -uUSER -pPASS -P3306 -e"optimize $t" rattus_norvegicus_funcgen_63_34 ; done

To be safe you could also do this with the ANALYZE command. These analyze and optimize status can change how the MySQL query planner works, so can alter which indexes are chosen.

You shouldn't need to create any new indexes, as the existing ones are sufficient, it's just the the probe PRIMARY key is not being used for some reason.

I will take a close look at the mysql config params request and get back to you.

wrt to the method request. The fetch_all is method is implemented in the BaseAdaptor, so this is already available.  There is a fetch_all_by_class method, which is probably quite useful for you. The arrays from each individual class are run through the pipeline together, as they share array design parameters and hence alignment params etc.  This results from a fetch_all_by_vendor method would be identical, apart from that AFFY has two classes, AFFY_ST and AFFY_UTR.

Nath


> 
> Regarding my workaround. Yes, I ended up using a hash to avoid multiple
> "gene" hits per probe/probeset. The request for a function to list all
> arrays would help resolve the following issue.
> 
> Currently, I have to call this to get an array:
> my $array = $array_adaptor->fetch_by_name_vendor($array_name);
> Which means I have to know all the $array_names a priori.
> Ideally, I could call:
> my @arrayList = $array_adaptor->fetch_all_arrays();
> This could take vendor as an arg, but then it'd also be nice to have:
> my @vendorList = $array_adaptor->fetch_all_vendors();
> 
> - Alex
> 
> 
> mysql> show variables like "%buffer%";
> +-------------------------------+-----------+
> | Variable_name                 | Value     |
> +-------------------------------+-----------+
> | bdb_log_buffer_size           | 262144    |
> | bulk_insert_buffer_size       | 8388608   |
> | innodb_buffer_pool_awe_mem_mb | 0         |
> | innodb_buffer_pool_size       | 524288000 |
> | innodb_log_buffer_size        | 1048576   |
> | join_buffer_size              | 131072    |
> | key_buffer_size               | 499998720 |
> | myisam_sort_buffer_size       | 8388608   |
> | net_buffer_length             | 16384     |
> | preload_buffer_size           | 32768     |
> | read_buffer_size              | 131072    |
> | read_rnd_buffer_size          | 262144    |
> | sort_buffer_size              | 2097144   |
> +-------------------------------+-----------+
> 13 rows in set (0.00 sec)
> 
> mysql> show variables like "%heap%";
> +---------------------+----------+
> | Variable_name       | Value    |
> +---------------------+----------+
> | max_heap_table_size | 16777216 |
> +---------------------+----------+
> 
> mysql> explain ...
> +----+-------------+-------+------+-------------------+-----------+---------
> +----------------------------------------+---------+------------------------
> ---------+
> | id | select_type | table | type | possible_keys     | key       | key_len
> | ref                                    | rows    | Extra
> |
> +----+-------------+-------+------+-------------------+-----------+---------
> +----------------------------------------+---------+------------------------
> ---------+
> |  1 | SIMPLE      | p     | ALL  | PRIMARY           | NULL      | NULL
> | NULL                                   | 8759225 | Using temporary; Using
> filesort | 
> |  1 | SIMPLE      | pf    | ref  | PRIMARY,probe_idx | probe_idx | 4
> | mus_musculus_funcgen_62_37o.p.probe_id |       3 | Using where
> | 
> +----+-------------+-------+------+-------------------+-----------+---------
> +----------------------------------------+---------+------------------------
> ---------+
> 
> 
> 
> On 7/4/11 7:35 AM, "Nathan Johnson" <njohnson at ebi.ac.uk> wrote:
> 
>> Hi Alexander
>> 
>> I have been looking at your original fetch_all_by_external_name example.
>> ENSMUST00000102049 is a particularly popular gene in terms of probe mappings,
>> I count 2471 in all. But for me, these return in under 10 seconds. Which
>> considering the number of objects being created, is certainly in the right
>> ball park, and no where near 5 mins.
>> 
>> This method effectively performs two queries. 1 to get the probe feature IDs
>> from the xref schema, given the stable ID. And a second to fetch all the
>> ProbeFeatures given the list of internal DB ids, in this case ~2500 ids long.
>> It's a fairly simple join to retrieve the probe features, but we do return by
>> a default order, which will trigger a filesort for this query at present.
>> (Currently choosing to prioritise probe based queries for sorting using the
>> index).
>> 
>> Considering you were triggering a tmp table, and the filesort is fairly quick
>> for me, I guess this is not your problem.
>> 
>> Can you do an explain your mysql server and post the results? Have pasted the
>> query below. I've also pasted all out buffer sizes, in case this gives you any
>> clues.  
>> 
>> I doubt this is the case, but it maybe that your key_buffer_size is too small
>> to handle all the key blocks from the query, hence triggering a tmp table. You
>> would probably be seeing general slowness with all large queries if this was
>> the case.  
>> 
>> Also, have you tried optimising the tables?
>> 
>> It's good to see you've found a work around. However, the problem with working
>> like this is that you will be getting many redundant xrefs(genes), for each of
>> the Affy arrays, hence the original suggestion to start with the
>> Gene/Transcript stable IDs.  I'm not quite sure I understand you request about
>> the list of Arrays. Where exactly do you want this functionality?
>> 
>> 
>> Nath
>> 
>> mysql> show variables like "%buffer%";
>> +-------------------------+------------+
>> | Variable_name           | Value      |
>> +-------------------------+------------+
>> | bulk_insert_buffer_size | 8388608    |
>> | innodb_buffer_pool_size | 4294967296 |
>> | innodb_log_buffer_size  | 8388608    |
>> | join_buffer_size        | 131072     |
>> | key_buffer_size         | 4193255424 |
>> | myisam_sort_buffer_size | 8388608    |
>> | net_buffer_length       | 16384      |
>> | preload_buffer_size     | 32768      |
>> | read_buffer_size        | 262144     |
>> | read_rnd_buffer_size    | 262144     |
>> | sort_buffer_size        | 2097152    |
>> | sql_buffer_result       | OFF        |
>> +-------------------------+------------+
>> 
>> mysql> show variables like "%heap%";
>> +---------------------+----------+
>> | Variable_name       | Value    |
>> +---------------------+----------+
>> | max_heap_table_size | 16777216 |
>> +---------------------+----------+
>> 
>> 
>> explain SELECT  pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start,
>> pf.seq_region_end, pf.seq_region_strand, pf.probe_id, pf.analysis_id,
>> pf.mismatches, pf.cigar_line, p.name, p.probe_set_id
>> FROM  (probe_feature pf, probe p)
>> WHERE pf.probe_feature_id  IN
>> (24011933,31639410,19145471,35203641,17815601,30037965,22305816,23610466,21378
>> 222,26080812,34380111,33167220,28996153,33442533,19346150,19134061,28702753,26
>> 494030,20336756,29071313,23897509,19969266,23008789,32197650,36395239,30204881
>> ,24585842,32269563,27487159,22923378,28001558,19639979,23534197,20037959,36245
>> 940,34512206,30104507,29108586,23902516,21424238,28811060,23573354,29263042,28
>> 973380,21604078,33546702,32881972,29726616,28571494,18078956,20703325,22266734
>> ,20658466,18674862,34432014,28111887,33097500,31450834,20173612,30894526,23007
>> 732,22147758,25709984,22165564,21620519,26953745,32872914,29329303,21263505,24
>> 651190,22897875,20848891,34234476,22936219,28114340,24969629,31243578,28462298
>> ,28950321,34079884,32839648,32764720,30807771,30477257,27433894,26714299,19977
>> 103,27289280,35128819,21362153,21600381,18506280,21196049,29551627,18682058,27
>> 455633,30618421,27123094,27275877,22913832,35823767,20658865,18630657,24571105
>> ,28538169,24022713,22259830,27113418,28114857,22854180,23011160,17929941,30602
>> 298,34702984,21386104,17728662,25038178,21397353,30126926,33823200,17933929,26
>> 560878,26114508,20662249,21431505,23006655,23900708,23569039,20176107,27262897
>> ,35027194,21383478,19961800,21549260,18327642,18099135,22577238,20419236,22533
>> 188,21550317,20054254,32835967,34974786,27608265,33424554,25899176,20276120,25
>> 385115,20750593,29890643,34857538,33563252,28723578,27902776,23599445,28426935
>> ,27384085,35849971,33493863,24007760,17814444,22857100,22309707,29989949,23212
>> 499,19643243,19335625,31715992,25895474,17909788,21599946,18745809,30451719,21
>> 196830,26138128,23937759,26526475,18730137,23606403,21125851,25228787,21605862
>> ,31974620,25386382,24314592,21550723,24025499,25396061,22915668,30811835,35123
>> 376,26924902,34344495,29333028,25230319,33138818,17897928,34559749,33669876,24
>> 152427,32852599,34942587,29264817,36614691,17898494,32040352,31011108,31762166
>> ,26117709,24584031,29183782,24135522,23585812,19129339,32755466,25431300,33041
>> 964,19369364,20441407,34377973,32065162,34068475,19314411,23590335,21604454,36
>> 450563,27146233,34462615,23111656,20030686,31239156,31443940,25430602,17887695
>> ,31856797,35007445,24574878,27961786,25397509,31475459,17812582,18755857,30446
>> 114,20027432,26134959,21432307,33870530,31259975,29669704,20660828,26123628,29
>> 475901,32383134,31472780,17895275,18738236,17680190,29552221,30073102,18821428
>> ,32885413,27720017,17890765,23535943,32280364,25776709,24647342,31831571,23602
>> 731,19972130,30539903,17732292,29960883,27794948,32882003,24150248,21506431,34
>> 716124,21427824,24599663,21430976,19662980,24149670,27706769,21201230,23576980
>> ,30825832,24621301,20262457,32669480,29262042,21123300,25154443,30569956,28373
>> 190,18077166,25158705,21425973,31052820,34927911,36335525,27692150,24595666,23
>> 532226,23576999,26534521,32347493,17681958,18575246,23070620,17903391,22934064
>> ,21508998,20849925,32956945,26074173,18331961,20666339,29301343,30384808,28605
>> 103,20853949,29404401,28095220,36454544,21506019,17917425,18079578,36402698,20
>> 017171,33349513,18742206,34796403,20172245,20269282,25759588,31016758,27760370
>> ,25896865,24597898,35575177,26121706,19330311,34097379,28233893,26970666,34345
>> 458,30746856,21603929,28734928,27854378,31502918,26997597,35234094,29919031,31
>> 394098,21316298,32030643,33972479,31515598,19978929,27621369,33131800,30387944
>> ,36532949,21607438,18914961,31311830,30858601,30814085,34523280,24136675,35968
>> 270,30174240,31619742,32072368,25238090,28585928,34603687,20170327,33462142,21
>> 425312,17877889,35289813,34175975,28973419,20177671,35559001,23072483,29779750
>> ,20039442,21604259,19363102,23095116,23006108,29988821,18329827,34904806,21288
>> 852,18080061,30903001,20268027,32379775,27656242,27914159,33878795,21507583,17
>> 899919,34653714,28849493,26569972,32810204,34883109,23900790,19135975,24598080
>> ,24477512,34732932,29744822,18299892,27476013,24968011,35007119,17921386,18535
>> 896,20666503,17937017,21202398,35387506,18646596,35494820,26749283,18077266,35
>> 775908,29701072,19721792,25387497,26872116,20341703,33471852,20034315,20176971
>> ,33388597,28663991,19977866,24967598,20030285,34805889,36155838,32594938,34839
>> 613,25898347,20038638,21547778,21107352,34593091,36394745,33933979,24616528,28
>> 504678,32745303,25151100,21320552,17812783,20173557,18740999,25898717,36587923
>> ,22587055,17907760,24622710,21546811,18654498,28603749,30775407,22776687,17906
>> 940,29397431,28759512,19631185,27548375,18537027,24577287,20195052,36273004,34
>> 452062,33463481,34083589,20176606,17931181,17905532,29279698,24196495,33309497
>> ,32729723,20196618,34138058,23090895,29674966,32790755,23968678,31805145,20771
>> 854,33854012,17709965,34507492,17746257,34135872,30607168,23900307,23593938,19
>> 141680,19313305,26810921,21127690,23955464,18670273,35737361,23096607,19135244
>> ,21395642,35092497,29294000,33311697,26936788,30675097,21388821,35459136,21429
>> 535,18823648,17930238,32021257,22910592,29397631,23533894,21428127,28651921,31
>> 607673,30871815,32715291,23901030,34710242,33114983,32855926,30706381,20665691
>> ,24306327,17678138,20279008,31209359,33081392,20430028,32571306,30354942,23213
>> 727,28573716,20662577,26947200,23100281,30806669,25398612,21598994,26082612,30
>> 394565,17750931,35311615,34288777,19691040,18080469,19370211,36565189,23562214
>> ,25890315,27364940,31518565,17767916,26534191,23094727,18526295,17932707,26080
>> 983,36000904,30496314,33150414,33747715,35693162,26530092,29578590,27580000,19
>> 683752,28718624,30261122,33255673,31876254,28598476,31233597,33840622,30794846
>> ,21543160,18742558,32571346,17606920,19372046,21605551,18322974,35641229,18078
>> 343,32168839,36497990,32438183,21318471,26123940,18910519,30915502,28867271,23
>> 531997,32354962,31939542,31802464,33011813,29081410,23069703,28818800,25711596
>> ,34192532,26562680,32769512,33097147,23211903,31023390,29245411,18328341,20016
>> 536,27218987,20976639,32276526,26588638,33666579,20657784,25897157,23108681,35
>> 729820,32572428,17813598,21196427,36117626,34815893,28171113,24594806,19137040
>> ,30811309,25577719,28670391,20434200,20199877,31634403,35604289,36196748,31181
>> 693,35509027,20971557,26568864,24971221,20267364,27860948,35731560,21430246,22
>> 978245,35342368,25392636,21607574,20442453,18572630,22562838,27940703,23082403
>> ,17758124,35860161,32634549,21607048,34023518,32677282,21596918,27782822,21623
>> 004,33062545,22308295,29626854,36274933,34659284,33377652,35583483,20176762,17
>> 929806,20411505,29276022,27354075,30283278,21550140,28204781,28155664,29548708
>> ,17751911,29679619,24149227,34198397,21112370,32760805,25146331,27965327,28079
>> 246,29848413,27856606,21367689,28939388,24014461,29789865,25387959,18079334,18
>> 677076,18499041,30631989,18741419,19134800,32330434,34290012,18625119,24580736
>> ,24897407,24015380,30999526,34395195,21430435,24621563,35531308,31322848,31569
>> 886,23569409,32808391,22328732,18333529,19370032,20660418,20658584,34269981,29
>> 888533,17728706,18653752,20410662,19365014,26136275,18734116,24598638,22263794
>> ,28087899,30885178,28051268,17935181,35378800,28099233,32603621,17618651,25889
>> 508,19709551,23574264,30880911,21295372,25159113,28986874,35222916,33987749,19
>> 128524,20662533,25772045,17739786,21321239,22925271,24573015,34883857,36333143
>> ,19959639,35187404,26498913,36014401,18335592,27014206,21431863,24154827,28794
>> 228,25387052,35251999,29126905,17700349,20171798,19375528,31455911,35212477,21
>> 622066,32780431,18576127,29649303,21605682,33081462,31697957,21203092,17930434
>> ,30612147,26724920,20170878,20196518,33120674,32928352,21122116,30592518,20017
>> 606,20033614,21432692,20024902,18080722,18538818,33331331,25768577,20667402,19
>> 376376,35963109,30835065,34514676,21319202,20029350,36111601,36237411,22309503
>> ,34771734,32984571,26563293,32889166,22916156,20666912,31170809,28705374,26490
>> 227,25232531,29065052,29461495,34988658,31047748,27997168,34763462,23006428,23
>> 584276,32214583,22265320,34780145,17935298,18502692,35094400,30947656,23007220
>> ,19648666,18326073,23571101,28803149,33748112,31711771,32278779,25233067,20660
>> 030,23571801,19137179,17622124,19333242,29285103,23059455,34089640,25785248,29
>> 492763,25148068,20174399,20776803,18324215,22535296,23958525,28234329,35305339
>> ,22773025,32826060,28825762,19968115,34521327,20663937,21203567,18507537,20174
>> 535,18649773,34265479,34649540,21386962,24024024,32589731,26070944,32663295,20
>> 049682,22780174,33649804,30840162,27989292,21290787,27840832,34478140,24020582
>> ,36452214,34443814,28136506,25893747,24863629,19969615,36472189,26117122,17811
>> 531,18321669,34022546,33375932,28476697,29673250,25585525,24468088,35954341,22
>> 308792,22857564,30986749,35643553,19628430,36598965,32210453,18681418,26701806
>> ,17884566,24615703,36152760,27632502,26785966,26691036,24651907,34756815,20433
>> 572,26966560,30796057,34814098,27569636,20698350,32446338,21367336,23605062,29
>> 711630,29982622,21285171,33846829,19157241,24602917,20667995,22912477,36526573
>> ,36156867,28584154,34721465,24157864,35685935,34168612,33276918,24649528,18554
>> 064,19635359,20659800,31662554,21420515,22168930,35763204,31093579,36124925,30
>> 271245,27553257,26665398,30782679,25760583,29252225,27897265,31648459,26565700
>> ,20269626,22588807,34287876,19373628,18525958,33137369,32994770,22573911,35626
>> 153,18099568,23214556,28026802,24149558,34155534,19139576,26509204,19955970,27
>> 120106,18571983,32106916,24446734,26717324,18960158,21123614,25725849,32916203
>> ,22306556,17648325,21385722,32892171,28163753,18957658,36510161,25898050,17711
>> 229,36063248,31224821,29038326,35531727,21369258,31690368,19969988,17929412,27
>> 882345,20688733,27053897,24317875,24325075,18570907,23533982,21322802,30400057
>> ,25394467,23578134,34481032,19718898,26751597,17916205,35768873,30158997,35479
>> 525,36476189,26569791,19124689,35306828,20348421,31301947,20264778,22305554,34
>> 356673,33438991,18959295,31778027,24012732,21426520,31188397,27960601,27317703
>> ,23564974,26704537,24654189,28473126,22938801,25892219,30083548,33428242,32852
>> 985,35612799,20662713,32222005,36160194,26568195,33630389,20041550,17752243,18
>> 300984,35542649,34022312,35453930,33192305,20659325,18098499,31106156,30146579
>> ,24134445,22863166,36553792,32812118,32518090,19659932,36103800,22862188,22155
>> 983,21601560,23005753,18748832,17862985,28494863,21503269,34264772,28541910,31
>> 602898,36346943,35326062,26455853,24010619,31398857,22310300,23762337,23898864
>> ,31468714,35905264,22964991,34977780,23981415,34086225,18533997,20970539,36309
>> 423,35592293,34627502,34556982,20344381,22581342,24149063,21115057,18324307,24
>> 149976,34364440,22852191,26729295,24156491,30146176,25233521,24137458,18658617
>> ,20031848,23592811,28739571,22911610,24653336,23897891,36429335,34065389,30979
>> 108,23899934,26567573,31833292,30614176,19376615,23443715,25900612,29151065,28
>> 275928,23007106,22321767,24022410,22788405,24151190,33053085,31461732,21550639
>> ,23064351,35718176,18334131,25896424,25895470,30487264,35496538,23963467,23082
>> 498,25896207,34721350,34882332,20666597,20347653,20964694,20275437,35897661,26
>> 533840,21370717,22905520,24138259,31040363,20278737,33027098,36262271,35231002
>> ,35069477,26561711,36218689,21105103,30829846,17904452,34279715,32198168,33904
>> 594,23603843,34878360,19974511,32564965,18736655,21371987,18660517,20973167,27
>> 097482,17895833,34492553,27110047,26671824,34898020,32377598,27264691,21427981
>> ,26959413,21501718,32218445,17655792,18331319,23069153,28530882,18675154,28915
>> 760,19378258,20047265,20174573,27598791,22308366,19129943,17900526,20198057,33
>> 286883,20342601,28467416,23009143,34842410,17755095,27851708,23534855,32322790
>> ,18631669,27033351,27598649,34461527,19365213,21116246,32025757,23090294,24859
>> 755,18813904,32935349,28831422,20199284,34900449,18081744,28265087,30779657,31
>> 000504,36205587,18806087,31562669,21297572,27146592,36553262,35683330,23582715
>> ,30522820,31981157,33413398,23531572,32087503,35441994,27673784,17870610,21382
>> 803,26121391,20962928,33125334,35996135,35871416,23221464,30678650,32711188,28
>> 149176,21102171,23533202,23007464,33880598,19326603,35751186,29916885,34763624
>> ,34365646,19963648,34892867,20343034,21365680,34719288,22534266,26589376,35882
>> 477,25587759,24305232,20665496,31419621,27482779,24866518,28202665,21260309,32
>> 084915,30974734,18335263,35514300,20199543,35465863,32589495,36319831,23598150
>> ,25764795,25381094,27163555,23104933,23579372,26531003,30138094,17751698,30978
>> 302,25393289,26520096,23574939,31322147,36508494,18671149,29571928,25890463,29
>> 744305,35546126,24019373,17689433,20661135,23006868,32607187,22973930,18678832
>> ,30403976,28333480,35807243,27415143,33960711,22782900,18541332,20425046,24654
>> 874,17721662,21406451,29689854,22156352,24020010,35363836,21429188,34246801,21
>> 505284,25895916,31743398,33629236,19311933,21599246,24968818,18337893,19353042
>> ,22900227,18676228,21428363,19316598,21071674,18683030,24025215,35411753,20172
>> 195,32394479,21548479,18551117,28019179,35306575,19702507,24594140,18096626,34
>> 143690,21602655,22575690,33709127,18819222,24303603,28747517,17686451,33342628
>> ,22977835,36357312,35560000,32442410,36380454,34963133,18079207,34722154,20012
>> 538,24138790,33505818,26515948,25394188,22900856,24158381,31344616,35901220,28
>> 987586,23897783,23591458,35264064,34755602,31593455,17891796,18945199,34517930
>> ,18957145,26904207,18678315,24024217,34954265,22589400,19374963,28768818,25236
>> 460,24904205,24154249,36450664,35148431,31428693,23946371,35178635,18651152,17
>> 815735,17680244,25731040,29121836,20962503,32305082,32466959,18633248,35715906
>> ,25231849,30347203,23901576,20033272,23898477,28785799,20417633,28645122,17933
>> 696,21427618,32103261,32377700,35504568,30035494,31662171,24961347,35804490,23
>> 212855,22787719,19131460,21426100,32802561,24648073,17864541,28636969,35761595
>> ,17905945,36332906,35545612,23580203,30306025,36425940,26086089,34191527,31744
>> 994,18647121,27333696,31969521,18330550,21601734,18960987,36178532,20969535,32
>> 900463,21621487,35512806,20192561,25766287,20419657,21549752,24024567,18080811
>> ,22309091,19665821,24014102,18096050,20659499,33809917,22781818,23898618,31043
>> 360,35053268,28467420,18681171,21371906,17935024,36356831,27674670,27956396,21
>> 117882,18529033,23091963,25731305,34481619,18077629,23568510,30809069,31371555
>> ,31843301,31507623,19654925,23085341,35018530,17812099,33412788,31785064,22973
>> 169,22309011,24619497,20174824,24624883,27653056,19963025,27343180,26116855,29
>> 593796,20198237,20339969,36606086,27899744,21379922,27061397,33684933,20427580
>> ,24157232,25229770,27549582,21432569,26816015,22905036,29797519,30069953,31207
>> 420,24594519,26884433,32695611,33243276,21548674,28012185,19359541,34105431,36
>> 074161,20262905,36136480,35619621,31400701,31245520,32935449,35132877,28371129
>> ,30028854,30602465,27178970,18672952,30402066,18553245,30165719,24657485,18551
>> 886,31096281,23010537,29206862,27702326,34411273,23096154,21429576,31315294,29
>> 162132,29779574,30258692,24576275,29797135,31906909,21391072,32518478,24132614
>> ,27170932,36564178,21107848,31705355,30307587,31551092,32156644,32989453,32294
>> 915,19373132,21361408,30902038,27689144,35721861,20040355,33134097,18525539,19
>> 334638,36296360,34823499,30675071,28985169,31050855,19328166,31523978,28312995
>> ,30916153,24596612,25888701,33589553,20016685,24022213,20190864,24965794,29147
>> 184,35237641,28210577,18505847,21299348,29269849,30826215,22310245,21542576,25
>> 398189,30600382,18670080,35717913,19140940,28343986,35385847,34477621,27062401
>> ,27401917,27528491,33678132,21363176,28432722,25575653,31477306,24623837,35267
>> 113,30219538,35126817,19353693,24909832,25234204,23570060,19363796,19331840,26
>> 568730,23007003,28385549,29030127,28998341,30902818,31234798,21259945,27404083
>> ,28641050,25757712,27769573,23929974,33535505,21370030,31762666,21542350,31185
>> 454,20020297,19371464,21319058,31961676,36255315,23105941,30361025,23094118,33
>> 766920,22924276,35354161,17910133,20340267,35652520,20022694,25238761,27464083
>> ,28575022,30583608,21376500,28299994,28147709,30246423,35929367,33967482,19336
>> 476,25240396,17936500,21108235,22160963,22566341,21111625,34776184,22330133,24
>> 971659,18661397,26521683,18528682,30664620,35882018,19642123,18646183,30870273
>> ,18076911,21408494,27898399,20265515,26450993,20854492,29528783,25239068,28775
>> 964,19375995,22306063,19974707,31067783,30656798,31893915,24849555,32351675,35
>> 835325,27164143,34744490,20662910,20198464,21506929,32643834,18913964,28702929
>> ,19137254,17667564,31004728,23586998,33991917,29938171,31506469,20975937,23087
>> 246,28916519,33591919,34515035,17814785,34594942,34592881,20196010,25382693,17
>> 931900,24651611,31231438,26567914,25892383,21297057,36313180,18577948,24024815
>> ,33085028,18957906,27828641,17694523,20038669,28713230,22140133,23071665,20273
>> 563,33852674,17618069,31417712,23212064,32069864,27193807,20341234,35900632,17
>> 745602,31329815,26746916,31999931,28650109,24967181,28437549,23007943,29090797
>> ,25235327,34204736,18629051,21105931,34976884,23008122,28447060,28910784,34053
>> 116,29630710,34160981,31510281,23082850,36067936,20177785,30706168,31973081,18
>> 504147,27859555,28623912,17930308,33461001,25235680,28831713,19363648,23608380
>> ,32728966,18078581,18626280,23901730,20029175,36617767,17653549,30815958,28896
>> 040,20177463,32554610,21115716,28313409,31237471,35972968,31713664,28393276,32
>> 636090,31549107,26735826,29371931,19133395,18751630,25396798,21319651,18672655
>> ,22308107,18545303,33542156,32912154,25785957,32624867,32522301,23307444,19140
>> 788,21420243,30693597,28002004,23109273,22305880,34624633,21296685,21602050,33
>> 225750,36573288,24307469,19355074,18575439,33502816,33076539,34515895,28686423
>> ,33648774,24596266,33555767,18657756,17699802,36531431,20013044,27939484,31543
>> 059,35852134,21503818,22907890,18529715,28406715,29870820,21109130,26936003,35
>> 027552,20423970,29373112,32679955,23101173,24136227,31208041,24929808,24131011
>> ,17650440,28567183,29083059,31428993,34050799,28899897,20663580,35863229,31871
>> 346,34298377,23071358,25897490,33570345,20775771,31962228,19636911,30900521,25
>> 418566,34671458,34503878,23598323,28819761,17906099,27220309,30923561,30350010
>> ,34308130,23063842,31238001,34478183,28361493,18754008,24599926,22970144,22909
>> 439,36433744,34434510,21317113,29185407,32496878,32478183,23057861,30307590,19
>> 350418,20973685,35657750,24877414,36386105,24014838,25145543,18079726,31684615
>> ,33534042,17896361,22967654,17662973,36440673,20660727,30769347,23587813,36465
>> 971,22902324,31699279,18672016,25149578,28319698,32809462,17693853,27191033,31
>> 254324,31466970,21113244,32634472,19959104,28142569,29483467,20660491,18674791
>> ,35451155,24624247,29558397,23571296,23060269,20197964,20271303,21505468,20276
>> 462,22268317,30378756,22921092,28797847,36438767,21400609,35208886,21545844,30
>> 225138,33253632,27086590,25157591,33939266,17898249,23563490,20276791,36594447
>> ,27350647,36416038,34722884,26084793,28768318,24008101,25712745,32212261,25577
>> 431,19315480,28664034,28821676,29195666,29043759,34764662,23818705,21502021,24
>> 150834,23533243,36171110)  AND
>>       pf.probe_id = p.probe_id
>> ORDER BY pf.seq_region_id, pf.seq_region_start, pf.probe_feature_id
>> 
>> +----+-------------+-------+-------+-------------------+---------+---------+--
>> ---------------------------------------+------+-----------------------------+
>> | id | select_type | table | type  | possible_keys     | key     | key_len |
>> ref                                     | rows | Extra                       |
>> +----+-------------+-------+-------+-------------------+---------+---------+--
>> ---------------------------------------+------+-----------------------------+
>> |  1 | SIMPLE      | pf    | range | PRIMARY,probe_idx | PRIMARY | 4       |
>> NULL                                    | 1263 | Using where; Using filesort |
>> |  1 | SIMPLE      | p     | ref   | PRIMARY           | PRIMARY | 4       |
>> mus_musculus_funcgen_62_37o.pf.probe_id |    1 |                             |
>> +----+-------------+-------+-------+-------------------+---------+---------+--
>> ---------------------------------------+------+-----------------------------+
>> 
>> 
>> On 3 Jul 2011, at 03:47, Alexander Pico wrote:
>> 
>>> Hello again,
>>> 
>>> I flipped the problem around and am now retrieving all gene IDs per
>>> probe/probeset. This turns out to be much faster: ~3 minutes per array for
>>> the mouse genome (versus 5 minutes to retrieve the probe_features for a
>>> single mouse gene, in some cases).
>>> 
>>> Here's a code example that others may find helpful. It's different from the
>>> other examples found here:
>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics/scripts/exa
>>> mples/microarray_annotation_example.pl?revision=1.1&root=ensembl&view=markup
>>> 
>>> #fetch example array
>>> my $array_name = 'MOE430B';
>>> my $array = $array_adaptor->fetch_by_name_vendor($array_name);
>>> my $vendor = $array->vendor();
>>> 
>>> #get all probes/probesets
>>> my @plist = ();
>>> if ($vendor  =~ /^\'AFFY/i) {
>>>   @plist = @{$array->get_all_ProbeSets()};
>>> } else {
>>>   @plist = @{$array->get_all_Probes()};
>>> }
>>> 
>>> #get probe/probeset name
>>> foreach my $p (@plist) {
>>> my $p_name = '';
>>> if ($vendor  =~ /^\'AFFY/i) {
>>>    $p_name = $p->name();
>>> } else {
>>>    $p_name = $p->get_probename($array_name());
>>> }
>>> 
>>> #extract gene from dbentries
>>> my @dbeList = @{$p->get_all_DBEntries()};
>>> my $dbe_dbname = $dbe->dbname();
>>> my $gene = '';
>>> if ($dbe_dbname =~ /core_Transcript$/){
>>>    $gene = 
>>> $gene_adaptor->fetch_by_transcript_stable_id($dbe->primary_id());
>>> } else {
>>>   next;
>>> }
>>> 
>>> Works with both probes and probesets, and you end up with a Gene object.
>>> Not exactly straightforward, but it seems to work. Any advice on
>>> improvements?
>>> 
>>> It'd also be great if there was a method that returned an array of all the
>>> "Arrays", either by vendor or a complete list.  Without this method, I'm
>>> left with hardcoding a list of arrays to collect and updating each release,
>>> rather than just collecting everything you guys have put into the release.
>>> 
>>> - Alex
>>> 
>>> 
>>> On 7/2/11 3:00 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm running into a number of transcripts that have so many probe_features
>>>> that my mysql service stalls while "Copying to tmp table".  Here is a
>>>> specific example:
>>>> 
>>>> my @probe_features = @{$probe_feature_adaptor->fetch_all_by_external_name('
>>>> ENSMUST00000102049')};
>>>> 
>>>> This one line of code can take 5 minutes to run. Multiply that by thousands
>>>> and you can see why it takes over a week to gather probe information for all
>>>> genes in the mouse genome.
>>>> 
>>>> I'm running my script and mysql on a powerful cluster and we've tried
>>>> cranking a few parameters to avoid the 'Copying to tmp table' step, but no
>>>> luck. Specifically, we tried increasing tmp_table_size and
>>>> max_heap_table_size to 4GB each.
>>>> 
>>>> Any tips on the mysql parameters you run at Ensembl?
>>>> 
>>>> Any alternative suggestions for how to get the probe/probeset IDs per gene
>>>> (like we used to be able to from core using get_all_DBEntries)?
>>>> 
>>>> - Alex
>>>> 
>>>> 
>>>> On 6/27/11 3:11 AM, "Nathan Johnson" <njohnson at ebi.ac.uk> wrote:
>>>> 
>>>>> Hi Alex
>>>>> 
>>>>> This code looks like is is trying to get the probe/set xref info from the
>>>>> core
>>>>> DB. These data were moved to the funcgen DB quite some time ago.
>>>>> 
>>>>> For some example on how to retrieve PorbeSet level annotation see this doc:
>>>>> 
>>>>> 
>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics/scripts/exam
>>>>> 
>>> p
>>>>> les/microarray_annotation_example.pl?revision=1.3&root=ensembl&view=markup
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Nath
>>>>> 
>>>>> On 14 Jun 2011, at 15:32, Alex Kalderimis wrote:
>>>>> 
>>>>>> Dear Listizens,
>>>>>> 
>>>>>> In trying to debug why code for getting Affymetrix Probeset
>>>>>> information had stopped working, I added some debug statements and it
>>>>>> seems that the data is no longer modelled as we expected it to be. The
>>>>>> code is below:
>>>>>> 
>>>>>> 69  for my $slice (@slices) {
>>>>>> 70  my @genes = @{ $slice->get_all_Genes };
>>>>>> 71  $self->debug("Processing " . scalar(@genes) . " genes");
>>>>>> 72  my $processed_genes = 0;
>>>>>> 73  for my $gene (@genes) {
>>>>>> 74      my @transcripts = @{ $gene->get_all_Transcripts };
>>>>>> 75      for my $transcript (@transcripts) {
>>>>>> 76          my @xrefs = @{ $transcript->get_all_DBEntries };
>>>>>> 77          for my $xref (@xrefs) {
>>>>>> 78              $xref_types{$xref->dbname} = 1;
>>>>>> 79              if ( $xref->dbname eq $db_name ) {
>>>>>> 80                  my @probe_features = @{
>>>>>> $self->get_feature_adaptor->fetch_all_by_probeset( $xref->display_id ) };
>>>>>> 81                  for my $probe_feature (@probe_features) {
>>>>>> 82                      my $line = join("\t",
>>>>>> 83                          $gene->stable_id,
>>>>>> 84                          $transcript->stable_id,
>>>>>> 85                          $xref->display_id,
>>>>>> 86                          $probe_feature->seq_region_name,
>>>>>> 87                          $probe_feature->seq_region_start,
>>>>>> 88                          $probe_feature->seq_region_end);
>>>>>> 89                      $self->debug($line);
>>>>>> 90                      print $out $line, "\n";
>>>>>> 91                  }
>>>>>> 92              }
>>>>>> 93          }
>>>>>> 94      }
>>>>>> 95      $processed_genes++;
>>>>>> 96      if ($processed_genes % 100 == 0) {
>>>>>> 97          $self->debug("Processed $processed_genes genes, with the
>>>>>> following XREF types: " . join(", ", sort keys %xref_types));
>>>>>> 98      }
>>>>>> 99  }
>>>>>> 
>>>>>> The dbnames "AFFY_Drosophila_1" and "AFFY_Drosphila_2" (which are what I
>>>>>> am looking for) never appear. How can I better structure my code to
>>>>>> get the information I am after?
>>>>>> 
>>>>>> Alex.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> List admin (including subscribe/unsubscribe):
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>> 
>>>>> Nathan Johnson
>>>>> Senior Scientific Programmer
>>>>> Ensembl Regulation
>>>>> European Bioinformatics Institute
>>>>> Wellcome Trust Genome Campus
>>>>> Hinxton
>>>>> Cambridge CB10 1SD
>>>>> 
>>>>> http://www.ensembl.info/
>>>>> http://twitter.com/#!/ensembl
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> List admin (including subscribe/unsubscribe):
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> List admin (including subscribe/unsubscribe):
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>> 
>> Nathan Johnson
>> Senior Scientific Programmer
>> Ensembl Regulation
>> European Bioinformatics Institute
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge CB10 1SD
>> 
>> http://www.ensembl.info/
>> http://twitter.com/#!/ensembl
>> 
>> 
>> 
>> 
>> 
>> 
> 

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl










More information about the Dev mailing list