[ensembl-dev] Affymetrix Probesets

Nathan Johnson njohnson at ebi.ac.uk
Mon Jul 4 15:35:25 BST 2011


Hi Alexander

I have been looking at your original fetch_all_by_external_name example. ENSMUST00000102049 is a particularly popular gene in terms of probe mappings, I count 2471 in all. But for me, these return in under 10 seconds. Which considering the number of objects being created, is certainly in the right ball park, and no where near 5 mins.

This method effectively performs two queries. 1 to get the probe feature IDs from the xref schema, given the stable ID. And a second to fetch all the ProbeFeatures given the list of internal DB ids, in this case ~2500 ids long.  It's a fairly simple join to retrieve the probe features, but we do return by a default order, which will trigger a filesort for this query at present. (Currently choosing to prioritise probe based queries for sorting using the index).

Considering you were triggering a tmp table, and the filesort is fairly quick for me, I guess this is not your problem.  

Can you do an explain your mysql server and post the results? Have pasted the query below. I've also pasted all out buffer sizes, in case this gives you any clues.  

I doubt this is the case, but it maybe that your key_buffer_size is too small to handle all the key blocks from the query, hence triggering a tmp table. You would probably be seeing general slowness with all large queries if this was the case.  

Also, have you tried optimising the tables?

It's good to see you've found a work around. However, the problem with working like this is that you will be getting many redundant xrefs(genes), for each of the Affy arrays, hence the original suggestion to start with the Gene/Transcript stable IDs.  I'm not quite sure I understand you request about the list of Arrays. Where exactly do you want this functionality?


Nath

mysql> show variables like "%buffer%";
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| bulk_insert_buffer_size | 8388608    | 
| innodb_buffer_pool_size | 4294967296 | 
| innodb_log_buffer_size  | 8388608    | 
| join_buffer_size        | 131072     | 
| key_buffer_size         | 4193255424 | 
| myisam_sort_buffer_size | 8388608    | 
| net_buffer_length       | 16384      | 
| preload_buffer_size     | 32768      | 
| read_buffer_size        | 262144     | 
| read_rnd_buffer_size    | 262144     | 
| sort_buffer_size        | 2097152    | 
| sql_buffer_result       | OFF        | 
+-------------------------+------------+

mysql> show variables like "%heap%";
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| max_heap_table_size | 16777216 | 
+---------------------+----------+


explain SELECT  pf.probe_feature_id, pf.seq_region_id, pf.seq_region_start, pf.seq_region_end, pf.seq_region_strand, pf.probe_id, pf.analysis_id, pf.mismatches, pf.cigar_line, p.name, p.probe_set_id
FROM  (probe_feature pf, probe p)
 WHERE pf.probe_feature_id  IN (24011933,31639410,19145471,35203641,17815601,30037965,22305816,23610466,21378222,26080812,34380111,33167220,28996153,33442533,19346150,19134061,28702753,26494030,20336756,29071313,23897509,19969266,23008789,32197650,36395239,30204881,24585842,32269563,27487159,22923378,28001558,19639979,23534197,20037959,36245940,34512206,30104507,29108586,23902516,21424238,28811060,23573354,29263042,28973380,21604078,33546702,32881972,29726616,28571494,18078956,20703325,22266734,20658466,18674862,34432014,28111887,33097500,31450834,20173612,30894526,23007732,22147758,25709984,22165564,21620519,26953745,32872914,29329303,21263505,24651190,22897875,20848891,34234476,22936219,28114340,24969629,31243578,28462298,28950321,34079884,32839648,32764720,30807771,30477257,27433894,26714299,19977103,27289280,35128819,21362153,21600381,18506280,21196049,29551627,18682058,27455633,30618421,27123094,27275877,22913832,35823767,20658865,18630657,24571105,28538169,24022713,22259830,27113418,28114857,22854180,23011160,17929941,30602298,34702984,21386104,17728662,25038178,21397353,30126926,33823200,17933929,26560878,26114508,20662249,21431505,23006655,23900708,23569039,20176107,27262897,35027194,21383478,19961800,21549260,18327642,18099135,22577238,20419236,22533188,21550317,20054254,32835967,34974786,27608265,33424554,25899176,20276120,25385115,20750593,29890643,34857538,33563252,28723578,27902776,23599445,28426935,27384085,35849971,33493863,24007760,17814444,22857100,22309707,29989949,23212499,19643243,19335625,31715992,25895474,17909788,21599946,18745809,30451719,21196830,26138128,23937759,26526475,18730137,23606403,21125851,25228787,21605862,31974620,25386382,24314592,21550723,24025499,25396061,22915668,30811835,35123376,26924902,34344495,29333028,25230319,33138818,17897928,34559749,33669876,24152427,32852599,34942587,29264817,36614691,17898494,32040352,31011108,31762166,26117709,24584031,29183782,24135522,23585812,19129339,32755466,25431300,33041964,19369364,20441407,34377973,32065162,34068475,19314411,23590335,21604454,36450563,27146233,34462615,23111656,20030686,31239156,31443940,25430602,17887695,31856797,35007445,24574878,27961786,25397509,31475459,17812582,18755857,30446114,20027432,26134959,21432307,33870530,31259975,29669704,20660828,26123628,29475901,32383134,31472780,17895275,18738236,17680190,29552221,30073102,18821428,32885413,27720017,17890765,23535943,32280364,25776709,24647342,31831571,23602731,19972130,30539903,17732292,29960883,27794948,32882003,24150248,21506431,34716124,21427824,24599663,21430976,19662980,24149670,27706769,21201230,23576980,30825832,24621301,20262457,32669480,29262042,21123300,25154443,30569956,28373190,18077166,25158705,21425973,31052820,34927911,36335525,27692150,24595666,23532226,23576999,26534521,32347493,17681958,18575246,23070620,17903391,22934064,21508998,20849925,32956945,26074173,18331961,20666339,29301343,30384808,28605103,20853949,29404401,28095220,36454544,21506019,17917425,18079578,36402698,20017171,33349513,18742206,34796403,20172245,20269282,25759588,31016758,27760370,25896865,24597898,35575177,26121706,19330311,34097379,28233893,26970666,34345458,30746856,21603929,28734928,27854378,31502918,26997597,35234094,29919031,31394098,21316298,32030643,33972479,31515598,19978929,27621369,33131800,30387944,36532949,21607438,18914961,31311830,30858601,30814085,34523280,24136675,35968270,30174240,31619742,32072368,25238090,28585928,34603687,20170327,33462142,21425312,17877889,35289813,34175975,28973419,20177671,35559001,23072483,29779750,20039442,21604259,19363102,23095116,23006108,29988821,18329827,34904806,21288852,18080061,30903001,20268027,32379775,27656242,27914159,33878795,21507583,17899919,34653714,28849493,26569972,32810204,34883109,23900790,19135975,24598080,24477512,34732932,29744822,18299892,27476013,24968011,35007119,17921386,18535896,20666503,17937017,21202398,35387506,18646596,35494820,26749283,18077266,35775908,29701072,19721792,25387497,26872116,20341703,33471852,20034315,20176971,33388597,28663991,19977866,24967598,20030285,34805889,36155838,32594938,34839613,25898347,20038638,21547778,21107352,34593091,36394745,33933979,24616528,28504678,32745303,25151100,21320552,17812783,20173557,18740999,25898717,36587923,22587055,17907760,24622710,21546811,18654498,28603749,30775407,22776687,17906940,29397431,28759512,19631185,27548375,18537027,24577287,20195052,36273004,34452062,33463481,34083589,20176606,17931181,17905532,29279698,24196495,33309497,32729723,20196618,34138058,23090895,29674966,32790755,23968678,31805145,20771854,33854012,17709965,34507492,17746257,34135872,30607168,23900307,23593938,19141680,19313305,26810921,21127690,23955464,18670273,35737361,23096607,19135244,21395642,35092497,29294000,33311697,26936788,30675097,21388821,35459136,21429535,18823648,17930238,32021257,22910592,29397631,23533894,21428127,28651921,31607673,30871815,32715291,23901030,34710242,33114983,32855926,30706381,20665691,24306327,17678138,20279008,31209359,33081392,20430028,32571306,30354942,23213727,28573716,20662577,26947200,23100281,30806669,25398612,21598994,26082612,30394565,17750931,35311615,34288777,19691040,18080469,19370211,36565189,23562214,25890315,27364940,31518565,17767916,26534191,23094727,18526295,17932707,26080983,36000904,30496314,33150414,33747715,35693162,26530092,29578590,27580000,19683752,28718624,30261122,33255673,31876254,28598476,31233597,33840622,30794846,21543160,18742558,32571346,17606920,19372046,21605551,18322974,35641229,18078343,32168839,36497990,32438183,21318471,26123940,18910519,30915502,28867271,23531997,32354962,31939542,31802464,33011813,29081410,23069703,28818800,25711596,34192532,26562680,32769512,33097147,23211903,31023390,29245411,18328341,20016536,27218987,20976639,32276526,26588638,33666579,20657784,25897157,23108681,35729820,32572428,17813598,21196427,36117626,34815893,28171113,24594806,19137040,30811309,25577719,28670391,20434200,20199877,31634403,35604289,36196748,31181693,35509027,20971557,26568864,24971221,20267364,27860948,35731560,21430246,22978245,35342368,25392636,21607574,20442453,18572630,22562838,27940703,23082403,17758124,35860161,32634549,21607048,34023518,32677282,21596918,27782822,21623004,33062545,22308295,29626854,36274933,34659284,33377652,35583483,20176762,17929806,20411505,29276022,27354075,30283278,21550140,28204781,28155664,29548708,17751911,29679619,24149227,34198397,21112370,32760805,25146331,27965327,28079246,29848413,27856606,21367689,28939388,24014461,29789865,25387959,18079334,18677076,18499041,30631989,18741419,19134800,32330434,34290012,18625119,24580736,24897407,24015380,30999526,34395195,21430435,24621563,35531308,31322848,31569886,23569409,32808391,22328732,18333529,19370032,20660418,20658584,34269981,29888533,17728706,18653752,20410662,19365014,26136275,18734116,24598638,22263794,28087899,30885178,28051268,17935181,35378800,28099233,32603621,17618651,25889508,19709551,23574264,30880911,21295372,25159113,28986874,35222916,33987749,19128524,20662533,25772045,17739786,21321239,22925271,24573015,34883857,36333143,19959639,35187404,26498913,36014401,18335592,27014206,21431863,24154827,28794228,25387052,35251999,29126905,17700349,20171798,19375528,31455911,35212477,21622066,32780431,18576127,29649303,21605682,33081462,31697957,21203092,17930434,30612147,26724920,20170878,20196518,33120674,32928352,21122116,30592518,20017606,20033614,21432692,20024902,18080722,18538818,33331331,25768577,20667402,19376376,35963109,30835065,34514676,21319202,20029350,36111601,36237411,22309503,34771734,32984571,26563293,32889166,22916156,20666912,31170809,28705374,26490227,25232531,29065052,29461495,34988658,31047748,27997168,34763462,23006428,23584276,32214583,22265320,34780145,17935298,18502692,35094400,30947656,23007220,19648666,18326073,23571101,28803149,33748112,31711771,32278779,25233067,20660030,23571801,19137179,17622124,19333242,29285103,23059455,34089640,25785248,29492763,25148068,20174399,20776803,18324215,22535296,23958525,28234329,35305339,22773025,32826060,28825762,19968115,34521327,20663937,21203567,18507537,20174535,18649773,34265479,34649540,21386962,24024024,32589731,26070944,32663295,20049682,22780174,33649804,30840162,27989292,21290787,27840832,34478140,24020582,36452214,34443814,28136506,25893747,24863629,19969615,36472189,26117122,17811531,18321669,34022546,33375932,28476697,29673250,25585525,24468088,35954341,22308792,22857564,30986749,35643553,19628430,36598965,32210453,18681418,26701806,17884566,24615703,36152760,27632502,26785966,26691036,24651907,34756815,20433572,26966560,30796057,34814098,27569636,20698350,32446338,21367336,23605062,29711630,29982622,21285171,33846829,19157241,24602917,20667995,22912477,36526573,36156867,28584154,34721465,24157864,35685935,34168612,33276918,24649528,18554064,19635359,20659800,31662554,21420515,22168930,35763204,31093579,36124925,30271245,27553257,26665398,30782679,25760583,29252225,27897265,31648459,26565700,20269626,22588807,34287876,19373628,18525958,33137369,32994770,22573911,35626153,18099568,23214556,28026802,24149558,34155534,19139576,26509204,19955970,27120106,18571983,32106916,24446734,26717324,18960158,21123614,25725849,32916203,22306556,17648325,21385722,32892171,28163753,18957658,36510161,25898050,17711229,36063248,31224821,29038326,35531727,21369258,31690368,19969988,17929412,27882345,20688733,27053897,24317875,24325075,18570907,23533982,21322802,30400057,25394467,23578134,34481032,19718898,26751597,17916205,35768873,30158997,35479525,36476189,26569791,19124689,35306828,20348421,31301947,20264778,22305554,34356673,33438991,18959295,31778027,24012732,21426520,31188397,27960601,27317703,23564974,26704537,24654189,28473126,22938801,25892219,30083548,33428242,32852985,35612799,20662713,32222005,36160194,26568195,33630389,20041550,17752243,18300984,35542649,34022312,35453930,33192305,20659325,18098499,31106156,30146579,24134445,22863166,36553792,32812118,32518090,19659932,36103800,22862188,22155983,21601560,23005753,18748832,17862985,28494863,21503269,34264772,28541910,31602898,36346943,35326062,26455853,24010619,31398857,22310300,23762337,23898864,31468714,35905264,22964991,34977780,23981415,34086225,18533997,20970539,36309423,35592293,34627502,34556982,20344381,22581342,24149063,21115057,18324307,24149976,34364440,22852191,26729295,24156491,30146176,25233521,24137458,18658617,20031848,23592811,28739571,22911610,24653336,23897891,36429335,34065389,30979108,23899934,26567573,31833292,30614176,19376615,23443715,25900612,29151065,28275928,23007106,22321767,24022410,22788405,24151190,33053085,31461732,21550639,23064351,35718176,18334131,25896424,25895470,30487264,35496538,23963467,23082498,25896207,34721350,34882332,20666597,20347653,20964694,20275437,35897661,26533840,21370717,22905520,24138259,31040363,20278737,33027098,36262271,35231002,35069477,26561711,36218689,21105103,30829846,17904452,34279715,32198168,33904594,23603843,34878360,19974511,32564965,18736655,21371987,18660517,20973167,27097482,17895833,34492553,27110047,26671824,34898020,32377598,27264691,21427981,26959413,21501718,32218445,17655792,18331319,23069153,28530882,18675154,28915760,19378258,20047265,20174573,27598791,22308366,19129943,17900526,20198057,33286883,20342601,28467416,23009143,34842410,17755095,27851708,23534855,32322790,18631669,27033351,27598649,34461527,19365213,21116246,32025757,23090294,24859755,18813904,32935349,28831422,20199284,34900449,18081744,28265087,30779657,31000504,36205587,18806087,31562669,21297572,27146592,36553262,35683330,23582715,30522820,31981157,33413398,23531572,32087503,35441994,27673784,17870610,21382803,26121391,20962928,33125334,35996135,35871416,23221464,30678650,32711188,28149176,21102171,23533202,23007464,33880598,19326603,35751186,29916885,34763624,34365646,19963648,34892867,20343034,21365680,34719288,22534266,26589376,35882477,25587759,24305232,20665496,31419621,27482779,24866518,28202665,21260309,32084915,30974734,18335263,35514300,20199543,35465863,32589495,36319831,23598150,25764795,25381094,27163555,23104933,23579372,26531003,30138094,17751698,30978302,25393289,26520096,23574939,31322147,36508494,18671149,29571928,25890463,29744305,35546126,24019373,17689433,20661135,23006868,32607187,22973930,18678832,30403976,28333480,35807243,27415143,33960711,22782900,18541332,20425046,24654874,17721662,21406451,29689854,22156352,24020010,35363836,21429188,34246801,21505284,25895916,31743398,33629236,19311933,21599246,24968818,18337893,19353042,22900227,18676228,21428363,19316598,21071674,18683030,24025215,35411753,20172195,32394479,21548479,18551117,28019179,35306575,19702507,24594140,18096626,34143690,21602655,22575690,33709127,18819222,24303603,28747517,17686451,33342628,22977835,36357312,35560000,32442410,36380454,34963133,18079207,34722154,20012538,24138790,33505818,26515948,25394188,22900856,24158381,31344616,35901220,28987586,23897783,23591458,35264064,34755602,31593455,17891796,18945199,34517930,18957145,26904207,18678315,24024217,34954265,22589400,19374963,28768818,25236460,24904205,24154249,36450664,35148431,31428693,23946371,35178635,18651152,17815735,17680244,25731040,29121836,20962503,32305082,32466959,18633248,35715906,25231849,30347203,23901576,20033272,23898477,28785799,20417633,28645122,17933696,21427618,32103261,32377700,35504568,30035494,31662171,24961347,35804490,23212855,22787719,19131460,21426100,32802561,24648073,17864541,28636969,35761595,17905945,36332906,35545612,23580203,30306025,36425940,26086089,34191527,31744994,18647121,27333696,31969521,18330550,21601734,18960987,36178532,20969535,32900463,21621487,35512806,20192561,25766287,20419657,21549752,24024567,18080811,22309091,19665821,24014102,18096050,20659499,33809917,22781818,23898618,31043360,35053268,28467420,18681171,21371906,17935024,36356831,27674670,27956396,21117882,18529033,23091963,25731305,34481619,18077629,23568510,30809069,31371555,31843301,31507623,19654925,23085341,35018530,17812099,33412788,31785064,22973169,22309011,24619497,20174824,24624883,27653056,19963025,27343180,26116855,29593796,20198237,20339969,36606086,27899744,21379922,27061397,33684933,20427580,24157232,25229770,27549582,21432569,26816015,22905036,29797519,30069953,31207420,24594519,26884433,32695611,33243276,21548674,28012185,19359541,34105431,36074161,20262905,36136480,35619621,31400701,31245520,32935449,35132877,28371129,30028854,30602465,27178970,18672952,30402066,18553245,30165719,24657485,18551886,31096281,23010537,29206862,27702326,34411273,23096154,21429576,31315294,29162132,29779574,30258692,24576275,29797135,31906909,21391072,32518478,24132614,27170932,36564178,21107848,31705355,30307587,31551092,32156644,32989453,32294915,19373132,21361408,30902038,27689144,35721861,20040355,33134097,18525539,19334638,36296360,34823499,30675071,28985169,31050855,19328166,31523978,28312995,30916153,24596612,25888701,33589553,20016685,24022213,20190864,24965794,29147184,35237641,28210577,18505847,21299348,29269849,30826215,22310245,21542576,25398189,30600382,18670080,35717913,19140940,28343986,35385847,34477621,27062401,27401917,27528491,33678132,21363176,28432722,25575653,31477306,24623837,35267113,30219538,35126817,19353693,24909832,25234204,23570060,19363796,19331840,26568730,23007003,28385549,29030127,28998341,30902818,31234798,21259945,27404083,28641050,25757712,27769573,23929974,33535505,21370030,31762666,21542350,31185454,20020297,19371464,21319058,31961676,36255315,23105941,30361025,23094118,33766920,22924276,35354161,17910133,20340267,35652520,20022694,25238761,27464083,28575022,30583608,21376500,28299994,28147709,30246423,35929367,33967482,19336476,25240396,17936500,21108235,22160963,22566341,21111625,34776184,22330133,24971659,18661397,26521683,18528682,30664620,35882018,19642123,18646183,30870273,18076911,21408494,27898399,20265515,26450993,20854492,29528783,25239068,28775964,19375995,22306063,19974707,31067783,30656798,31893915,24849555,32351675,35835325,27164143,34744490,20662910,20198464,21506929,32643834,18913964,28702929,19137254,17667564,31004728,23586998,33991917,29938171,31506469,20975937,23087246,28916519,33591919,34515035,17814785,34594942,34592881,20196010,25382693,17931900,24651611,31231438,26567914,25892383,21297057,36313180,18577948,24024815,33085028,18957906,27828641,17694523,20038669,28713230,22140133,23071665,20273563,33852674,17618069,31417712,23212064,32069864,27193807,20341234,35900632,17745602,31329815,26746916,31999931,28650109,24967181,28437549,23007943,29090797,25235327,34204736,18629051,21105931,34976884,23008122,28447060,28910784,34053116,29630710,34160981,31510281,23082850,36067936,20177785,30706168,31973081,18504147,27859555,28623912,17930308,33461001,25235680,28831713,19363648,23608380,32728966,18078581,18626280,23901730,20029175,36617767,17653549,30815958,28896040,20177463,32554610,21115716,28313409,31237471,35972968,31713664,28393276,32636090,31549107,26735826,29371931,19133395,18751630,25396798,21319651,18672655,22308107,18545303,33542156,32912154,25785957,32624867,32522301,23307444,19140788,21420243,30693597,28002004,23109273,22305880,34624633,21296685,21602050,33225750,36573288,24307469,19355074,18575439,33502816,33076539,34515895,28686423,33648774,24596266,33555767,18657756,17699802,36531431,20013044,27939484,31543059,35852134,21503818,22907890,18529715,28406715,29870820,21109130,26936003,35027552,20423970,29373112,32679955,23101173,24136227,31208041,24929808,24131011,17650440,28567183,29083059,31428993,34050799,28899897,20663580,35863229,31871346,34298377,23071358,25897490,33570345,20775771,31962228,19636911,30900521,25418566,34671458,34503878,23598323,28819761,17906099,27220309,30923561,30350010,34308130,23063842,31238001,34478183,28361493,18754008,24599926,22970144,22909439,36433744,34434510,21317113,29185407,32496878,32478183,23057861,30307590,19350418,20973685,35657750,24877414,36386105,24014838,25145543,18079726,31684615,33534042,17896361,22967654,17662973,36440673,20660727,30769347,23587813,36465971,22902324,31699279,18672016,25149578,28319698,32809462,17693853,27191033,31254324,31466970,21113244,32634472,19959104,28142569,29483467,20660491,18674791,35451155,24624247,29558397,23571296,23060269,20197964,20271303,21505468,20276462,22268317,30378756,22921092,28797847,36438767,21400609,35208886,21545844,30225138,33253632,27086590,25157591,33939266,17898249,23563490,20276791,36594447,27350647,36416038,34722884,26084793,28768318,24008101,25712745,32212261,25577431,19315480,28664034,28821676,29195666,29043759,34764662,23818705,21502021,24150834,23533243,36171110)  AND
       pf.probe_id = p.probe_id
 ORDER BY pf.seq_region_id, pf.seq_region_start, pf.probe_feature_id

+----+-------------+-------+-------+-------------------+---------+---------+-----------------------------------------+------+-----------------------------+
| id | select_type | table | type  | possible_keys     | key     | key_len | ref                                     | rows | Extra                       |
+----+-------------+-------+-------+-------------------+---------+---------+-----------------------------------------+------+-----------------------------+
|  1 | SIMPLE      | pf    | range | PRIMARY,probe_idx | PRIMARY | 4       | NULL                                    | 1263 | Using where; Using filesort | 
|  1 | SIMPLE      | p     | ref   | PRIMARY           | PRIMARY | 4       | mus_musculus_funcgen_62_37o.pf.probe_id |    1 |                             | 
+----+-------------+-------+-------+-------------------+---------+---------+-----------------------------------------+------+-----------------------------+


On 3 Jul 2011, at 03:47, Alexander Pico wrote:

> Hello again,
> 
> I flipped the problem around and am now retrieving all gene IDs per
> probe/probeset. This turns out to be much faster: ~3 minutes per array for
> the mouse genome (versus 5 minutes to retrieve the probe_features for a
> single mouse gene, in some cases).
> 
> Here's a code example that others may find helpful. It's different from the
> other examples found here:
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics/scripts/exa
> mples/microarray_annotation_example.pl?revision=1.1&root=ensembl&view=markup
> 
> #fetch example array
> my $array_name = 'MOE430B';
> my $array = $array_adaptor->fetch_by_name_vendor($array_name);
> my $vendor = $array->vendor();
> 
> #get all probes/probesets
> my @plist = ();
>  if ($vendor  =~ /^\'AFFY/i) {
>    @plist = @{$array->get_all_ProbeSets()};
>  } else {
>    @plist = @{$array->get_all_Probes()};
>  }
> 
> #get probe/probeset name
> foreach my $p (@plist) {
>  my $p_name = '';
>  if ($vendor  =~ /^\'AFFY/i) {
>     $p_name = $p->name();
>  } else {
>     $p_name = $p->get_probename($array_name());
>  }
> 
>  #extract gene from dbentries
>  my @dbeList = @{$p->get_all_DBEntries()};
>  my $dbe_dbname = $dbe->dbname();
>  my $gene = '';
>  if ($dbe_dbname =~ /core_Transcript$/){
>     $gene = 
> $gene_adaptor->fetch_by_transcript_stable_id($dbe->primary_id());
>  } else {
>    next;
>  }
> 
> Works with both probes and probesets, and you end up with a Gene object.
> Not exactly straightforward, but it seems to work. Any advice on
> improvements?
> 
> It'd also be great if there was a method that returned an array of all the
> "Arrays", either by vendor or a complete list.  Without this method, I'm
> left with hardcoding a list of arrays to collect and updating each release,
> rather than just collecting everything you guys have put into the release.
> 
> - Alex
> 
> 
> On 7/2/11 3:00 PM, "Alexander Pico" <apico at gladstone.ucsf.edu> wrote:
> 
>> Hi,
>> 
>> I'm running into a number of transcripts that have so many probe_features
>> that my mysql service stalls while "Copying to tmp table".  Here is a
>> specific example:
>> 
>> my @probe_features = @{$probe_feature_adaptor->fetch_all_by_external_name('
>> ENSMUST00000102049')};
>> 
>> This one line of code can take 5 minutes to run. Multiply that by thousands
>> and you can see why it takes over a week to gather probe information for all
>> genes in the mouse genome.
>> 
>> I'm running my script and mysql on a powerful cluster and we've tried
>> cranking a few parameters to avoid the 'Copying to tmp table' step, but no
>> luck. Specifically, we tried increasing tmp_table_size and
>> max_heap_table_size to 4GB each.
>> 
>> Any tips on the mysql parameters you run at Ensembl?
>> 
>> Any alternative suggestions for how to get the probe/probeset IDs per gene
>> (like we used to be able to from core using get_all_DBEntries)?
>> 
>> - Alex
>> 
>> 
>> On 6/27/11 3:11 AM, "Nathan Johnson" <njohnson at ebi.ac.uk> wrote:
>> 
>>> Hi Alex
>>> 
>>> This code looks like is is trying to get the probe/set xref info from the
>>> core
>>> DB. These data were moved to the funcgen DB quite some time ago.
>>> 
>>> For some example on how to retrieve PorbeSet level annotation see this doc:
>>> 
>>> 
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics/scripts/exam>>
> p
>>> les/microarray_annotation_example.pl?revision=1.3&root=ensembl&view=markup
>>> 
>>> Thanks
>>> 
>>> Nath
>>> 
>>> On 14 Jun 2011, at 15:32, Alex Kalderimis wrote:
>>> 
>>>> Dear Listizens,
>>>> 
>>>> In trying to debug why code for getting Affymetrix Probeset
>>>> information had stopped working, I added some debug statements and it
>>>> seems that the data is no longer modelled as we expected it to be. The
>>>> code is below:
>>>> 
>>>> 69  for my $slice (@slices) {
>>>> 70  my @genes = @{ $slice->get_all_Genes };
>>>> 71  $self->debug("Processing " . scalar(@genes) . " genes");
>>>> 72  my $processed_genes = 0;
>>>> 73  for my $gene (@genes) {
>>>> 74      my @transcripts = @{ $gene->get_all_Transcripts };
>>>> 75      for my $transcript (@transcripts) {
>>>> 76          my @xrefs = @{ $transcript->get_all_DBEntries };
>>>> 77          for my $xref (@xrefs) {
>>>> 78              $xref_types{$xref->dbname} = 1;
>>>> 79              if ( $xref->dbname eq $db_name ) {
>>>> 80                  my @probe_features = @{
>>>> $self->get_feature_adaptor->fetch_all_by_probeset( $xref->display_id ) };
>>>> 81                  for my $probe_feature (@probe_features) {
>>>> 82                      my $line = join("\t",
>>>> 83                          $gene->stable_id,
>>>> 84                          $transcript->stable_id,
>>>> 85                          $xref->display_id,
>>>> 86                          $probe_feature->seq_region_name,
>>>> 87                          $probe_feature->seq_region_start,
>>>> 88                          $probe_feature->seq_region_end);
>>>> 89                      $self->debug($line);
>>>> 90                      print $out $line, "\n";
>>>> 91                  }
>>>> 92              }
>>>> 93          }
>>>> 94      }
>>>> 95      $processed_genes++;
>>>> 96      if ($processed_genes % 100 == 0) {
>>>> 97          $self->debug("Processed $processed_genes genes, with the
>>>> following XREF types: " . join(", ", sort keys %xref_types));
>>>> 98      }
>>>> 99  }
>>>> 
>>>> The dbnames "AFFY_Drosophila_1" and "AFFY_Drosphila_2" (which are what I
>>>> am looking for) never appear. How can I better structure my code to
>>>> get the information I am after?
>>>> 
>>>> Alex.
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> List admin (including subscribe/unsubscribe):
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>>> Nathan Johnson
>>> Senior Scientific Programmer
>>> Ensembl Regulation
>>> European Bioinformatics Institute
>>> Wellcome Trust Genome Campus
>>> Hinxton
>>> Cambridge CB10 1SD
>>> 
>>> http://www.ensembl.info/
>>> http://twitter.com/#!/ensembl
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl










More information about the Dev mailing list