[ensembl-dev] [ Compara ] proposal: prioritise mcoffee_himem over mcoffee and mcoffee_short

Wojciech Bazant wojtek.bazant at sanger.ac.uk
Thu Feb 15 10:58:14 GMT 2018



Hi,

I am currently running Compara for a release of WormBase ParaSite, corresponding to Ensembl 89.

The pipeline is currently finishing the mcoffee_himem jobs, and some of those runs take a very long time. My pipeline is stuck waiting for the two remaining jobs to finish, and otherwise it isn't doing very much, because everything else it could have done is done.

I am thinking that if the jobs that sometimes take very long (here, mcoffee_himem) were given capacity first, the situation I am in now wouldn't have happened: the remaining long-running jobs would still be taking their sweet time, but meanwhile the rest of the capacity would go to the shorter jobs.
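To illustrate the argument, here is a toy model in Python. It is my own sketch, not how eHive actually schedules workers: it assumes a fixed pool of workers, each taking the next job from a queue as soon as it becomes free, with job runtimes known in advance. Ordering the queue longest-first is the classic "longest processing time first" heuristic. The function name `makespan` and the job durations are hypothetical.

```python
import heapq
from typing import Iterable


def makespan(durations: Iterable[float], workers: int) -> float:
    """Greedy list scheduling: each job, in the given order, goes to whichever
    worker becomes free first. Returns the time at which the last job ends."""
    free_at = [0.0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-free worker takes the job
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical toy workload: eight 1-hour jobs and one 8-hour job.
jobs = [1.0] * 8 + [8.0]

shortest_first = makespan(sorted(jobs), workers=2)
longest_first = makespan(sorted(jobs, reverse=True), workers=2)
print(shortest_first, longest_first)  # 12.0 8.0
```

With two workers, shortest-first leaves the 8-hour job to start only after all the short ones are done (finishing at hour 12), while longest-first overlaps it with the short jobs (finishing at hour 8). In the real pipeline the runtimes aren't known up front, so any ordering would have to rely on a proxy, such as the himem jobs being the ones that tend to run longest.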

I originally thought something was wrong with the long-running mcoffee_himem jobs, but I think it's just the nature of the problems they're given: they keep the CPU 100% busy, as they should. Also, I'm not sure how to express in Hive the concept of "give resources to this job first if possible, but let the other jobs run in parallel", but I think it's just a matter of reordering the jobs within a fan or something.

Do you think it's a good idea? Is it worthwhile enough for me to mess around with job orderings in future runs of WBPS Compara :) ? Would you want it as the default setting for Compara 92?

This is how long the jobs usually take (CPU seconds, rounded to the nearest power of ten):
select count(*), pow(10, round(log(10, cpu_sec), 0)) as bucket
  from worker inner join worker_resource_usage using (worker_id)
  where worker.resource_class_id = 34
  group by bucket order by bucket;
+----------+--------+
| count(*) | bucket |
+----------+--------+
|        3 |    0.1 |
|      644 |      1 |
|      148 |     10 |
|       17 |    100 |
|      255 |   1000 |
|     2274 |  10000 |
|       11 | 100000 |
+----------+--------+
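A rough way to see how the work is split across these buckets is to treat every job as taking exactly its bucket value. This is only accurate to an order of magnitude, and the script below is my own back-of-the-envelope sketch, using the counts from the table.

```python
# Counts per power-of-ten bucket of cpu_sec, copied from the query output.
buckets = {0.1: 3, 1: 644, 10: 148, 100: 17, 1000: 255, 10000: 2274, 100000: 11}

# Approximate each job's CPU time by its bucket value (order-of-magnitude only).
total = sum(b * n for b, n in buckets.items())

for b, n in buckets.items():
    share = b * n / total
    print(f"{b:>8} s x {n:>5} jobs ~ {share:6.1%} of CPU time")
```

With these counts, the eleven jobs in the 100000 bucket come to only about 5% of the estimated CPU time, and the 10000 bucket to about 94%. The long tail is a small share of the total work, which is why overlapping it with everything else, rather than leaving it for last, could shorten the run.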

These are the longest runs (CPU seconds):
select cpu_sec
  from worker inner join worker_resource_usage using (worker_id)
  where worker.resource_class_id = 34
  order by cpu_sec desc limit 15;
+---------+
| cpu_sec |
+---------+
| 46893.5 |
| 45129.3 |
|   43254 |
| 41565.2 |
| 41472.6 |
| 38532.7 |
| 37886.8 |
| 37643.3 |
| 36617.7 |
| 36336.8 |
| 32764.8 |
|   31295 |
|   28305 |
| 28287.9 |
| 27565.6 |
+---------+

Wojtek