Commit aa6fde9

workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000
wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items. Once detected, they're excluded from concurrency management to prevent them from blocking other per-cpu work items. If CONFIG_WQ_CPU_INTENSIVE_REPORT is enabled, repeat offenders are also reported so that the code can be updated.

The default threshold is 10ms, which is long enough to do a fair bit of work on modern CPUs while short enough to be usually not noticeable. This unfortunately leads to a lot of, arguably spurious, detections on very slow CPUs. Using the same threshold across CPUs whose performance levels may be apart by multiple orders of magnitude doesn't make a whole lot of sense.

This patch scales wq_cpu_intensive_thresh_us up to as much as 1 second when BogoMIPS is below 4000. This is obviously very inaccurate but it doesn't have to be accurate to be useful. The mechanism is still useful when the threshold is fully scaled up, and the benefits of reports are usually shared with everyone regardless of who's reporting, so as long as there is a sufficient number of fast machines reporting, we don't lose much.

Some (or is it all?) ARM CPUs systematically report significantly lower BogoMIPS. While this doesn't break anything, given how widespread ARM CPUs are, it's at least a missed opportunity and it probably would be a good idea to teach workqueue about it.

Signed-off-by: Tejun Heo <[email protected]>
Reported-and-Tested-by: Geert Uytterhoeven <[email protected]>
1 parent b2ec116 commit aa6fde9

1 file changed, 42 insertions(+), 1 deletion(-)

kernel/workqueue.c

Lines changed: 42 additions & 1 deletion
@@ -52,6 +52,7 @@
 #include <linux/sched/debug.h>
 #include <linux/nmi.h>
 #include <linux/kvm_para.h>
+#include <linux/delay.h>

 #include "workqueue_internal.h"

@@ -338,8 +339,10 @@ static cpumask_var_t *wq_numa_possible_cpumask;
  * Per-cpu work items which run for longer than the following threshold are
  * automatically considered CPU intensive and excluded from concurrency
  * management to prevent them from noticeably delaying other per-cpu work items.
+ * ULONG_MAX indicates that the user hasn't overridden it with a boot parameter.
+ * The actual value is initialized in wq_cpu_intensive_thresh_init().
  */
-static unsigned long wq_cpu_intensive_thresh_us = 10000;
+static unsigned long wq_cpu_intensive_thresh_us = ULONG_MAX;
 module_param_named(cpu_intensive_thresh_us, wq_cpu_intensive_thresh_us, ulong, 0644);

 static bool wq_disable_numa;
@@ -6513,6 +6516,42 @@ void __init workqueue_init_early(void)
 	       !system_freezable_power_efficient_wq);
 }

+static void __init wq_cpu_intensive_thresh_init(void)
+{
+	unsigned long thresh;
+	unsigned long bogo;
+
+	/* if the user set it to a specific value, keep it */
+	if (wq_cpu_intensive_thresh_us != ULONG_MAX)
+		return;
+
+	/*
+	 * The default of 10ms is derived from the fact that most modern (as of
+	 * 2023) processors can do a lot in 10ms and that it's just below what
+	 * most consider human-perceivable. However, the kernel also runs on a
+	 * lot slower CPUs including microcontrollers where the threshold is way
+	 * too low.
+	 *
+	 * Let's scale up the threshold upto 1 second if BogoMips is below 4000.
+	 * This is by no means accurate but it doesn't have to be. The mechanism
+	 * is still useful even when the threshold is fully scaled up. Also, as
+	 * the reports would usually be applicable to everyone, some machines
+	 * operating on longer thresholds won't significantly diminish their
+	 * usefulness.
+	 */
+	thresh = 10 * USEC_PER_MSEC;
+
+	/* see init/calibrate.c for lpj -> BogoMIPS calculation */
+	bogo = max_t(unsigned long, loops_per_jiffy / 500000 * HZ, 1);
+	if (bogo < 4000)
+		thresh = min_t(unsigned long, thresh * 4000 / bogo, USEC_PER_SEC);
+
+	pr_debug("wq_cpu_intensive_thresh: lpj=%lu BogoMIPS=%lu thresh_us=%lu\n",
+		 loops_per_jiffy, bogo, thresh);
+
+	wq_cpu_intensive_thresh_us = thresh;
+}
+
 /**
  * workqueue_init - bring workqueue subsystem fully online
  *
@@ -6528,6 +6567,8 @@ void __init workqueue_init(void)
 	struct worker_pool *pool;
 	int cpu, bkt;

+	wq_cpu_intensive_thresh_init();
+
 	/*
 	 * It'd be simpler to initialize NUMA in workqueue_init_early() but
 	 * CPU to node mapping may not be available that early on some
