Commit 68d124b

rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur just as an RCU grace period is starting. If that CPU also has the scheduling-clock tick enabled for any reason (such as a second runnable task), and if the system was booted with rcutree.use_softirq=0, then RCU can add insult to injury by awakening that CPU's rcuc kthread, resulting in yet another task and yet more OS jitter due to switching to that task, running it, and switching back.

In addition, in the common case where that system call is not of excessively long duration, awakening the rcuc task is pointless. This pointlessness is due to the fact that the CPU will enter an extended quiescent state upon returning to the userspace application or guest OS. In this case, the rcuc kthread cannot do anything that the main RCU grace-period kthread cannot do on its behalf, at least if it is given a few additional milliseconds (for example, given the time duration specified by rcutree.jiffies_till_first_fqs, give or take scheduling delays).

This commit therefore adds a rcutree.nohz_full_patience_delay kernel boot parameter that specifies the grace period age (in milliseconds, rounded to jiffies) before which RCU will refrain from awakening the rcuc kthread. Preliminary experimentation suggests a value of 1000, that is, one second. Increasing rcutree.nohz_full_patience_delay will increase grace-period latency and in turn increase memory footprint, so systems with constrained memory might choose a smaller value. Systems with less-aggressive OS-jitter requirements might choose the default value of zero, which keeps the traditional immediate-wakeup behavior, thus avoiding increases in grace-period latency.

[ paulmck: Apply Leonardo Bras feedback. ]

Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Leonardo Bras <[email protected]>
Suggested-by: Leonardo Bras <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Leonardo Bras <[email protected]>
1 parent 4b56b0f commit 68d124b

3 files changed: +27 -2 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 8 additions & 0 deletions
@@ -5018,6 +5018,14 @@
 			the ->nocb_bypass queue. The definition of "too
 			many" is supplied by this kernel boot parameter.

+	rcutree.nohz_full_patience_delay= [KNL]
+			On callback-offloaded (rcu_nocbs) CPUs, avoid
+			disturbing RCU unless the grace period has
+			reached the specified age in milliseconds.
+			Defaults to zero. Large values will be capped
+			at five seconds. All values will be rounded down
+			to the nearest value representable by jiffies.
+
 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
 			batch limiting is disabled.

kernel/rcu/tree.c

Lines changed: 9 additions & 2 deletions
@@ -176,6 +176,9 @@ static int gp_init_delay;
 module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
+static int nohz_full_patience_delay;
+module_param(nohz_full_patience_delay, int, 0444);
+static int nohz_full_patience_delay_jiffies;

 // Add delay to rcu_read_unlock() for strict grace periods.
 static int rcu_unlock_delay;
@@ -4335,11 +4338,15 @@ static int rcu_pending(int user)
 		return 1;

 	/* Is this a nohz_full CPU in userspace or idle? (Ignore RCU if so.) */
-	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
+	gp_in_progress = rcu_gp_in_progress();
+	if ((user || rcu_is_cpu_rrupt_from_idle() ||
+	     (gp_in_progress &&
+	      time_before(jiffies, READ_ONCE(rcu_state.gp_start) +
+			  nohz_full_patience_delay_jiffies))) &&
+	    rcu_nohz_full_cpu())
 		return 0;

 	/* Is the RCU core waiting for a quiescent state from this CPU? */
-	gp_in_progress = rcu_gp_in_progress();
 	if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm && gp_in_progress)
 		return 1;
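
The new clause in rcu_pending() makes a nohz_full CPU ignore RCU while a grace period is in progress and is still younger than the patience delay. The following is a minimal userspace sketch of that comparison, not kernel code: `sketch_time_before()`, `sketch_gp_too_young()`, and `sketch_self_check()` are hypothetical names introduced here, and the first one assumes that the kernel's time_before() behaves as a wraparound-tolerant signed-difference comparison.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's time_before(a, b): true when a is
 * earlier than b. Taking the signed difference keeps the answer correct
 * across counter wraparound, provided the two values are less than half
 * the counter range apart.
 */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical helper mirroring the new clause in rcu_pending(): true
 * while a grace period is in progress and is younger than the patience
 * delay (both expressed in jiffies).
 */
static bool sketch_gp_too_young(bool gp_in_progress, unsigned long now,
				unsigned long gp_start,
				unsigned long patience_jiffies)
{
	return gp_in_progress &&
	       sketch_time_before(now, gp_start + patience_jiffies);
}

/* Self-check covering the interesting cases. */
static void sketch_self_check(void)
{
	unsigned long near_wrap = (unsigned long)-10; /* 10 ticks before wrap */

	assert(sketch_gp_too_young(true, 1050, 1000, 100));   /* age 50 < 100 */
	assert(!sketch_gp_too_young(true, 1100, 1000, 100));  /* age == delay */
	assert(!sketch_gp_too_young(false, 1050, 1000, 100)); /* no GP running */
	assert(!sketch_gp_too_young(true, 1000, 1000, 0));    /* delay of zero */
	assert(sketch_gp_too_young(true, 5, near_wrap, 100)); /* wrapped, age 15 */
}
```

With a delay of zero the helper never returns true, which matches the commit message's statement that the default keeps the traditional immediate-wakeup behavior.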

kernel/rcu/tree_plugin.h

Lines changed: 10 additions & 0 deletions
@@ -93,6 +93,16 @@ static void __init rcu_bootup_announce_oddness(void)
 		pr_info("\tRCU debug GP init slowdown %d jiffies.\n", gp_init_delay);
 	if (gp_cleanup_delay)
 		pr_info("\tRCU debug GP cleanup slowdown %d jiffies.\n", gp_cleanup_delay);
+	if (nohz_full_patience_delay < 0) {
+		pr_info("\tRCU NOCB CPU patience negative (%d), resetting to zero.\n", nohz_full_patience_delay);
+		nohz_full_patience_delay = 0;
+	} else if (nohz_full_patience_delay > 5 * MSEC_PER_SEC) {
+		pr_info("\tRCU NOCB CPU patience too large (%d), resetting to %ld.\n", nohz_full_patience_delay, 5 * MSEC_PER_SEC);
+		nohz_full_patience_delay = 5 * MSEC_PER_SEC;
+	} else if (nohz_full_patience_delay) {
+		pr_info("\tRCU NOCB CPU patience set to %d milliseconds.\n", nohz_full_patience_delay);
+	}
+	nohz_full_patience_delay_jiffies = msecs_to_jiffies(nohz_full_patience_delay);
 	if (!use_softirq)
 		pr_info("\tRCU_SOFTIRQ processing moved to rcuc kthreads.\n");
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG))
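
The boot-time sanitization above resets negative values to zero and caps large ones at five seconds before converting to jiffies. A minimal userspace sketch of just the clamping step follows. The `SKETCH_`-prefixed macros and the `sketch_`-prefixed functions are hypothetical names introduced for illustration, and the conversion performed by msecs_to_jiffies() is deliberately left out.

```c
#include <assert.h>

/* Stand-ins for the kernel constants used by the hunk above. */
#define SKETCH_MSEC_PER_SEC	1000L
#define SKETCH_MAX_PATIENCE_MS	(5 * SKETCH_MSEC_PER_SEC)

/*
 * Hypothetical helper: clamp a requested patience delay (milliseconds)
 * into the range [0, 5000], as rcu_bootup_announce_oddness() does for
 * nohz_full_patience_delay.
 */
static int sketch_clamp_patience_ms(int requested_ms)
{
	if (requested_ms < 0)
		return 0;
	if (requested_ms > SKETCH_MAX_PATIENCE_MS)
		return (int)SKETCH_MAX_PATIENCE_MS;
	return requested_ms;
}

/* Self-check for both boundaries and the pass-through case. */
static void sketch_clamp_self_check(void)
{
	assert(sketch_clamp_patience_ms(-5) == 0);      /* negative: reset */
	assert(sketch_clamp_patience_ms(0) == 0);       /* default */
	assert(sketch_clamp_patience_ms(1000) == 1000); /* suggested value */
	assert(sketch_clamp_patience_ms(5000) == 5000); /* exactly at cap */
	assert(sketch_clamp_patience_ms(9000) == 5000); /* above cap */
}
```

Because the stand-in constant is written with an `L` suffix, the cap has type long, which is consistent with the `%ld` format specifier used in the "too large" message.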
