Commit 3abf176

doc: Update stallwarn.rst
This commit updates stallwarn.rst to reflect RCU additions and changes over the past few years.

Signed-off-by: Paul E. McKenney <[email protected]>
1 parent 647dd4c commit 3abf176

1 file changed, +25 -18 lines changed

Documentation/RCU/stallwarn.rst

Lines changed: 25 additions & 18 deletions
@@ -25,10 +25,10 @@ warnings:

 - A CPU looping with bottom halves disabled.

-- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel
-without invoking schedule(). If the looping in the kernel is
-really expected and desirable behavior, you might need to add
-some calls to cond_resched().
+- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
+kernel without potentially invoking schedule(). If the looping
+in the kernel is really expected and desirable behavior, you
+might need to add some calls to cond_resched().

 - Booting Linux using a console connection that is too slow to
 keep up with the boot-time console-message rate. For example,
@@ -108,16 +108,17 @@ warnings:

 - A bug in the RCU implementation.

-- A hardware failure. This is quite unlikely, but has occurred
-at least once in real life. A CPU failed in a running system,
-becoming unresponsive, but not causing an immediate crash.
-This resulted in a series of RCU CPU stall warnings, eventually
-leading the realization that the CPU had failed.
+- A hardware failure. This is quite unlikely, but is not at all
+uncommon in large datacenter. In one memorable case some decades
+back, a CPU failed in a running system, becoming unresponsive,
+but not causing an immediate crash. This resulted in a series
+of RCU CPU stall warnings, eventually leading the realization
+that the CPU had failed.

-The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
-Note that SRCU does *not* have CPU stall warnings. Please note that
-RCU only detects CPU stalls when there is a grace period in progress.
-No grace period, no CPU stall warnings.
+The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
+CPU stall warning. Note that SRCU does *not* have CPU stall warnings.
+Please note that RCU only detects CPU stalls when there is a grace period
+in progress. No grace period, no CPU stall warnings.

 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
@@ -205,16 +206,21 @@ RCU_STALL_RAT_DELAY
 rcupdate.rcu_task_stall_timeout
 -------------------------------

-This boot/sysfs parameter controls the RCU-tasks stall warning
-interval. A value of zero or less suppresses RCU-tasks stall
-warnings. A positive value sets the stall-warning interval
-in seconds. An RCU-tasks stall warning starts with the line:
+This boot/sysfs parameter controls the RCU-tasks and
+RCU-tasks-trace stall warning intervals. A value of zero or less
+suppresses RCU-tasks stall warnings. A positive value sets the
+stall-warning interval in seconds. An RCU-tasks stall warning
+starts with the line:

 INFO: rcu_tasks detected stalls on tasks:

 And continues with the output of sched_show_task() for each
 task stalling the current RCU-tasks grace period.

+An RCU-tasks-trace stall warning starts (and continues) similarly:
+
+INFO: rcu_tasks_trace detected stalls on tasks
+

 Interpreting RCU's CPU Stall-Detector "Splats"
 ==============================================
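
Note on the hunk above: the two stall-warning headers it documents differ only in the flavor name and, as quoted, in the trailing colon. The following is a minimal sketch of how a log-scanning script might tell them apart. It assumes the headers appear in the console log exactly as quoted in this patch; the function name `classify_rcu_tasks_stall` is hypothetical and is not part of any kernel tooling.

```python
import re

# Header text as quoted in the documentation patch above. The trailing colon
# is treated as optional because the two quoted headers differ on that point.
_STALL_HEADER = re.compile(
    r"INFO: (rcu_tasks(?:_trace)?) detected stalls on tasks:?"
)


def classify_rcu_tasks_stall(line):
    """Return 'rcu_tasks', 'rcu_tasks_trace', or None for one console-log line."""
    match = _STALL_HEADER.search(line)
    return match.group(1) if match else None
```

Using `search` rather than `match` lets the sketch tolerate a timestamp or log-level prefix in front of the header; lines from other stall detectors simply yield `None`.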
@@ -248,7 +254,8 @@ dynticks counter, which will have an even-numbered value if the CPU
 is in dyntick-idle mode and an odd-numbered value otherwise. The hex
 number between the two "/"s is the value of the nesting, which will be
 a small non-negative number if in the idle loop (as shown above) and a
-very large positive number otherwise.
+very large positive number otherwise. The number following the final
+"/" is the NMI nesting, which will be a small non-negative number.

 The "softirq=" portion of the message tracks the number of RCU softirq
 handlers that the stalled CPU has executed. The number before the "/"
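
Note on the hunk above: with the added sentence, the "idle=" portion of a splat carries three slash-separated values (dynticks counter, nesting, NMI nesting). The following is a minimal sketch of decoding such a token. It assumes a token of the form `idle=D/N/M` and parses all three fields as hexadecimal; the base of the third field is not stated in the text and is an assumption here, as are the names `IdleState` and `parse_idle_field`.

```python
from typing import NamedTuple


class IdleState(NamedTuple):
    dynticks: int          # value before the first "/"
    nesting: int           # value between the two "/"s
    nmi_nesting: int       # value after the final "/"
    in_dyntick_idle: bool  # True when the dynticks value is even


def parse_idle_field(token):
    """Decode an 'idle=D/N/M' token from an RCU CPU stall warning."""
    prefix = "idle="
    if not token.startswith(prefix):
        raise ValueError("not an idle= token: %r" % token)
    parts = token[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError("expected three '/'-separated fields: %r" % token)
    # Assumption: every field is hexadecimal (int() also accepts a 0x prefix).
    dynticks, nesting, nmi_nesting = (int(part, 16) for part in parts)
    # Per the documentation text: even means dyntick-idle, odd means not idle.
    return IdleState(dynticks, nesting, nmi_nesting, dynticks % 2 == 0)
```

For example, `parse_idle_field("idle=f48/0/0")` reports an even dynticks value, so the CPU would be read as being in dyntick-idle mode, with zero nesting and zero NMI nesting.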
