Commit f2286ab

mchehab authored and paulmckrcu committed
docs: RCU: Convert stallwarn.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Fix list markups;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to RCU/index.rst.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
1 parent 90c73cb commit f2286ab

3 files changed: +37 / -23 lines

Documentation/RCU/index.rst

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ RCU concepts
 rculist_nulls
 rcuref
 torture
+stallwarn
 listRCU
 NMI-RCU
 UP

Documentation/RCU/stallwarn.txt renamed to Documentation/RCU/stallwarn.rst

Lines changed: 34 additions & 21 deletions
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
 Using RCU's CPU Stall Detector
+==============================

 This document first discusses what sorts of issues RCU's CPU stall
 detector can locate, and then discusses kernel parameters and Kconfig
@@ -7,39 +11,40 @@ this document explains the stall detector's "splat" format.


 What Causes RCU CPU Stall Warnings?
+===================================

 So your kernel printed an RCU CPU stall warning. The next question is
 "What caused it?" The following problems can result in RCU CPU stall
 warnings:

-o A CPU looping in an RCU read-side critical section.
+- A CPU looping in an RCU read-side critical section.

-o A CPU looping with interrupts disabled.
+- A CPU looping with interrupts disabled.

-o A CPU looping with preemption disabled.
+- A CPU looping with preemption disabled.

-o A CPU looping with bottom halves disabled.
+- A CPU looping with bottom halves disabled.

-o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
+- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
 without invoking schedule(). If the looping in the kernel is
 really expected and desirable behavior, you might need to add
 some calls to cond_resched().

-o Booting Linux using a console connection that is too slow to
+- Booting Linux using a console connection that is too slow to
 keep up with the boot-time console-message rate. For example,
 a 115Kbaud serial console can be -way- too slow to keep up
 with boot-time message rates, and will frequently result in
 RCU CPU stall warning messages. Especially if you have added
 debug printk()s.

-o Anything that prevents RCU's grace-period kthreads from running.
+- Anything that prevents RCU's grace-period kthreads from running.
 This can result in the "All QSes seen" console-log message.
 This message will include information on when the kthread last
 ran and how often it should be expected to run. It can also
-result in the "rcu_.*kthread starved for" console-log message,
+result in the ``rcu_.*kthread starved for`` console-log message,
 which will include additional debugging information.

-o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
 happen to preempt a low-priority task in the middle of an RCU
 read-side critical section. This is especially damaging if
 that low-priority task is not permitted to run on any other CPU,
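The slow-console bullet above comes down to simple arithmetic. The sketch below assumes 8N1 serial framing (one start bit, eight data bits, one stop bit, so ten bit times per character); that framing and the message volume passed in are assumptions for illustration, not figures from the document.

```c
/*
 * Rough serial-console throughput, assuming 8N1 framing:
 * 1 start bit + 8 data bits + 1 stop bit = 10 bit times per character.
 */
#define BITS_PER_CHAR_8N1 10.0

/* Characters per second a serial console can emit at a given baud rate. */
static double console_chars_per_sec(double baud)
{
	return baud / BITS_PER_CHAR_8N1;
}

/*
 * Seconds needed to push `chars` characters of console output through
 * the port. `chars` is a hypothetical boot-log volume, for example one
 * that includes extra debug printk()s.
 */
static double console_drain_seconds(double chars, double baud)
{
	return chars / console_chars_per_sec(baud);
}
```

At 115200 baud this gives 11520 characters per second, so one million characters of boot-time output would need roughly 87 seconds to drain.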
@@ -48,7 +53,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
 While the system is in the process of running itself out of
 memory, you might see stall-warning messages.

-o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+- A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 is running at a higher priority than the RCU softirq threads.
 This will prevent RCU callbacks from ever being invoked,
 and in a CONFIG_PREEMPT_RCU kernel will further prevent
@@ -63,28 +68,28 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 can increase your system's context-switch rate and thus degrade
 performance.

-o A periodic interrupt whose handler takes longer than the time
+- A periodic interrupt whose handler takes longer than the time
 interval between successive pairs of interrupts. This can
 prevent RCU's kthreads and softirq handlers from running.
 Note that certain high-overhead debugging options, for example
 the function_graph tracer, can result in interrupt handler taking
 considerably longer than normal, which can in turn result in
 RCU CPU stall warnings.

-o Testing a workload on a fast system, tuning the stall-warning
+- Testing a workload on a fast system, tuning the stall-warning
 timeout down to just barely avoid RCU CPU stall warnings, and then
 running the same workload with the same stall-warning timeout on a
 slow system. Note that thermal throttling and on-demand governors
 can cause a single system to be sometimes fast and sometimes slow!

-o A hardware or software issue shuts off the scheduler-clock
+- A hardware or software issue shuts off the scheduler-clock
 interrupt on a CPU that is not in dyntick-idle mode. This
 problem really has happened, and seems to be most likely to
 result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.

-o A bug in the RCU implementation.
+- A bug in the RCU implementation.

-o A hardware failure. This is quite unlikely, but has occurred
+- A hardware failure. This is quite unlikely, but has occurred
 at least once in real life. A CPU failed in a running system,
 becoming unresponsive, but not causing an immediate crash.
 This resulted in a series of RCU CPU stall warnings, eventually
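The periodic-interrupt bullet above can be read as a ratio: if a handler runs for h microseconds out of every i-microsecond interval, at most 1 - h/i of that CPU remains for RCU's kthreads and softirq handlers. The helper below is a hypothetical illustration that ignores every other source of overhead.

```c
/*
 * Fraction of CPU time left outside a periodic interrupt handler.
 * handler_us:  time the handler runs per interrupt (hypothetical input).
 * interval_us: time between successive interrupts.
 * Returns 0.0 when the handler takes as long as, or longer than, the
 * interval, in which case RCU's kthreads and softirq handlers may never run.
 */
static double irq_free_fraction(double handler_us, double interval_us)
{
	if (interval_us <= 0.0 || handler_us >= interval_us)
		return 0.0;
	return 1.0 - handler_us / interval_us;
}
```

With made-up numbers, a handler taking 60 microseconds of every 100 leaves 40% of the CPU; if a debugging option such as the function_graph tracer doubled the handler's cost, nothing would be left.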
@@ -109,6 +114,7 @@ see include/trace/events/rcu.h.


 Fine-Tuning the RCU CPU Stall Detector
+======================================

 The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
 CPU stall detector, which detects conditions that unduly delay RCU grace
@@ -118,6 +124,7 @@ The stall detector's idea of what constitutes "unduly delayed" is
 controlled by a set of kernel configuration variables and cpp macros:

 CONFIG_RCU_CPU_STALL_TIMEOUT
+----------------------------

 This kernel configuration parameter defines the period of time
 that RCU will wait from the beginning of a grace period until it
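One way to picture this timeout is as a deadline measured in jiffies from the start of the grace period. The following userspace sketch is not the kernel's implementation: the tick rate `hz` is a caller-supplied assumption, and the comparison borrows the signed-difference idiom that the kernel's time_after() macro uses so that the check keeps working when the jiffies counter wraps.

```c
/*
 * Wraparound-safe "a is after b" for free-running jiffies counters,
 * using the same signed-difference idiom as the kernel's time_after().
 */
static int jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Returns nonzero if a grace period that began at gp_start has been
 * running longer than timeout_sec seconds at time `now`.
 * hz is the tick rate (hypothetical here; the kernel uses its HZ constant).
 */
static int stall_timeout_expired(unsigned long gp_start, unsigned long now,
				 unsigned long timeout_sec, unsigned long hz)
{
	unsigned long deadline = gp_start + timeout_sec * hz; /* may wrap */

	return jiffies_after(now, deadline);
}
```

For example, with a hypothetical hz of 1000 and a timeout of 21 seconds, a grace period that started at jiffies 1000 is flagged once jiffies passes 22000, and the same holds when the deadline wraps past zero.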
@@ -137,6 +144,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.

 RCU_STALL_DELAY_DELTA
+---------------------

 Although the lockdep facility is extremely useful, it does add
 some overhead. Therefore, under CONFIG_PROVE_RCU, the
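The hunk above mentions the sysfs file /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress. A minimal sketch for reading such a parameter from a userspace tool follows; it assumes the file holds a single integer, and it only reads the value. The helper name is hypothetical.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style parameter file such as
 * /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
 * Returns 0 and stores the value in *out on success, or -1 if the file
 * is missing, unreadable, or does not start with an integer.
 */
static int read_int_param(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a running system, read_int_param("/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress", &v) would report the current setting, presumably with any nonzero value meaning that stall warnings are suppressed.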
@@ -145,6 +153,7 @@ RCU_STALL_DELAY_DELTA
 macro, not a kernel configuration parameter.)

 RCU_STALL_RAT_DELAY
+-------------------

 The CPU stall detector tries to make the offending CPU print its
 own warnings, as this often gives better-quality stack traces.
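The behavior described above, where the offending CPU gets the first chance to print its own warning, can be modeled as two deadlines: the stalled CPU reports at the stall deadline, while other CPUs wait a few extra jiffies before reporting on its behalf. In this simplified sketch, `rat_delay` is a hypothetical stand-in for RCU_STALL_RAT_DELAY whose value is supplied by the caller rather than taken from the kernel.

```c
/* Wraparound-safe "a is at or after b" for jiffies (signed-difference idiom). */
static int jiffies_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Decide whether this CPU should print a stall warning.
 * self_stalled: nonzero if this CPU is the one holding up the grace period.
 * deadline:     jiffies value at which the stall timeout expires.
 * rat_delay:    extra jiffies other CPUs wait before reporting the
 *               offender (hypothetical stand-in for RCU_STALL_RAT_DELAY).
 */
static int should_report_stall(int self_stalled, unsigned long now,
			       unsigned long deadline, unsigned long rat_delay)
{
	if (self_stalled)
		return jiffies_after_eq(now, deadline);
	return jiffies_after_eq(now, deadline + rat_delay);
}
```

The extra delay gives the stalled CPU a window in which it alone is eligible to report, which is what lets it produce its own, usually better, stack trace.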
@@ -155,6 +164,7 @@ RCU_STALL_RAT_DELAY
 parameter.)

 rcupdate.rcu_task_stall_timeout
+-------------------------------

 This boot/sysfs parameter controls the RCU-tasks stall warning
 interval. A value of zero or less suppresses RCU-tasks stall
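The rule that zero or a negative value suppresses RCU-tasks stall warnings can be written as a small predicate. The `elapsed` argument and the assumption that it uses the same unit as the parameter (jiffies) are illustrative only, and the sketch ignores how often repeated warnings would be printed.

```c
/*
 * Should an RCU-tasks stall warning be emitted?
 * timeout: the rcupdate.rcu_task_stall_timeout value; <= 0 suppresses warnings.
 * elapsed: how long the RCU-tasks grace period has been waiting, assumed to
 *          be in the same unit as timeout (jiffies).
 */
static int rcu_tasks_should_warn(long timeout, long elapsed)
{
	if (timeout <= 0)
		return 0; /* zero or less: warnings suppressed */
	return elapsed >= timeout;
}
```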
@@ -168,9 +178,10 @@ rcupdate.rcu_task_stall_timeout


 Interpreting RCU's CPU Stall-Detector "Splats"
+==============================================

 For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following:
+it will print a message similar to the following::

 INFO: rcu_sched detected stalls on CPUs/tasks:
 2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
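When post-processing console logs, it can help to pull the per-CPU line of this splat apart. The sketch below handles only the exact layout shown above (the "(N GPs behind)" form for a single CPU) and names fields after their printed labels without interpreting them; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Fields from one per-CPU line of a "detected stalls" splat. */
struct stall_cpu_line {
	int cpu;                 /* CPU number before the "-..." flags */
	int gps_behind;          /* "(N GPs behind)" */
	unsigned int idle[3];    /* idle=a/b/c, printed in hex */
	unsigned int softirq[2]; /* softirq=x/y */
	unsigned int fqs;        /* fqs=n */
};

/*
 * Parse a line such as
 *   2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
 * Returns 0 on success, -1 if the line does not match this layout.
 * Other variants, such as "(N ticks this GP)", are not handled here.
 */
static int parse_stall_cpu_line(const char *line, struct stall_cpu_line *out)
{
	int n = sscanf(line,
		       "%d-%*[^:]: (%d GPs behind) idle=%x/%x/%x softirq=%u/%u fqs=%u",
		       &out->cpu, &out->gps_behind,
		       &out->idle[0], &out->idle[1], &out->idle[2],
		       &out->softirq[0], &out->softirq[1], &out->fqs);

	return n == 8 ? 0 : -1;
}
```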
@@ -223,7 +234,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
 (625 in this case).

 In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
-for each CPU:
+for each CPU::

 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1

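Because the CONFIG_RCU_FAST_NO_HZ variant adds fields, a label-based lookup can be more robust than one fixed scanf format. The helpers below are a hypothetical sketch: splat_field() finds a literal label such as "softirq=" or "dyntick_enabled:" and reads the decimal number after it, and splat_ticks_this_gp() reads the "(N ticks this GP)" count. Hexadecimal fields such as idle= would need base 16 instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the decimal value following the literal label `key` in a splat
 * line, or `fallback` if the label is absent. Parsing stops at the first
 * non-digit, so "softirq=82/543" yields 82.
 */
static long splat_field(const char *line, const char *key, long fallback)
{
	const char *p = strstr(line, key);

	if (!p)
		return fallback;
	/* strtol() skips leading whitespace, so "dyntick_enabled:" works too. */
	return strtol(p + strlen(key), NULL, 10);
}

/* Read N from a line starting "CPU: (N ticks this GP)", or -1 on mismatch. */
static long splat_ticks_this_gp(const char *line)
{
	long ticks;

	if (sscanf(line, "%*d: (%ld ticks this GP)", &ticks) != 1)
		return -1;
	return ticks;
}
```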
@@ -235,7 +246,7 @@ processing is enabled.

 If the grace period ends just as the stall warning starts printing,
 there will be a spurious stall-warning message, which will include
-the following:
+the following::

 INFO: Stall ended before state dump start

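A log-triage script might want to set such spurious splats aside. A minimal sketch, assuming the whole splat has already been captured into one string; the function name is hypothetical.

```c
#include <string.h>

/*
 * Returns nonzero if a captured stall-warning splat contains the marker
 * printed when the grace period ended just as the warning started,
 * i.e. the warning is spurious.
 */
static int stall_splat_is_spurious(const char *splat)
{
	return strstr(splat, "INFO: Stall ended before state dump start") != NULL;
}
```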
@@ -248,7 +259,7 @@ which is overkill for this sort of problem.

 If all CPUs and tasks have passed through quiescent states, but the
 grace period has nevertheless failed to end, the stall-warning splat
-will include something like the following:
+will include something like the following::

 All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0

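The two numbers in parentheses differ by exactly the reported activity value: 4297905177 - 4297881370 = 23807. The sketch below parses the line and lets a caller check that relationship. It uses unsigned long long because these values exceed 32 bits, and its struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fields from an "All QSes seen" line, named after their printed labels. */
struct all_qses_seen {
	char flavor[32];                  /* e.g. "rcu_preempt" */
	unsigned long long activity;      /* "kthread activity N" */
	unsigned long long first, second; /* the "(first-second)" pair */
	unsigned long long till_next_fqs; /* "jiffies_till_next_fqs=N" */
};

/*
 * Parse e.g.
 *   All QSes seen, last rcu_preempt kthread activity 23807
 *   (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
 * (one line in the real splat). Returns 0 on success, -1 on mismatch.
 */
static int parse_all_qses_seen(const char *line, struct all_qses_seen *out)
{
	int n;

	memset(out, 0, sizeof(*out));
	n = sscanf(line,
		   "All QSes seen, last %31s kthread activity %llu (%llu-%llu), jiffies_till_next_fqs=%llu",
		   out->flavor, &out->activity, &out->first, &out->second,
		   &out->till_next_fqs);
	return n == 5 ? 0 : -1;
}
```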
@@ -261,7 +272,7 @@ which is way less than 23807. Finally, the root rcu_node structure's

 If the relevant grace-period kthread has been unable to run prior to
 the stall warning, as was the case in the "All QSes seen" line above,
-the following additional line is printed:
+the following additional line is printed::

 kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5

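The cause list earlier in this document refers to this message by the pattern rcu_.*kthread starved for, so real lines may carry an RCU-flavor prefix before "kthread". The hypothetical parser below therefore searches for the "kthread starved for" text first and then reads the fields shown in the example, naming them after their printed labels.

```c
#include <stdio.h>
#include <string.h>

struct kthread_starved {
	unsigned long long jiffies; /* "starved for N jiffies!" */
	unsigned long long gp;      /* "gNNNN" */
	unsigned int flags;         /* "f0x..." */
	char gp_state[48];          /* e.g. "RCU_GP_WAIT_FQS" */
	int gp_state_num;           /* number in parentheses after the name */
	unsigned int task_state;    /* "->state=0x..." */
	int cpu;                    /* "->cpu=N" */
};

/* Returns 0 on success, -1 if the line does not match. */
static int parse_kthread_starved(const char *line, struct kthread_starved *out)
{
	const char *p = strstr(line, "kthread starved for");
	int n;

	memset(out, 0, sizeof(*out));
	if (!p)
		return -1;
	n = sscanf(p,
		   "kthread starved for %llu jiffies! g%llu f0x%x %47[^(](%d) ->state=0x%x ->cpu=%d",
		   &out->jiffies, &out->gp, &out->flags, out->gp_state,
		   &out->gp_state_num, &out->task_state, &out->cpu);
	return n == 7 ? 0 : -1;
}
```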
@@ -276,6 +287,7 @@ kthread last ran on CPU 5.


 Multiple Warnings From One Stall
+================================

 If a stall lasts long enough, multiple stall-warning messages will be
 printed for it. The second and subsequent messages are printed at
285297

286298

287299
Stall Warnings for Expedited Grace Periods
300+
==========================================
288301

289302
If an expedited grace period detects a stall, it will place a message
290-
like the following in dmesg:
303+
like the following in dmesg::
291304

292305
INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
293306

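A similar sketch can extract the fields of the expedited-stall line shown above. It assumes a single CPU inside the braces, as in the example; lines listing several CPUs or tasks would need a loop over the brace contents. Field names follow the printed labels, and the struct and function are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct exp_stall {
	char flavor[32];       /* e.g. "rcu_sched" */
	int cpu;               /* the single CPU listed inside { } */
	unsigned long jiffies; /* "N jiffies" */
	unsigned long s;       /* value after "s:" */
	unsigned int root;     /* hex value after "root: 0x" */
};

/* Returns 0 on success, -1 if the line does not match this layout. */
static int parse_exp_stall(const char *line, struct exp_stall *out)
{
	int n;

	memset(out, 0, sizeof(*out));
	n = sscanf(line,
		   "INFO: %31s detected expedited stalls on CPUs/tasks: { %d-%*[^}]} %lu jiffies s: %lu root: 0x%x",
		   out->flavor, &out->cpu, &out->jiffies, &out->s, &out->root);
	return n == 5 ? 0 : -1;
}
```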
kernel/rcu/tree_stall.h

Lines changed: 2 additions & 2 deletions
@@ -468,7 +468,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)

 /*
  * OK, time to rat on our buddy...
- * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * See Documentation/RCU/stallwarn.rst for info on how to debug
  * RCU CPU stall warnings.
  */
 pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
@@ -535,7 +535,7 @@ static void print_cpu_stall(unsigned long gps)

 /*
  * OK, time to rat on ourselves...
- * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * See Documentation/RCU/stallwarn.rst for info on how to debug
  * RCU CPU stall warnings.
  */
 pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
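Both pr_err() calls above begin their splats with a fixed header, so a log scanner can classify console lines by rebuilding those headers for a given RCU flavor name. This is a hypothetical userspace sketch: `flavor` stands in for rcu_state.name, and substring matching is used so that a timestamp prefix on the console line does not get in the way.

```c
#include <stdio.h>
#include <string.h>

enum stall_kind { STALL_NONE, STALL_OTHER_CPU, STALL_SELF };

/*
 * Classify a console line by the headers printed in the hunks above:
 *   "INFO: %s detected stalls on CPUs/tasks:"  (print_other_cpu_stall)
 *   "INFO: %s self-detected stall on CPU"      (print_cpu_stall)
 */
static enum stall_kind classify_stall_header(const char *line,
					     const char *flavor)
{
	char other[128], self[128];

	/* Same text as the two pr_err() format strings, minus the "\n". */
	snprintf(other, sizeof(other),
		 "INFO: %s detected stalls on CPUs/tasks:", flavor);
	snprintf(self, sizeof(self),
		 "INFO: %s self-detected stall on CPU", flavor);
	if (strstr(line, other))
		return STALL_OTHER_CPU;
	if (strstr(line, self))
		return STALL_SELF;
	return STALL_NONE;
}
```

Because matching is by substring, the expedited-stall header ("detected expedited stalls") is not mistaken for either of these two.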
