Commit 2227e5b

Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The RCU updates for this cycle were:

   - RCU-tasks update, including addition of RCU Tasks Trace for BPF use
     and TASKS_RUDE_RCU

   - kfree_rcu() updates.

   - Remove scheduler locking restriction

   - RCU CPU stall warning updates.

   - Torture-test updates.

   - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  rcu: Allow for smp_call_function() running callbacks from idle
  rcu: Provide rcu_irq_exit_check_preempt()
  rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
  rcu: Provide __rcu_is_watching()
  rcu: Provide rcu_irq_exit_preempt()
  rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  rcu/tree: Mark the idle relevant functions noinstr
  x86: Replace ist_enter() with nmi_enter()
  x86/mce: Send #MC singal from task work
  x86/entry: Get rid of ist_begin/end_non_atomic()
  sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
  lockdep: Always inline lockdep_{off,on}()
  hardirq/nmi: Allow nested nmi_enter()
  arm64: Prepare arch_nmi_enter() for recursion
  printk: Disallow instrumenting print_nmi_enter()
  printk: Prepare for nested printk_nmi_enter()
  rcutorture: Convert ULONG_CMP_LT() to time_before()
  torture: Add a --kasan argument
  torture: Save a few lines by using config_override_param initially
  ...
2 parents: 0bd957e + cb3cb67 · commit 2227e5b

62 files changed, +2537 -948 lines changed

Documentation/RCU/Design/Requirements/Requirements.rst

Lines changed: 16 additions & 45 deletions
@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
 
-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
 [PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
 RCU has made good progress towards meeting this requirement, even for
 context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
 room for further improvement.
 
-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via interrupt handler's use of RCU.
 
 Tracing and RCU
 ~~~~~~~~~~~~~~~
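Editor's note: the newly documented guarantee above can be illustrated with a short kernel-context sketch. This is not code from this commit; the lock and function names are hypothetical, and an ordinary raw spinlock stands in for a scheduler runqueue or priority-inheritance lock. It assumes the consolidated RCU flavor described in the text.

#include <linux/rcupdate.h>
#include <linux/spinlock.h>

/* Hypothetical stand-in for a runqueue/priority-inheritance lock. */
static DEFINE_RAW_SPINLOCK(example_lock);

static void example_reader(void)
{
	unsigned long flags;

	rcu_read_lock();	/* preemption may still be enabled here */
	raw_spin_lock_irqsave(&example_lock, flags);
	/* ... access RCU-protected data ... */
	rcu_read_unlock();	/* now legal even while the lock is held */
	raw_spin_unlock_irqrestore(&example_lock, flags);
}

Under the old rules this pattern was only safe if interrupts were disabled across the entire critical section; after this merge the ordering shown above no longer risks deadlock against the scheduler locks.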

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 19 additions & 0 deletions
@@ -4210,12 +4210,24 @@
 			Duration of CPU stall (s) to test RCU CPU stall
 			warnings, zero to disable.
 
+	rcutorture.stall_cpu_block= [KNL]
+			Sleep while stalling if set. This will result
+			in warnings from preemptible RCU in addition
+			to any other stall-related activity.
+
 	rcutorture.stall_cpu_holdoff= [KNL]
 			Time to wait (s) after boot before inducing stall.
 
 	rcutorture.stall_cpu_irqsoff= [KNL]
 			Disable interrupts while stalling if set.
 
+	rcutorture.stall_gp_kthread= [KNL]
+			Duration (s) of forced sleep within RCU
+			grace-period kthread to test RCU CPU stall
+			warnings, zero to disable. If both stall_cpu
+			and stall_gp_kthread are specified, the
+			kthread is starved first, then the CPU.
+
 	rcutorture.stat_interval= [KNL]
 			Time (s) between statistics printk()s.
 
@@ -4286,6 +4298,13 @@
 			only normal grace-period primitives. No effect
 			on CONFIG_TINY_RCU kernels.
 
+	rcupdate.rcu_task_ipi_delay= [KNL]
+			Set time in jiffies during which RCU tasks will
+			avoid sending IPIs, starting with the beginning
+			of a given grace period. Setting a large
+			number avoids disturbing real-time workloads,
+			but lengthens grace periods.
+
 	rcupdate.rcu_task_stall_timeout= [KNL]
 			Set timeout in jiffies for RCU task stall warning
 			messages. Disable with a value less than or equal
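Editor's note: for illustration only (not part of the patch, and the values are arbitrary), these module parameters are combined on the kernel command line like any others, for example:

	rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.stall_gp_kthread=10 rcupdate.rcu_task_ipi_delay=500

With both stall_cpu and stall_gp_kthread given, the grace-period kthread is starved first and the CPU afterwards, as the new documentation text above states.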

Documentation/trace/ftrace-design.rst

Lines changed: 0 additions & 8 deletions
@@ -229,14 +229,6 @@ Adding support for it is easy: just define the macro in asm/ftrace.h and
 pass the return address pointer as the 'retp' argument to
 ftrace_push_return_trace().
 
-HAVE_FTRACE_NMI_ENTER
----------------------
-
-If you can't trace NMI functions, then skip this option.
-
-<details to be filled>
-
-
 HAVE_SYSCALL_TRACEPOINTS
 ------------------------

arch/arm64/include/asm/hardirq.h

Lines changed: 59 additions & 19 deletions
@@ -32,30 +32,70 @@ u64 smp_irq_stat_cpu(unsigned int cpu);
 
 struct nmi_ctx {
 	u64 hcr;
+	unsigned int cnt;
 };
 
 DECLARE_PER_CPU(struct nmi_ctx, nmi_contexts);
 
-#define arch_nmi_enter() \
-	do { \
-		if (is_kernel_in_hyp_mode()) { \
-			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
-			nmi_ctx->hcr = read_sysreg(hcr_el2); \
-			if (!(nmi_ctx->hcr & HCR_TGE)) { \
-				write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
-				isb(); \
-			} \
-		} \
-	} while (0)
+#define arch_nmi_enter() \
+do { \
+	struct nmi_ctx *___ctx; \
+	u64 ___hcr; \
+ \
+	if (!is_kernel_in_hyp_mode()) \
+		break; \
+ \
+	___ctx = this_cpu_ptr(&nmi_contexts); \
+	if (___ctx->cnt) { \
+		___ctx->cnt++; \
+		break; \
+	} \
+ \
+	___hcr = read_sysreg(hcr_el2); \
+	if (!(___hcr & HCR_TGE)) { \
+		write_sysreg(___hcr | HCR_TGE, hcr_el2); \
+		isb(); \
+	} \
+	/* \
+	 * Make sure the sysreg write is performed before ___ctx->cnt \
+	 * is set to 1. NMIs that see cnt == 1 will rely on us. \
+	 */ \
+	barrier(); \
+	___ctx->cnt = 1; \
+	/* \
+	 * Make sure ___ctx->cnt is set before we save ___hcr. We \
+	 * don't want ___ctx->hcr to be overwritten. \
+	 */ \
+	barrier(); \
+	___ctx->hcr = ___hcr; \
+} while (0)
 
-#define arch_nmi_exit() \
-	do { \
-		if (is_kernel_in_hyp_mode()) { \
-			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
-			if (!(nmi_ctx->hcr & HCR_TGE)) \
-				write_sysreg(nmi_ctx->hcr, hcr_el2); \
-		} \
-	} while (0)
+#define arch_nmi_exit() \
+do { \
+	struct nmi_ctx *___ctx; \
+	u64 ___hcr; \
+ \
+	if (!is_kernel_in_hyp_mode()) \
+		break; \
+ \
+	___ctx = this_cpu_ptr(&nmi_contexts); \
+	___hcr = ___ctx->hcr; \
+	/* \
+	 * Make sure we read ___ctx->hcr before we release \
+	 * ___ctx->cnt as it makes ___ctx->hcr updatable again. \
+	 */ \
+	barrier(); \
+	___ctx->cnt--; \
+	/* \
+	 * Make sure ___ctx->cnt release is visible before we \
+	 * restore the sysreg. Otherwise a new NMI occurring \
+	 * right after write_sysreg() can be fooled and think \
+	 * we secured things for it. \
+	 */ \
+	barrier(); \
+	if (!___ctx->cnt && !(___hcr & HCR_TGE)) \
+		write_sysreg(___hcr, hcr_el2); \
+} while (0)
 
 static inline void ack_bad_irq(unsigned int irq)
 {
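Editor's note: the save-once/restore-on-last-exit discipline these macros implement can be modelled outside the kernel. The sketch below is a simplified, single-threaded userspace analogue and not part of this commit: names are hypothetical, the per-CPU context becomes one global, the hyp-mode check and barriers are dropped, and an ordinary variable stands in for HCR_EL2. It shows why only the outermost entry records the old value and only the final exit restores it.

#include <stdint.h>
#include <stdio.h>

#define TGE (1u << 27)		/* models the HCR_TGE bit */
static uint64_t sysreg;		/* models the HCR_EL2 system register */

static struct { uint64_t hcr; unsigned int cnt; } nmi_ctx;

static void model_nmi_enter(void)
{
	if (nmi_ctx.cnt) {	/* nested entry: only bump the counter */
		nmi_ctx.cnt++;
		return;
	}
	uint64_t hcr = sysreg;	/* outermost entry: save old value, set TGE */
	if (!(hcr & TGE))
		sysreg = hcr | TGE;
	nmi_ctx.cnt = 1;
	nmi_ctx.hcr = hcr;
}

static void model_nmi_exit(void)
{
	uint64_t hcr = nmi_ctx.hcr;
	nmi_ctx.cnt--;
	if (!nmi_ctx.cnt && !(hcr & TGE))	/* last exit restores the saved value */
		sysreg = hcr;
}

int main(void)
{
	model_nmi_enter();	/* outermost NMI */
	model_nmi_enter();	/* nested NMI */
	model_nmi_exit();	/* inner exit: TGE must stay set */
	printf("still masked: %d\n", !!(sysreg & TGE));
	model_nmi_exit();	/* outer exit: original value restored */
	printf("restored:     %d\n", !(sysreg & TGE));
	return 0;
}

The real macros additionally need the two barrier() calls so that an NMI arriving between the counter update and the register access cannot observe a half-initialised context.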

arch/arm64/kernel/sdei.c

Lines changed: 2 additions & 12 deletions
@@ -251,22 +251,12 @@ asmlinkage __kprobes notrace unsigned long
 __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 {
 	unsigned long ret;
-	bool do_nmi_exit = false;
 
-	/*
-	 * nmi_enter() deals with printk() re-entrance and use of RCU when
-	 * RCU believed this CPU was idle. Because critical events can
-	 * interrupt normal events, we may already be in_nmi().
-	 */
-	if (!in_nmi()) {
-		nmi_enter();
-		do_nmi_exit = true;
-	}
+	nmi_enter();
 
 	ret = _sdei_handler(regs, arg);
 
-	if (do_nmi_exit)
-		nmi_exit();
+	nmi_exit();
 
 	return ret;
 }

arch/arm64/kernel/traps.c

Lines changed: 2 additions & 6 deletions
@@ -906,17 +906,13 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-	const bool was_in_nmi = in_nmi();
-
-	if (!was_in_nmi)
-		nmi_enter();
+	nmi_enter();
 
 	/* non-RAS errors are not containable */
 	if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
 		arm64_serror_panic(regs, esr);
 
-	if (!was_in_nmi)
-		nmi_exit();
+	nmi_exit();
 }
 
 asmlinkage void enter_from_user_mode(void)

arch/powerpc/kernel/traps.c

Lines changed: 6 additions & 16 deletions
@@ -441,15 +441,9 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
 void system_reset_exception(struct pt_regs *regs)
 {
 	unsigned long hsrr0, hsrr1;
-	bool nested = in_nmi();
 	bool saved_hsrrs = false;
 
-	/*
-	 * Avoid crashes in case of nested NMI exceptions. Recoverability
-	 * is determined by RI and in_nmi
-	 */
-	if (!nested)
-		nmi_enter();
+	nmi_enter();
 
 	/*
 	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
@@ -521,8 +515,7 @@ void system_reset_exception(struct pt_regs *regs)
 		mtspr(SPRN_HSRR1, hsrr1);
 	}
 
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
 
 	/* What should we do here? We could issue a shutdown or hard reset. */
 }
@@ -823,9 +816,8 @@ int machine_check_generic(struct pt_regs *regs)
 void machine_check_exception(struct pt_regs *regs)
 {
 	int recover = 0;
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+
+	nmi_enter();
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
@@ -851,8 +843,7 @@ void machine_check_exception(struct pt_regs *regs)
 	if (check_io_access(regs))
 		goto bail;
 
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
 
 	die("Machine check", regs, SIGBUS);
 
@@ -863,8 +854,7 @@ void machine_check_exception(struct pt_regs *regs)
 	return;
 
 bail:
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
 }
 
 void SMIException(struct pt_regs *regs)

arch/sh/Kconfig

Lines changed: 0 additions & 1 deletion
@@ -71,7 +71,6 @@ config SUPERH32
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_DYNAMIC_FTRACE
-	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_ARCH_KGDB

arch/sh/kernel/traps.c

Lines changed: 12 additions & 0 deletions
@@ -170,11 +170,21 @@ BUILD_TRAP_HANDLER(bug)
 	force_sig(SIGTRAP);
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+extern void arch_ftrace_nmi_enter(void);
+extern void arch_ftrace_nmi_exit(void);
+#else
+static inline void arch_ftrace_nmi_enter(void) { }
+static inline void arch_ftrace_nmi_exit(void) { }
+#endif
+
 BUILD_TRAP_HANDLER(nmi)
 {
 	unsigned int cpu = smp_processor_id();
 	TRAP_HANDLER_DECL;
 
+	arch_ftrace_nmi_enter();
+
 	nmi_enter();
 	nmi_count(cpu)++;
 
@@ -190,4 +200,6 @@ BUILD_TRAP_HANDLER(nmi)
 	}
 
 	nmi_exit();
+
+	arch_ftrace_nmi_exit();
 }

arch/x86/include/asm/traps.h

Lines changed: 0 additions & 5 deletions
@@ -118,11 +118,6 @@ void smp_spurious_interrupt(struct pt_regs *regs);
 void smp_error_interrupt(struct pt_regs *regs);
 asmlinkage void smp_irq_move_cleanup_interrupt(void);
 
-extern void ist_enter(struct pt_regs *regs);
-extern void ist_exit(struct pt_regs *regs);
-extern void ist_begin_non_atomic(struct pt_regs *regs);
-extern void ist_end_non_atomic(void);
-
 #ifdef CONFIG_VMAP_STACK
 void __noreturn handle_stack_overflow(const char *message,
 				      struct pt_regs *regs,
