
Commit f736e0f

Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD
fixes.2020.04.27a: Miscellaneous fixes.
kfree_rcu.2020.04.27a: Changes related to kfree_rcu().
rcu-tasks.2020.04.27a: Addition of new RCU-tasks flavors.
stall.2020.04.27a: RCU CPU stall-warning updates.
torture.2020.05.07a: Torture-test updates.
5 parents: e2f3ccf + 6be7436 + e5a971d + 33b2b93 + 3c80b40

40 files changed: 2054 additions, 564 deletions

Documentation/RCU/Design/Requirements/Requirements.rst

Lines changed: 16 additions & 45 deletions

@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
 
-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
 [PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
 RCU has made good progress towards meeting this requirement, even for
 context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
 room for further improvement.
 
-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via interrupt handler's use of RCU.
 
 Tracing and RCU
 ~~~~~~~~~~~~~~~
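
For illustration, a minimal sketch of the now-permitted pattern described in the new text above; the lock, data, and function names are hypothetical stand-ins for scheduler-style locks and are not part of this commit:

    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(demo_pi_lock);  /* stand-in for a scheduler lock */

    static void demo_reader(void)
    {
            unsigned long flags;

            rcu_read_lock();        /* entered with preemption enabled */
            /* ... traverse RCU-protected data ... */
            raw_spin_lock_irqsave(&demo_pi_lock, flags);
            rcu_read_unlock();      /* now legal while demo_pi_lock is held */
            raw_spin_unlock_irqrestore(&demo_pi_lock, flags);
    }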

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 19 additions & 0 deletions

@@ -4210,12 +4210,24 @@
 			Duration of CPU stall (s) to test RCU CPU stall
 			warnings, zero to disable.
 
+	rcutorture.stall_cpu_block= [KNL]
+			Sleep while stalling if set. This will result
+			in warnings from preemptible RCU in addition
+			to any other stall-related activity.
+
 	rcutorture.stall_cpu_holdoff= [KNL]
 			Time to wait (s) after boot before inducing stall.
 
 	rcutorture.stall_cpu_irqsoff= [KNL]
 			Disable interrupts while stalling if set.
 
+	rcutorture.stall_gp_kthread= [KNL]
+			Duration (s) of forced sleep within RCU
+			grace-period kthread to test RCU CPU stall
+			warnings, zero to disable. If both stall_cpu
+			and stall_gp_kthread are specified, the
+			kthread is starved first, then the CPU.
+
 	rcutorture.stat_interval= [KNL]
 			Time (s) between statistics printk()s.
 
@@ -4286,6 +4298,13 @@
 			only normal grace-period primitives. No effect
 			on CONFIG_TINY_RCU kernels.
 
+	rcupdate.rcu_task_ipi_delay= [KNL]
+			Set time in jiffies during which RCU tasks will
+			avoid sending IPIs, starting with the beginning
+			of a given grace period. Setting a large
+			number avoids disturbing real-time workloads,
+			but lengthens grace periods.
+
 	rcupdate.rcu_task_stall_timeout= [KNL]
 			Set timeout in jiffies for RCU task stall warning
 			messages. Disable with a value less than or equal
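
For illustration, these parameters are set like any other kernel boot or module parameters; the values below are arbitrary examples rather than recommendations, and assume rcutorture is built in (otherwise pass the rcutorture.* values at modprobe time):

    rcutorture.stall_cpu=20 rcutorture.stall_cpu_block=1 rcutorture.stall_gp_kthread=10 rcupdate.rcu_task_ipi_delay=500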

include/linux/rcupdate.h

Lines changed: 43 additions & 10 deletions

@@ -37,6 +37,7 @@
 /* Exported common interfaces */
 void call_rcu(struct rcu_head *head, rcu_callback_t func);
 void rcu_barrier_tasks(void);
+void rcu_barrier_tasks_rude(void);
 void synchronize_rcu(void);
 
 #ifdef CONFIG_PREEMPT_RCU
@@ -129,25 +130,57 @@ static inline void rcu_init_nohz(void) { }
  * Note a quasi-voluntary context switch for RCU-tasks's benefit.
  * This is a macro rather than an inline function to avoid #include hell.
  */
-#ifdef CONFIG_TASKS_RCU
-#define rcu_tasks_qs(t) \
-	do { \
-		if (READ_ONCE((t)->rcu_tasks_holdout)) \
-			WRITE_ONCE((t)->rcu_tasks_holdout, false); \
+#ifdef CONFIG_TASKS_RCU_GENERIC
+
+# ifdef CONFIG_TASKS_RCU
+# define rcu_tasks_classic_qs(t, preempt) \
+	do { \
+		if (!(preempt) && READ_ONCE((t)->rcu_tasks_holdout)) \
+			WRITE_ONCE((t)->rcu_tasks_holdout, false); \
 	} while (0)
-#define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t)
 void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
 void synchronize_rcu_tasks(void);
+# else
+# define rcu_tasks_classic_qs(t, preempt) do { } while (0)
+# define call_rcu_tasks call_rcu
+# define synchronize_rcu_tasks synchronize_rcu
+# endif
+
+# ifdef CONFIG_TASKS_RCU_TRACE
+# define rcu_tasks_trace_qs(t) \
+	do { \
+		if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
+		    !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
+			smp_store_release(&(t)->trc_reader_checked, true); \
+			smp_mb(); /* Readers partitioned by store. */ \
+		} \
+	} while (0)
+# else
+# define rcu_tasks_trace_qs(t) do { } while (0)
+# endif
+
+#define rcu_tasks_qs(t, preempt) \
+do { \
+	rcu_tasks_classic_qs((t), (preempt)); \
+	rcu_tasks_trace_qs((t)); \
+} while (0)
+
+# ifdef CONFIG_TASKS_RUDE_RCU
+void call_rcu_tasks_rude(struct rcu_head *head, rcu_callback_t func);
+void synchronize_rcu_tasks_rude(void);
+# endif
+
+#define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
 void exit_tasks_rcu_start(void);
 void exit_tasks_rcu_finish(void);
-#else /* #ifdef CONFIG_TASKS_RCU */
-#define rcu_tasks_qs(t) do { } while (0)
+#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
+#define rcu_tasks_qs(t, preempt) do { } while (0)
 #define rcu_note_voluntary_context_switch(t) do { } while (0)
 #define call_rcu_tasks call_rcu
 #define synchronize_rcu_tasks synchronize_rcu
 static inline void exit_tasks_rcu_start(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
-#endif /* #else #ifdef CONFIG_TASKS_RCU */
+#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
 
 /**
  * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
@@ -158,7 +191,7 @@ static inline void exit_tasks_rcu_finish(void) { }
  */
 #define cond_resched_tasks_rcu_qs() \
 do { \
-	rcu_tasks_qs(current); \
+	rcu_tasks_qs(current, false); \
 	cond_resched(); \
 } while (0)
 
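
For illustration, a hedged sketch of how the new RCU Tasks Rude API might be used to defer freeing until even preemption-disabled code has quiesced; the structure and function names are hypothetical and assume CONFIG_TASKS_RUDE_RCU=y:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct demo_trampoline {
            struct rcu_head rh;
            /* ... generated-code descriptor fields ... */
    };

    static void demo_free_trampoline(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct demo_trampoline, rh));
    }

    static void demo_retire_trampoline(struct demo_trampoline *tp)
    {
            /* All tasks, including those running with preemption disabled,
             * must pass through a quiescent state before the free happens. */
            call_rcu_tasks_rude(&tp->rh, demo_free_trampoline);
    }

The blocking counterpart, synchronize_rcu_tasks_rude(), waits for the same kind of grace period inline, and rcu_barrier_tasks_rude() waits for previously queued callbacks to be invoked.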

include/linux/rcupdate_trace.h

Lines changed: 88 additions & 0 deletions

@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Read-Copy Update mechanism for mutual exclusion, adapted for tracing.
+ *
+ * Copyright (C) 2020 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_RCUPDATE_TRACE_H
+#define __LINUX_RCUPDATE_TRACE_H
+
+#include <linux/sched.h>
+#include <linux/rcupdate.h>
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+extern struct lockdep_map rcu_trace_lock_map;
+
+static inline int rcu_read_lock_trace_held(void)
+{
+	return lock_is_held(&rcu_trace_lock_map);
+}
+
+#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+static inline int rcu_read_lock_trace_held(void)
+{
+	return 1;
+}
+
+#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+
+void rcu_read_unlock_trace_special(struct task_struct *t, int nesting);
+
+/**
+ * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
+ *
+ * When synchronize_rcu_trace() is invoked by one task, then that task
+ * is guaranteed to block until all other tasks exit their read-side
+ * critical sections. Similarly, if call_rcu_trace() is invoked on one
+ * task while other tasks are within RCU read-side critical sections,
+ * invocation of the corresponding RCU callback is deferred until after
+ * the all the other tasks exit their critical sections.
+ *
+ * For more details, please see the documentation for rcu_read_lock().
+ */
+static inline void rcu_read_lock_trace(void)
+{
+	struct task_struct *t = current;
+
+	WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
+	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
+	    t->trc_reader_special.b.need_mb)
+		smp_mb(); // Pairs with update-side barriers
+	rcu_lock_acquire(&rcu_trace_lock_map);
+}
+
+/**
+ * rcu_read_unlock_trace - mark end of RCU-trace read-side critical section
+ *
+ * Pairs with a preceding call to rcu_read_lock_trace(), and nesting is
+ * allowed. Invoking a rcu_read_unlock_trace() when there is no matching
+ * rcu_read_lock_trace() is verboten, and will result in lockdep complaints.
+ *
+ * For more details, please see the documentation for rcu_read_unlock().
+ */
+static inline void rcu_read_unlock_trace(void)
+{
+	int nesting;
+	struct task_struct *t = current;
+
+	rcu_lock_release(&rcu_trace_lock_map);
+	nesting = READ_ONCE(t->trc_reader_nesting) - 1;
+	if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
+		WRITE_ONCE(t->trc_reader_nesting, nesting);
+		return; // We assume shallow reader nesting.
+	}
+	rcu_read_unlock_trace_special(t, nesting);
+}
+
+void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
+void synchronize_rcu_tasks_trace(void);
+void rcu_barrier_tasks_trace(void);
+
+#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
+
+#endif /* __LINUX_RCUPDATE_TRACE_H */
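
For illustration, a hedged sketch of a reader/updater pairing built on the new Tasks Trace RCU API; the data structure and function names are hypothetical and assume CONFIG_TASKS_TRACE_RCU=y:

    #include <linux/rcupdate_trace.h>
    #include <linux/slab.h>

    struct demo_probe {
            struct rcu_head rh;
            int enabled;
    };

    static struct demo_probe *demo_probe_ptr;  /* published pointer */

    /* Reader: this flavor tolerates blocking within the critical section. */
    static int demo_probe_enabled(void)
    {
            struct demo_probe *p;
            int enabled = 0;

            rcu_read_lock_trace();
            p = READ_ONCE(demo_probe_ptr);
            if (p)
                    enabled = p->enabled;
            rcu_read_unlock_trace();
            return enabled;
    }

    static void demo_free_probe(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct demo_probe, rh));
    }

    /* Updater: unpublish, then defer the free past all trace readers. */
    static void demo_remove_probe(void)
    {
            struct demo_probe *p = READ_ONCE(demo_probe_ptr);

            WRITE_ONCE(demo_probe_ptr, NULL);
            if (p)
                    call_rcu_tasks_trace(&p->rh, demo_free_probe);
    }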

include/linux/rcupdate_wait.h

Lines changed: 19 additions & 0 deletions

@@ -31,4 +31,23 @@ do { \
 
 #define wait_rcu_gp(...) _wait_rcu_gp(false, __VA_ARGS__)
 
+/**
+ * synchronize_rcu_mult - Wait concurrently for multiple grace periods
+ * @...: List of call_rcu() functions for different grace periods to wait on
+ *
+ * This macro waits concurrently for multiple types of RCU grace periods.
+ * For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait
+ * on concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCU
+ * domain requires you to write a wrapper function for that SRCU domain's
+ * call_srcu() function, with this wrapper supplying the pointer to the
+ * corresponding srcu_struct.
+ *
+ * The first argument tells Tiny RCU's _wait_rcu_gp() not to
+ * bother waiting for RCU. The reason for this is because anywhere
+ * synchronize_rcu_mult() can be called is automatically already a full
+ * grace period.
+ */
+#define synchronize_rcu_mult(...) \
+	_wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__)
+
 #endif /* _LINUX_SCHED_RCUPDATE_WAIT_H */
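
For illustration, a hedged sketch of the SRCU wrapper arrangement that the kernel-doc comment above describes; demo_srcu and the wrapper name are hypothetical:

    #include <linux/rcupdate_wait.h>
    #include <linux/srcu.h>

    DEFINE_STATIC_SRCU(demo_srcu);

    /* Wrapper supplying the srcu_struct pointer, as the comment requires. */
    static void call_demo_srcu(struct rcu_head *head, rcu_callback_t func)
    {
            call_srcu(&demo_srcu, head, func);
    }

    static void demo_wait_for_all_flavors(void)
    {
            /* Waits concurrently for normal RCU, RCU-tasks, and demo_srcu
             * grace periods, rather than for one after another. */
            synchronize_rcu_mult(call_rcu, call_rcu_tasks, call_demo_srcu);
    }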

include/linux/rcutiny.h

Lines changed: 2 additions & 1 deletion

@@ -49,7 +49,7 @@ static inline void rcu_softirq_qs(void)
 #define rcu_note_context_switch(preempt) \
 	do { \
 		rcu_qs(); \
-		rcu_tasks_qs(current); \
+		rcu_tasks_qs(current, (preempt)); \
 	} while (0)
 
 static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
@@ -87,6 +87,7 @@ static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
 static inline bool rcu_is_watching(void) { return true; }
 static inline void rcu_momentary_dyntick_idle(void) { }
 static inline void kfree_rcu_scheduler_running(void) { }
+static inline bool rcu_gp_might_be_stalled(void) { return false; }
 
 /* Avoid RCU read-side critical sections leaking across. */
 static inline void rcu_all_qs(void) { barrier(); }

include/linux/rcutree.h

Lines changed: 1 addition & 0 deletions

@@ -39,6 +39,7 @@ void rcu_barrier(void);
 bool rcu_eqs_special_set(int cpu);
 void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
+bool rcu_gp_might_be_stalled(void);
 unsigned long get_state_synchronize_rcu(void);
 void cond_synchronize_rcu(unsigned long oldstate);

include/linux/sched.h

Lines changed: 9 additions & 1 deletion

@@ -613,7 +613,7 @@ union rcu_special {
 		u8			blocked;
 		u8			need_qs;
 		u8			exp_hint; /* Hint for performance. */
-		u8			deferred_qs;
+		u8			need_mb; /* Readers need smp_mb(). */
 	} b; /* Bits. */
 	u32 s; /* Set of bits. */
 };
@@ -724,6 +724,14 @@ struct task_struct {
 	struct list_head		rcu_tasks_holdout_list;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
+#ifdef CONFIG_TASKS_TRACE_RCU
+	int				trc_reader_nesting;
+	int				trc_ipi_to_cpu;
+	union rcu_special		trc_reader_special;
+	bool				trc_reader_checked;
+	struct list_head		trc_holdout_list;
+#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
+
 	struct sched_info		sched_info;
 
 	struct list_head		tasks;

include/linux/torture.h

Lines changed: 1 addition & 1 deletion

@@ -89,7 +89,7 @@ void _torture_stop_kthread(char *m, struct task_struct **tp);
 #ifdef CONFIG_PREEMPTION
 #define torture_preempt_schedule() preempt_schedule()
 #else
-#define torture_preempt_schedule()
+#define torture_preempt_schedule() do { } while (0)
 #endif
 
 #endif /* __LINUX_TORTURE_H */
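
For illustration, the do { } while (0) form matters when the macro is the sole body of an if statement; a brief hedged sketch, where the helper and its argument are hypothetical:

    #include <linux/sched.h>
    #include <linux/torture.h>

    static void demo_maybe_preempt(bool do_preempt)
    {
            /* With the older empty #define, the "if" body would expand to a
             * bare semicolon, which can trigger -Wempty-body warnings and
             * invites dangling-else mistakes; the do { } while (0) form
             * behaves like an ordinary statement in every context. */
            if (do_preempt)
                    torture_preempt_schedule();
            else
                    cond_resched();
    }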

include/linux/wait.h

Lines changed: 2 additions & 0 deletions

@@ -1149,4 +1149,6 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
 		(wait)->flags = 0; \
 	} while (0)
 
+bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg);
+
 #endif /* _LINUX_WAIT_H */
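
For illustration, a hedged sketch of how a caller might use this new helper; the callback and the sampled field are arbitrary, and the precise locking guarantees are those documented at the function's definition:

    #include <linux/sched.h>
    #include <linux/wait.h>

    /* Invoked only if the scheduler could pin down @t's state. */
    static bool demo_sample_nvcsw(struct task_struct *t, void *arg)
    {
            unsigned long *nvcsw = arg;

            *nvcsw = t->nvcsw;      /* sample while @t's state is held stable */
            return true;            /* value handed back to the caller below */
    }

    static bool demo_try_sample(struct task_struct *t, unsigned long *nvcsw)
    {
            /* Returns false if @t could not be locked down, in which case the
             * caller must retry or fall back to another approach. */
            return try_invoke_on_locked_down_task(t, demo_sample_nvcsw, nvcsw);
    }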
