Commit e4453d8

rcu: Make rcu_read_unlock_special() safe for rq/pi locks
The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit.

But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe.

This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation.

This commit is inspired by a patch from Lai Jiangshan:
https://lore.kernel.org/lkml/[email protected]

This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().

Cc: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
1 parent c76e7e0
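By way of illustration only (this sketch is not part of the commit, and example_pi_lock is a hypothetical stand-in for one of the scheduler's rq/pi locks), the pattern that this change makes safe looks roughly like the following:

#include <linux/spinlock.h>
#include <linux/rcupdate.h>

/* Hypothetical stand-in for one of the scheduler's rq/pi locks. */
static DEFINE_RAW_SPINLOCK(example_pi_lock);

static void example_reader(void)
{
	unsigned long flags;

	rcu_read_lock();	/* Preemption may still be enabled here. */
	raw_spin_lock_irqsave(&example_pi_lock, flags);
	/*
	 * With this change, calling rcu_read_unlock() here is safe even
	 * though the critical section may have been preempted earlier:
	 * rcu_read_unlock_special() now refrains from wakeups unless
	 * they are guaranteed safe.
	 */
	rcu_read_unlock();
	raw_spin_unlock_irqrestore(&example_pi_lock, flags);
}

Before this commit, holding such a lock across rcu_read_unlock() required interrupts to have been disabled across the entire critical section, up to and including the matching rcu_read_lock().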

2 files changed: 24 additions and 54 deletions

Documentation/RCU/Design/Requirements/Requirements.rst

Lines changed: 16 additions & 45 deletions
@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
 
-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
 [PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
 RCU has made good progress towards meeting this requirement, even for
 context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
 room for further improvement.
 
-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via interrupt handler's use of RCU.
 
 Tracing and RCU
 ~~~~~~~~~~~~~~~
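As a rough illustration of the consolidated-flavor point in the new documentation text above (this sketch is not taken from the patch; my_gp, struct my_data, and read_with_irqs_disabled() are hypothetical names), an interrupt-disabled region now acts as an RCU read-side critical section in its own right:

#include <linux/rcupdate.h>
#include <linux/irqflags.h>

struct my_data {
	int val;
};

/* Hypothetical RCU-protected pointer, for illustration only. */
static struct my_data __rcu *my_gp;

static int read_with_irqs_disabled(void)
{
	struct my_data *p;
	unsigned long flags;
	int ret = 0;

	/*
	 * With the consolidated RCU flavors, this interrupt-disabled
	 * region is itself an RCU read-side critical section: updaters
	 * using synchronize_rcu() or call_rcu() will wait for it.
	 */
	local_irq_save(flags);
	p = rcu_dereference_check(my_gp, irqs_disabled());
	if (p)
		ret = p->val;
	local_irq_restore(flags);
	return ret;
}

Because the interrupted region is itself a reader, an interrupt handler that does its own rcu_read_lock()/rcu_read_unlock() no longer requires the negative-nesting trick described in the removed text.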

kernel/rcu/tree_plugin.h

Lines changed: 8 additions & 9 deletions
@@ -615,19 +615,18 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 
-	exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
-	      (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-	      tick_nohz_full_cpu(rdp->cpu);
+	exp = (t->rcu_blocked_node &&
+	       READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
+	      (rdp->grpmask & READ_ONCE(rnp->expmask));
 	// Need to defer quiescent state until everything is enabled.
-	if (irqs_were_disabled && use_softirq &&
-	    (in_interrupt() ||
-	     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
-		// Using softirq, safe to awaken, and we get
-		// no help from enabling irqs, unlike bh/preempt.
+	if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
+		// Using softirq, safe to awaken, and either the
+		// wakeup is free or there is an expedited GP.
 		raise_softirq_irqoff(RCU_SOFTIRQ);
 	} else {
 		// Enabling BH or preempt does reschedule, so...
-		// Also if no expediting or NO_HZ_FULL, slow is OK.
+		// Also if no expediting, slow is OK.
+		// Plus nohz_full CPUs eventually get tick enabled.
 		set_tsk_need_resched(current);
 		set_preempt_need_resched();
 		if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
