
Commit 5d71c2b

Joel Fernandes authored and neeraju committed
rcu: Document concurrent quiescent state reporting for offline CPUs
The synchronization of CPU offlining with GP initialization is confusing to
put it mildly (rightfully so as the issue it deals with is complex). Recent
discussions brought up a question -- what prevents the
rcu_implicit_dyntick_qs() from warning about QS reports for offline CPUs
(missing QS reports for offline CPUs causing indefinite hangs).

QS reporting for now-offline CPUs should only happen from:

- gp_init()
- rcutree_cpu_report_dead()

Add some documentation on this and refer to it from comments in the code
explaining how QS reporting is not missed when these functions are
concurrently running.

I referred heavily to this post [1] about the need for the ofl_lock.

[1] https://lore.kernel.org/all/[email protected]/

[ Applied paulmck feedback on moving documentation to Requirements.rst ]

Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
Co-developed-by: "Paul E. McKenney" <[email protected]>
Signed-off-by: "Paul E. McKenney" <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>
1 parent 186779c commit 5d71c2b

2 files changed: +105 -1 lines changed


Documentation/RCU/Design/Requirements/Requirements.rst

Lines changed: 87 additions & 0 deletions
@@ -2011,6 +2011,93 @@ after the CPU hotplug scanning.
By incrementing gp_seq first, CPU1's RCU read-side critical section
is guaranteed to not be missed by CPU2.

**Concurrent Quiescent State Reporting for Offline CPUs**

RCU must ensure that CPUs going offline report quiescent states to avoid
blocking grace periods. This requires careful synchronization to handle
race conditions.

**Race condition causing Offline CPU to hang GP**

A race between CPU offlining and new GP initialization (gp_init) may occur
because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily
release the `rcu_node` lock to wake the RCU grace-period kthread:

.. code-block:: none

   CPU1 (going offline)                 CPU0 (GP kthread)
   --------------------                 -----------------
   rcutree_report_cpu_dead()
     rcu_report_qs_rnp()
     // Must release rnp->lock to wake GP kthread
     raw_spin_unlock_irqrestore_rcu_node()
                                        // Wakes up and starts new GP
                                        rcu_gp_init()
                                        // First loop:
                                        copies qsmaskinitnext->qsmaskinit
                                        // CPU1 still in qsmaskinitnext!

                                        // Second loop:
                                        rnp->qsmask = rnp->qsmaskinit
                                        mask = rnp->qsmask & ~rnp->qsmaskinitnext
                                        // mask is 0! CPU1 still in both masks
     // Reacquire lock (but too late)
     rnp->qsmaskinitnext &= ~mask       // Finally clears bit

Without `ofl_lock`, the new grace period includes the offline CPU and waits
forever for its quiescent state, causing a GP hang.

**A solution with ofl_lock**

The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during
the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:

.. code-block:: none

   CPU0 (rcu_gp_init)                   CPU1 (rcutree_report_cpu_dead)
   ------------------                   ------------------------------
   rcu_for_each_leaf_node(rnp) {
     arch_spin_lock(&ofl_lock) -------> arch_spin_lock(&ofl_lock) [BLOCKED]

     // Safe: CPU1 can't interfere
     rnp->qsmaskinit = rnp->qsmaskinitnext

     arch_spin_unlock(&ofl_lock) -----> // Now CPU1 can proceed
   }                                    // But snapshot already taken
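
For reference, the locking pattern of this first loop looks roughly like the
following simplified C sketch, paraphrased from `rcu_gp_init()` in
kernel/rcu/tree.c (the `wait_blkd_tasks` handling, tracing, and fast-path
exit are omitted here):

.. code-block:: c

   /* Exclude CPU hotplug operations while snapshotting online CPUs. */
   rcu_for_each_leaf_node(rnp) {
           local_irq_disable();
           /* Block rcutree_report_cpu_dead() for the whole snapshot. */
           arch_spin_lock(&rcu_state.ofl_lock);
           raw_spin_lock_rcu_node(rnp);
           /* Atomic snapshot of online CPUs for this leaf rcu_node. */
           rnp->qsmaskinit = rnp->qsmaskinitnext;
           raw_spin_unlock_rcu_node(rnp);
           arch_spin_unlock(&rcu_state.ofl_lock);
           local_irq_enable();
   }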

**Another race causing GP hangs in rcu_gp_init(): Reporting QS for Now-offline CPUs**

After the first loop takes an atomic snapshot of online CPUs, as shown above,
the second loop in `rcu_gp_init()` detects CPUs that went offline between
releasing `ofl_lock` and acquiring the per-node `rnp->lock`. This detection is
crucial because:

1. The CPU might have gone offline after the snapshot but before the second loop.
2. The offline CPU cannot report its own QS if it's already dead.
3. Without this detection, the grace period would wait forever for CPUs that
   are now offline.

The second loop performs this detection safely:

.. code-block:: none

   rcu_for_each_node_breadth_first(rnp) {
       raw_spin_lock_irqsave_rcu_node(rnp, flags);
       rnp->qsmask = rnp->qsmaskinit;  // Apply the snapshot

       // Detect CPUs offline after snapshot
       mask = rnp->qsmask & ~rnp->qsmaskinitnext;

       if (mask && rcu_is_leaf_node(rnp))
           rcu_report_qs_rnp(mask, ...)  // Report QS for offline CPUs
   }

This approach ensures atomicity: quiescent state reporting for offline CPUs
happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`,
never both and never neither. The `rnp->lock` held throughout the sequence
prevents races: `rcutree_report_cpu_dead()` also acquires this lock when
clearing `qsmaskinitnext`, ensuring mutual exclusion.
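
The offlining side takes the same two locks in the same order; below is a
simplified C sketch, paraphrased from `rcutree_report_cpu_dead()` in
kernel/rcu/tree.c (bookkeeping such as recording `rcu_ofl_gp_seq` is omitted
here):

.. code-block:: c

   mask = rdp->grpmask;
   /* Block rcu_gp_init()'s first loop while the QS is reported. */
   arch_spin_lock(&rcu_state.ofl_lock);
   raw_spin_lock_irqsave_rcu_node(rnp, flags);
   if (rnp->qsmask & mask) {       /* Is the GP still waiting on this CPU? */
           /* Drops and reacquires rnp->lock to wake the GP kthread. */
           rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
           raw_spin_lock_irqsave_rcu_node(rnp, flags);
   }
   /* Clear from ->qsmaskinitnext to mark the CPU offline. */
   WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
   arch_spin_unlock(&rcu_state.ofl_lock);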

Scheduler and RCU
~~~~~~~~~~~~~~~~~

kernel/rcu/tree.c

Lines changed: 18 additions & 1 deletion
@@ -1886,6 +1886,10 @@ static noinline_for_stack bool rcu_gp_init(void)
 	/* Exclude CPU hotplug operations. */
 	rcu_for_each_leaf_node(rnp) {
 		local_irq_disable();
+		/*
+		 * Serialize with CPU offline. See Requirements.rst > Hotplug CPU >
+		 * Concurrent Quiescent State Reporting for Offline CPUs.
+		 */
 		arch_spin_lock(&rcu_state.ofl_lock);
 		raw_spin_lock_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -1960,7 +1964,12 @@ static noinline_for_stack bool rcu_gp_init(void)
 		trace_rcu_grace_period_init(rcu_state.name, rnp->gp_seq,
 					    rnp->level, rnp->grplo,
 					    rnp->grphi, rnp->qsmask);
-		/* Quiescent states for tasks on any now-offline CPUs. */
+		/*
+		 * Quiescent states for tasks on any now-offline CPUs. Since we
+		 * released the ofl and rnp lock before this loop, CPUs might
+		 * have gone offline and we have to report QS on their behalf.
+		 * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting.
+		 */
 		mask = rnp->qsmask & ~rnp->qsmaskinitnext;
 		rnp->rcu_gp_init_mask = mask;
 		if ((mask || rnp->wait_blkd_tasks) && rcu_is_leaf_node(rnp))
@@ -4386,6 +4395,13 @@ void rcutree_report_cpu_dead(void)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
+
+	/*
+	 * Hold the ofl_lock and rnp lock to avoid races between CPU going
+	 * offline and doing a QS report (as below), versus rcu_gp_init().
+	 * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting section
+	 * for more details.
+	 */
 	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4396,6 +4412,7 @@ void rcutree_report_cpu_dead(void)
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
+	/* Clear from ->qsmaskinitnext to mark offline. */
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	arch_spin_unlock(&rcu_state.ofl_lock);
