You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off
rnp->qsmaskinitnext, it means that all accesses from the offline CPU
preceding the CPUHP_TEARDOWN_CPU are visible to RCU barrier, including
callbacks expiration and counter updates.
However interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead(). The related accesses
happening between CPUHP_TEARDOWN_CPU and rnp->qsmaskinitnext clearing
are _NOT_ guaranteed to be seen by rcu_barrier() without proper
ordering, especially when callbacks are invoked there to the end, making
rcutree_migrate_callback() bypass barrier_lock.
The following theoretical race example can make rcu_barrier() hang:
CPU 0 CPU 1
----- -----
//cpu_down()
smpboot_park_threads()
//ksoftirqd is parked now
<IRQ>
rcu_sched_clock_irq()
invoke_rcu_core()
do_softirq()
rcu_core()
rcu_do_batch()
// callback storm
// rcu_do_batch() returns
// before completing all
// of them
// do_softirq also returns early because of
// timeout. It defers to ksoftirqd but
// it's parked
</IRQ>
stop_machine()
take_cpu_down()
rcu_barrier()
spin_lock(barrier_lock)
// observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
<IRQ>
do_softirq()
rcu_core()
rcu_do_batch()
//completes all pending callbacks
//smp_mb() implied _after_ callback number dec
</IRQ>
rcutree_report_cpu_dead()
rnp->qsmaskinitnext &= ~rdp->grpmask;
rcutree_migrate_callback()
// no callback, early return without locking
// barrier_lock
//observes !rcu_rdp_cpu_online(rdp)
rcu_barrier_entrain()
rcu_segcblist_entrain()
// Observe rcu_segcblist_n_cbs(rsclp) == 0
// because no barrier between reading
// rnp->qsmaskinitnext and rsclp->len
rcu_segcblist_add_len()
smp_mb__before_atomic()
// will now observe the 0 count and empty
// list, but too late, we enqueue regardless
WRITE_ONCE(rsclp->len, rsclp->len + v);
// ignored barrier callback
// rcu barrier stall...
This could be solved with a read memory barrier, enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, matching the full
memory barrier after rsclp->len addition in rcu_segcblist_add_len()
performed at the end of rcu_do_batch().
However the rcu_barrier() is complicated enough and probably doesn't
need too many more subtleties. CPU down is a slowpath and the
barrier_lock seldom contended. Solve the issue with unconditionally
locking the barrier_lock on rcutree_migrate_callbacks(). This makes sure
that either rcu_barrier() sees the empty queue or its entrained
callback will be migrated.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
0 commit comments