Commit 0bf3924

x86/entry: Force rcu_irq_enter() when in idle task
The idea of conditionally calling into rcu_irq_enter() only when RCU is not watching turned out to be not completely thought through.

Paul noticed occasional premature end of grace periods in RCU torture testing. Bisection led to the commit which made the invocation of rcu_irq_enter() conditional on !rcu_is_watching().

It turned out that this conditional breaks RCU assumptions about the idle task when the scheduler tick happens to be a nested interrupt. Nested interrupts can happen when the first interrupt invokes softirq processing on return which enables interrupts.

If that nested tick interrupt does not invoke rcu_irq_enter() then the RCU's irq-nesting checks will believe that this interrupt came directly from idle, which will cause RCU to report a quiescent state. Because this interrupt instead came from a softirq handler which might have been executing an RCU read-side critical section, this can cause the grace period to end prematurely.

Change the condition from !rcu_is_watching() to is_idle_task(current) which enforces that interrupts in the idle task unconditionally invoke rcu_irq_enter() independent of the RCU state.

This is also correct vs. user mode entries in NOHZ full scenarios because user mode entries bring RCU out of EQS and force the RCU irq nesting state accounting to nested. As only the first interrupt can enter from user mode a nested tick interrupt will enter from kernel mode and as the nesting state accounting is forced to nesting it will not do anything stupid even if rcu_irq_enter() has not been invoked.

Fixes: 3eeec38 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()")
Reported-by: "Paul E. McKenney" <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: "Paul E. McKenney" <[email protected]>
Reviewed-by: "Paul E. McKenney" <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
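
The nesting problem described in the commit message can be illustrated with a small user-space toy model. This is only a sketch under simplifying assumptions: every `toy_` and `TOY_` name is hypothetical and not a kernel API, RCU's state is reduced to a "watching" flag plus one nesting counter, and the tick's "did this interrupt come directly from idle?" test is reduced to "nesting depth is at most one". Real RCU keeps considerably more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_rcu {
	bool watching;    /* models "RCU is watching this CPU" */
	int irq_nesting;  /* interrupt levels RCU has been told about */
};

enum toy_cond {
	TOY_COND_NOT_WATCHING,  /* old condition: !rcu_is_watching() */
	TOY_COND_IDLE_TASK,     /* new condition: is_idle_task(current) */
};

/* Models the entry path; returns true if the toy rcu_irq_enter() ran. */
static bool toy_irq_enter(struct toy_rcu *rcu, enum toy_cond cond,
			  bool in_idle_task)
{
	bool call = (cond == TOY_COND_NOT_WATCHING) ? !rcu->watching
						    : in_idle_task;

	if (call) {
		rcu->irq_nesting++;
		rcu->watching = true;
	}
	return call;
}

/* Models the exit path; undoes the accounting only if entry did it. */
static void toy_irq_exit(struct toy_rcu *rcu, bool entered)
{
	if (!entered)
		return;
	assert(rcu->irq_nesting > 0);
	if (--rcu->irq_nesting == 0)
		rcu->watching = false;
}

/* Simplified tick check: "this interrupt came straight from idle". */
static bool toy_tick_reports_qs(const struct toy_rcu *rcu, bool in_idle_task)
{
	return in_idle_task && rcu->irq_nesting <= 1;
}

/*
 * Scenario from the commit message: an interrupt hits the idle task, its
 * return path runs softirqs with interrupts enabled, and the tick arrives
 * as a nested interrupt. Returns whether the nested tick reports a
 * quiescent state (which would be wrong in this scenario).
 */
bool toy_nested_tick_reports_qs(enum toy_cond cond)
{
	struct toy_rcu rcu = { .watching = false, .irq_nesting = 0 };
	bool outer = toy_irq_enter(&rcu, cond, true);
	bool inner = toy_irq_enter(&rcu, cond, true);  /* nested tick */
	bool qs = toy_tick_reports_qs(&rcu, true);

	toy_irq_exit(&rcu, inner);
	toy_irq_exit(&rcu, outer);
	return qs;
}
```

In this model, `toy_nested_tick_reports_qs(TOY_COND_NOT_WATCHING)` returns true: the nested entry sees RCU already watching, skips the accounting, and the tick mistakes itself for a first-level interrupt. `toy_nested_tick_reports_qs(TOY_COND_IDLE_TASK)` returns false because both entries are counted.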
1 parent 71ed49d commit 0bf3924

1 file changed: +28 -7 lines changed
arch/x86/entry/common.c

Lines changed: 28 additions & 7 deletions
@@ -557,14 +557,34 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 		return false;
 	}
 
-	if (!__rcu_is_watching()) {
+	/*
+	 * If this entry hit the idle task invoke rcu_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for __rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke rcu_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interupt and eventually claim
+	 * quiescient state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke rcu_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
 		/*
 		 * If RCU is not watching then the same careful
 		 * sequence vs. lockdep and tracing is required
 		 * as in enter_from_user_mode().
-		 *
-		 * This only happens for IRQs that hit the idle
-		 * loop, i.e. if idle is not using MWAIT.
 		 */
 		lockdep_hardirqs_off(CALLER_ADDR0);
 		rcu_irq_enter();
@@ -576,9 +596,10 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 	}
 
 	/*
-	 * If RCU is watching then RCU only wants to check
-	 * whether it needs to restart the tick in NOHZ
-	 * mode.
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
 	 */
 	instrumentation_begin();
 	rcu_irq_enter_check_tick();
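
The new condition combines a compile-time configuration test with a runtime test. The sketch below shows the general shape of that pattern in plain C. It rests on assumptions: `TOY_CONFIG_TINY_RCU` is a hypothetical stand-in for what `IS_ENABLED(CONFIG_TINY_RCU)` evaluates to, and the `toy_` functions are illustrative, not kernel code. When the macro is 1, the whole condition is constant false, so an optimizing compiler can typically drop the branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_TINY_RCU); build with
 * -DTOY_CONFIG_TINY_RCU=1 to model a TINY_RCU configuration. */
#ifndef TOY_CONFIG_TINY_RCU
#define TOY_CONFIG_TINY_RCU 0
#endif

static int toy_nesting;  /* toy replacement for RCU's irq nesting state */

/*
 * Models the entry decision: returns true when the toy RCU accounting
 * was done, so the caller knows the exit path has to undo it.
 */
bool toy_entry(bool in_idle_task)
{
	/* Constant-false when TOY_CONFIG_TINY_RCU is 1. */
	if (!TOY_CONFIG_TINY_RCU && in_idle_task) {
		toy_nesting++;
		return true;
	}
	return false;
}

/* Models the exit path: only undo what the matching entry did. */
void toy_exit(bool rcu_exit)
{
	if (rcu_exit) {
		assert(toy_nesting > 0);
		toy_nesting--;
	}
}

/* Accessor so callers can inspect the toy nesting depth. */
int toy_nesting_depth(void)
{
	return toy_nesting;
}
```

Passing the entry function's return value to the exit function mirrors the bool return type of idtentry_enter_cond_rcu() in the hunk header above; how the real exit path consumes it is not shown in this diff.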
