Skip to content

Commit 79f8b28

Browse files
anna-marialxKAGA-KOKO
authored andcommitted
timers: Annotate possible non critical data race of next_expiry
Global timers could be expired remotely when the target CPU is idle. After a remote timer expiry, the remote timer_base->next_expiry value is updated while holding the timer_base->lock. When the formerly idle CPU becomes active at the same time and checks whether timers need to expire, this check is done lockless as it is on the local CPU. This could lead to a data race, which was reported by sysbot: https://lore.kernel.org/r/[email protected] When the value is read lockless but changed by the remote CPU, only two non critical scenarios could happen: 1) The already update value is read -> everything is perfect 2) The old value is read -> a superfluous timer soft interrupt is raised The same situation could happen when enqueueing a new first pinned timer by a remote CPU also with non critical scenarios: 1) The already update value is read -> everything is perfect 2) The old value is read -> when the CPU is idle, an IPI is executed nevertheless and when the CPU isn't idle, the updated value will be visible on the next tick and the timer might be late one jiffie. As this is very unlikely to happen, the overhead of doing the check under the lock is a way more effort, than a superfluous timer soft interrupt or a possible 1 jiffie delay of the timer. Document and annotate this non critical behavior in the code by using READ/WRITE_ONCE() pair when accessing timer_base->next_expiry. Reported-by: [email protected] Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected] Closes: https://lore.kernel.org/lkml/[email protected]
1 parent 4381b89 commit 79f8b28

File tree

1 file changed

+37
-5
lines changed

1 file changed

+37
-5
lines changed

kernel/time/timer.c

Lines changed: 37 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -672,7 +672,7 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
672672
* Set the next expiry time and kick the CPU so it
673673
* can reevaluate the wheel:
674674
*/
675-
base->next_expiry = bucket_expiry;
675+
WRITE_ONCE(base->next_expiry, bucket_expiry);
676676
base->timers_pending = true;
677677
base->next_expiry_recalc = false;
678678
trigger_dyntick_cpu(base, timer);
@@ -1966,7 +1966,7 @@ static void next_expiry_recalc(struct timer_base *base)
19661966
clk += adj;
19671967
}
19681968

1969-
base->next_expiry = next;
1969+
WRITE_ONCE(base->next_expiry, next);
19701970
base->next_expiry_recalc = false;
19711971
base->timers_pending = !(next == base->clk + NEXT_TIMER_MAX_DELTA);
19721972
}
@@ -2020,7 +2020,7 @@ static unsigned long next_timer_interrupt(struct timer_base *base,
20202020
* easy comparable to find out which base holds the first pending timer.
20212021
*/
20222022
if (!base->timers_pending)
2023-
base->next_expiry = basej + NEXT_TIMER_MAX_DELTA;
2023+
WRITE_ONCE(base->next_expiry, basej + NEXT_TIMER_MAX_DELTA);
20242024

20252025
return base->next_expiry;
20262026
}
@@ -2464,8 +2464,40 @@ static void run_local_timers(void)
24642464
hrtimer_run_queues();
24652465

24662466
for (int i = 0; i < NR_BASES; i++, base++) {
2467-
/* Raise the softirq only if required. */
2468-
if (time_after_eq(jiffies, base->next_expiry) ||
2467+
/*
2468+
* Raise the softirq only if required.
2469+
*
2470+
* timer_base::next_expiry can be written by a remote CPU while
2471+
* holding the lock. If this write happens at the same time than
2472+
* the lockless local read, sanity checker could complain about
2473+
* data corruption.
2474+
*
2475+
* There are two possible situations where
2476+
* timer_base::next_expiry is written by a remote CPU:
2477+
*
2478+
* 1. Remote CPU expires global timers of this CPU and updates
2479+
* timer_base::next_expiry of BASE_GLOBAL afterwards in
2480+
* next_timer_interrupt() or timer_recalc_next_expiry(). The
2481+
* worst outcome is a superfluous raise of the timer softirq
2482+
* when the not yet updated value is read.
2483+
*
2484+
* 2. A new first pinned timer is enqueued by a remote CPU
2485+
* and therefore timer_base::next_expiry of BASE_LOCAL is
2486+
* updated. When this update is missed, this isn't a
2487+
* problem, as an IPI is executed nevertheless when the CPU
2488+
* was idle before. When the CPU wasn't idle but the update
2489+
* is missed, then the timer would expire one jiffie late -
2490+
* bad luck.
2491+
*
2492+
* Those unlikely corner cases where the worst outcome is only a
2493+
* one jiffie delay or a superfluous raise of the softirq are
2494+
* not that expensive as doing the check always while holding
2495+
* the lock.
2496+
*
2497+
* Possible remote writers are using WRITE_ONCE(). Local reader
2498+
* uses therefore READ_ONCE().
2499+
*/
2500+
if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) ||
24692501
(i == BASE_DEF && tmigr_requires_handle_remote())) {
24702502
raise_softirq(TIMER_SOFTIRQ);
24712503
return;

0 commit comments

Comments
 (0)