Conversation

@kernel-patches-daemon-bpf-rc

Pull request for series with
subject: barrier: Add smp_cond_load_*_timeout()
version: 7
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-review-bot

commit 33ad33820ece5fb116e40213b1214560e03cfbdc
Author: Ankur Arora <[email protected]>

cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout()

This commit replaces the manual polling loop with
smp_cond_load_relaxed_timeout(), simplifying the code while preserving
the same polling behavior with intermittent timeout checks.

> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72..dc7f4b424 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -8,35 +8,22 @@
>  #include <linux/sched/clock.h>
>  #include <linux/sched/idle.h>
>
> -#define POLL_IDLE_RELAX_COUNT	200
> -
>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>  			       struct cpuidle_driver *drv, int index)
>  {
> -	u64 time_start;
> -
> -	time_start = local_clock_noinstr();
> +	u64 time_end;
> +	u32 flags = 0;
>
>  	dev->poll_time_limit = false;
>
> +	time_end = local_clock_noinstr() + cpuidle_poll_time(drv, dev);
> +
>  	raw_local_irq_enable();
>  	if (!current_set_polling_and_test()) {
> -		unsigned int loop_count = 0;
> -		u64 limit;
> -
> -		limit = cpuidle_poll_time(drv, dev);
> -
> -		while (!need_resched()) {
> -			cpu_relax();
> -			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> -				continue;
> -
> -			loop_count = 0;
> -			if (local_clock_noinstr() - time_start > limit) {
> -				dev->poll_time_limit = true;
> -				break;
> -			}
> -		}
> +		flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
> +						      (VAL & _TIF_NEED_RESCHED),
> +						      (local_clock_noinstr() >= time_end));
> +		dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED);
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Can this incorrectly clear poll_time_limit when a timeout occurs?

The old code set poll_time_limit = true specifically when the timeout
condition caused the loop exit. The new code bases poll_time_limit on
the final flag state after smp_cond_load_relaxed_timeout() returns.

Looking at smp_cond_load_relaxed_timeout() in
include/asm-generic/barrier.h, when the timeout expression becomes true,
it performs an additional READ_ONCE() before returning:

    if (time_check_expr) {
        VAL = READ_ONCE(*__PTR);
        break;
    }

This creates a window where _TIF_NEED_RESCHED could be set after the
timeout is detected but before the final read. In this case,
poll_time_limit would be set to false even though the loop exited due
to timeout, not because work became available.

The flag affects idle governor behavior in three places:

1. drivers/cpuidle/governors/haltpoll.c: If poll_time_limit is true,
   switch from polling to halt state

2. drivers/cpuidle/governors/menu.c: If poll_time_limit is true, use
   next_timer_ns instead of actual residency for idle prediction

3. drivers/cpuidle/governors/teo.c: If poll_time_limit is true, mark
   the wakeup as artificial rather than genuine

If poll_time_limit is incorrectly false after a timeout, the governors
would treat a timeout as a genuine wakeup with available work, affecting
their idle state selection heuristics.

>  	}
>  	raw_local_irq_disable();
>


AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout()
CI run summary: https://github.com/kernel-patches/bpf-rc/actions/runs/18584379078

@kernel-patches-daemon-bpf-rc
Author

Forwarding comment 3414086411 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 50de48a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: c67f4ae
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 7361c86
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 04a8995
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

Add smp_cond_load_relaxed_timeout(), which extends
smp_cond_load_relaxed() to allow waiting for a duration.

The waiting loop uses cpu_poll_relax() to wait on the condition
variable with a periodic evaluation of a time-check.

cpu_poll_relax(), unless overridden by the arch code, amounts to
a cpu_relax().

The number of times we spin before evaluating the time-check is
defined by SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default).
Assuming each cpu_poll_relax() iteration takes around 20-30 cycles
(measured on a variety of x86 platforms), that amounts to ~4000-6000
cycles between time-checks.

Cc: Arnd Bergmann <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Tested-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Support waiting in smp_cond_load_relaxed_timeout() via
__cmpwait_relaxed(). Limit this to when the event-stream is enabled,
to ensure that we wake from WFE periodically and don't block forever
if there are no stores to the cacheline.

In the unlikely event that the event-stream is unavailable, fallback
to spin-waiting.

Also set SMP_TIMEOUT_POLL_COUNT to 1 so we do the time-check for each
iteration in smp_cond_load_relaxed_timeout().

Cc: [email protected]
Cc: Catalin Marinas <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
…ait()

In preparation for defining smp_cond_load_acquire_timeout(), remove
the private copy. Lacking this, the rqspinlock code falls back to using
smp_cond_load_acquire().

Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Add the acquire variant of smp_cond_load_relaxed_timeout(). This
reuses the relaxed variant, with an additional LOAD->LOAD
ordering.

Cc: Arnd Bergmann <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Tested-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Add atomic load wrappers, atomic_cond_read_*_timeout() and
atomic64_cond_read_*_timeout() for the cond-load timeout interfaces.

Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Switch out the conditional load interfaces used by rqspinlock to
atomic_cond_read_acquire_timeout() and smp_cond_read_acquire_timeout().

Both of these handle the timeout and amortize as needed, so use
check_timeout() directly.

Also, when using spin-wait implementations, redefine SMP_TIMEOUT_POLL_COUNT
to be 16k to be similar to the spin-count used in RES_CHECK_TIMEOUT().

Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
The inner loop in poll_idle() polls over the thread_info flags,
waiting to see if the thread has TIF_NEED_RESCHED set. The loop
exits once the condition is met, or if the poll time limit has
been exceeded.

To minimize the number of instructions executed in each iteration,
the time check is done only intermittently (once every
POLL_IDLE_RELAX_COUNT iterations). In addition, each loop iteration
executes cpu_relax() which on certain platforms provides a hint to
the pipeline that the loop busy-waits, allowing the processor to
reduce power consumption.

This is close to what smp_cond_load_relaxed_timeout() provides. So,
restructure the loop and fold the loop condition and the timeout check
into smp_cond_load_relaxed_timeout().

Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 04a8995
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf-rc
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679 irrelevant now. Closing PR.
