
Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: barrier: Add smp_cond_load_*_timeout()
version: 7
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679

@kernel-patches-daemon-bpf
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-review-bot

commit e3a4db2544d352639f9a77ff248bd36fd0c1ccb8
Author: Ankur Arora <[email protected]>

atomic: Add atomic_cond_read_*_timeout()

This patch adds atomic load wrappers with timeout: atomic_cond_read_*_timeout()
and atomic64_cond_read_*_timeout() for the cond-load timeout interfaces.

> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 8dd57c3a9..49f8100ad 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -31,6 +31,16 @@
>  #define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
>  #define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
>
> +#define atomic_cond_read_acquire_timeout(v, c, t) \
> +	smp_cond_load_acquire_timeout(&(v)->counter, (c), (t))
> +#define atomic_cond_read_relaxed_timeout(v, c, t) \
> +	smp_cond_load_relaxed_timeout(&(v)->counter, (c), (t))
> +
> +#define atomic64_cond_read_acquire_timeout(v, c, t) \
> +	smp_cond_load_acquire_timeout(&(v)->counter, (c), (t))
> +#define atomic64_cond_read_relaxed_timeout(v, c, t) \
> +	smp_cond_load_relaxed_timeout(&(v)->counter, (c), (t))
> +

Should atomic_long_cond_read_*_timeout() variants also be added?

The existing pattern in scripts/atomic/gen-atomic-long.sh (lines 83-84,
88-89) and include/linux/atomic/atomic-long.h (lines 15-16, 20-21) shows
that atomic_long has wrappers for the non-timeout variants:

  atomic_long_cond_read_acquire -> atomic64_cond_read_acquire (64-bit)
  atomic_long_cond_read_relaxed -> atomic64_cond_read_relaxed (64-bit)
  atomic_long_cond_read_acquire -> atomic_cond_read_acquire   (32-bit)
  atomic_long_cond_read_relaxed -> atomic_cond_read_relaxed   (32-bit)

There's usage of atomic_long_cond_read_relaxed() in the kernel already
(drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:615).

Without the timeout variants for atomic_long, code using atomic_long_t
would need to directly use atomic_cond_read or atomic64_cond_read based
on CONFIG_64BIT, breaking the abstraction that atomic_long provides.
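
Following that pattern, a hypothetical sketch of the missing wrappers (illustrative only; the real definitions would come out of scripts/atomic/gen-atomic-long.sh and include/linux/atomic/atomic-long.h):

  #ifdef CONFIG_64BIT
  #define atomic_long_cond_read_acquire_timeout(v, c, t) \
          atomic64_cond_read_acquire_timeout(v, (c), (t))
  #define atomic_long_cond_read_relaxed_timeout(v, c, t) \
          atomic64_cond_read_relaxed_timeout(v, (c), (t))
  #else
  #define atomic_long_cond_read_acquire_timeout(v, c, t) \
          atomic_cond_read_acquire_timeout(v, (c), (t))
  #define atomic_long_cond_read_relaxed_timeout(v, c, t) \
          atomic_cond_read_relaxed_timeout(v, (c), (t))
  #endif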

>  /*
>   * The idea here is to build acquire/release variants by adding explicit
>   * barriers on top of the relaxed variant. In the case where the relaxed



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: atomic: Add atomic_cond_read_*_timeout()
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18584494860

@kernel-patches-daemon-bpf
Author

Forwarding comment 3414092095 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf
Author

Upstream branch: 50de48a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

@kernel-patches-daemon-bpf
Author

Upstream branch: 4f8543b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

Add smp_cond_load_relaxed_timeout(), which extends
smp_cond_load_relaxed() to allow waiting for a duration.

The waiting loop uses cpu_poll_relax() to wait on the condition
variable with a periodic evaluation of a time-check.

cpu_poll_relax(), unless overridden by the arch code, amounts to
a cpu_relax().

The number of times we spin between time-checks is defined by
SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default); assuming each
cpu_poll_relax() iteration takes around 20-30 cycles (measured on a
variety of x86 platforms), that amounts to ~4000-6000 cycles between
time-checks. (A sketch of the resulting loop structure follows below.)

Cc: Arnd Bergmann <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Tested-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
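
For illustration, a simplified sketch of the loop structure described above (not necessarily the exact patch text; the __PTR/VAL naming follows the existing smp_cond_load_relaxed()):

  #ifndef SMP_TIMEOUT_POLL_COUNT
  #define SMP_TIMEOUT_POLL_COUNT 200
  #endif

  #ifndef smp_cond_load_relaxed_timeout
  #define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr) \
  ({                                                                     \
          typeof(ptr) __PTR = (ptr);                                     \
          __unqual_scalar_typeof(*ptr) VAL;                              \
          u32 __poll_count = 0;                                          \
                                                                         \
          for (;;) {                                                     \
                  VAL = READ_ONCE(*__PTR);                               \
                  if (cond_expr)                                         \
                          break;                                         \
                  cpu_poll_relax(__PTR, VAL);                            \
                  /* amortize the time-check over the poll count */      \
                  if (++__poll_count == SMP_TIMEOUT_POLL_COUNT) {        \
                          __poll_count = 0;                              \
                          if (time_check_expr)                           \
                                  break;                                 \
                  }                                                      \
          }                                                              \
          (typeof(*ptr))VAL;                                             \
  })
  #endif
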
Support waiting in smp_cond_load_relaxed_timeout() via
__cmpwait_relaxed(). Limit this to when the event-stream is enabled,
to ensure that we wake from WFE periodically and don't block forever
if there are no stores to the cacheline.

In the unlikely event that the event-stream is unavailable, fall back
to spin-waiting.

Also set SMP_TIMEOUT_POLL_COUNT to 1 so we do the time-check on each
iteration in smp_cond_load_relaxed_timeout(). (A sketch of this
arrangement follows below.)

Cc: [email protected]
Cc: Catalin Marinas <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
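
A rough sketch of the arrangement described above (illustrative only; arch_timer_evtstrm_available() and __cmpwait_relaxed() are the existing arm64 helpers, and the actual patch may differ in detail):

  /* Wait with WFE when the event stream can bound the wait;
   * otherwise fall back to a plain cpu_relax() spin. */
  #define cpu_poll_relax(ptr, val)                        \
  do {                                                    \
          if (arch_timer_evtstrm_available())             \
                  __cmpwait_relaxed(ptr, val);            \
          else                                            \
                  cpu_relax();                            \
  } while (0)

  /* Time-check on every iteration, since a WFE wait is bounded by the
   * event-stream period rather than a handful of cycles. */
  #define SMP_TIMEOUT_POLL_COUNT 1
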
…ait()

In preparation for defining smp_cond_load_acquire_timeout(), remove
the private copy. Lacking this, the rqspinlock code falls back to using
smp_cond_load_acquire().

Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Add the acquire variant of smp_cond_load_relaxed_timeout(). This
reuses the relaxed variant, with an additional LOAD->LOAD ordering
(sketched below).

Cc: Arnd Bergmann <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Haris Okanovic <[email protected]>
Tested-by: Haris Okanovic <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
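
A sketch of the shape such a definition could take, modeled on the existing smp_cond_load_acquire() (illustrative; smp_acquire__after_ctrl_dep() provides the LOAD->LOAD ordering mentioned above):

  #ifndef smp_cond_load_acquire_timeout
  #define smp_cond_load_acquire_timeout(ptr, cond_expr, time_check_expr) \
  ({                                                                      \
          __unqual_scalar_typeof(*ptr) _val;                              \
          _val = smp_cond_load_relaxed_timeout(ptr, cond_expr,            \
                                               time_check_expr);          \
          /* LOAD->LOAD ordering on top of the relaxed wait */            \
          smp_acquire__after_ctrl_dep();                                  \
          (typeof(*ptr))_val;                                             \
  })
  #endif
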
Add atomic load wrappers, atomic_cond_read_*_timeout() and
atomic64_cond_read_*_timeout() for the cond-load timeout interfaces.

Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
@kernel-patches-daemon-bpf
Author

Upstream branch: 7361c86
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012679
version: 7

Switch out the conditional load interfaces used by rqspinlock
to atomic_cond_read_acquire_timeout() and
smp_cond_load_acquire_timeout().

Both of these handle the timeout and amortize the time-check as
needed, so use check_timeout() directly.

Also, when using spin-wait implementations, redefine
SMP_TIMEOUT_POLL_COUNT to 16k, similar to the spin-count used in
RES_CHECK_TIMEOUT(). (A sketch of one converted wait follows below.)

Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
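
A hypothetical sketch of one converted wait in the rqspinlock slowpath (names and arguments follow kernel/bpf/rqspinlock.c and the qspinlock headers, but may differ from the actual patch):

  /* Wait for the locked/pending bits to clear; the time-check expression
   * both evaluates the deadline and records any timeout error in ret. */
  val = atomic_cond_read_acquire_timeout(&lock->val,
                  !(VAL & _Q_LOCKED_PENDING_MASK),
                  (ret = check_timeout(lock, _Q_LOCKED_PENDING_MASK, &ts)));
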
The inner loop in poll_idle() polls over the thread_info flags,
waiting to see if the thread has TIF_NEED_RESCHED set. The loop
exits once the condition is met, or if the poll time limit has
been exceeded.

To minimize the number of instructions executed in each iteration,
the time check is done only intermittently (once every
POLL_IDLE_RELAX_COUNT iterations). In addition, each loop iteration
executes cpu_relax() which on certain platforms provides a hint to
the pipeline that the loop busy-waits, allowing the processor to
reduce power consumption.

This is close to what smp_cond_load_relaxed_timeout() provides. So,
restructure the loop and fold the loop condition and the timeout check
into smp_cond_load_relaxed_timeout(). (A sketch of the restructured
wait follows below.)

Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
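
A rough sketch of the restructured wait (illustrative; time_start and limit are assumed to be the existing poll_idle() locals from drivers/cpuidle/poll_state.c):

  unsigned long flags;

  /* Spin until need-resched is seen or the poll time limit expires;
   * the intermittent time-check is now amortized inside
   * smp_cond_load_relaxed_timeout(). */
  flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
                                        VAL & _TIF_NEED_RESCHED,
                                        local_clock_noinstr() >= time_start + limit);
  dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED);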