-
Notifications
You must be signed in to change notification settings - Fork 5
bpf: Introduce deferred task context execution #5960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Introduce deferred task context execution #5960
Conversation
|
Upstream branch: b13448d |
|
Upstream branch: b13448d |
8de8d05 to
f892027
Compare
|
Upstream branch: b13448d |
f892027 to
c615a89
Compare
|
Upstream branch: b13448d |
c615a89 to
8488cb5
Compare
|
Upstream branch: b13448d |
8488cb5 to
4f44ea7
Compare
|
Upstream branch: b13448d |
4f44ea7 to
8823e9e
Compare
Reduce code duplication in detection of the known special field types in map values. This refactoring helps to avoid copying a chunk of code in the next patch of the series. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Acked-by: Andrii Nakryiko <[email protected]>
Refactor the verifier by pulling the common logic from process_timer_func() into a dedicated helper. This allows reusing process_async_func() helper for verifying bpf_task_work struct in the next patch. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
Extract the cleanup of known embedded structs into the dedicated helper. Remove duplication and introduce a single source of truth for freeing special embedded structs in hashtab. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
The verifier currently enforces a zero return value for all async callbacks—a constraint originally introduced for bpf_timer. That restriction is too narrow for other async use cases. Relax the rule by allowing non-zero return codes from async callbacks in general, while preserving the zero-return requirement for bpf_timer to maintain its existing semantics. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
This patch adds necessary plumbing in verifier, syscall and maps to support handling new kfunc bpf_task_work_schedule and kernel structure bpf_task_work. The idea is similar to how we already handle bpf_wq and bpf_timer. verifier changes validate calls to bpf_task_work_schedule to make sure it is safe and expected invariants hold. btf part is required to detect bpf_task_work structure inside map value and store its offset, which will be used in the next patch to calculate key and value addresses. arraymap and hashtab changes are needed to handle freeing of the bpf_task_work: run code needed to deinitialize it, for example cancel task_work callback if possible. The use of bpf_task_work and proper implementation for kfuncs are introduced in the next patch. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
Calculation of the BPF map key, given the pointer to a value is duplicated in a couple of places in helpers already, in the next patch another use case is introduced as well. This patch extracts that functionality into a separate function. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
Implementation of the new bpf_task_work_schedule kfuncs, that let a BPF
program schedule task_work callbacks for a target task:
* bpf_task_work_schedule_signal() - schedules with TWA_SIGNAL
* bpf_task_work_schedule_resume() - schedules with TWA_RESUME
Each map value should embed a struct bpf_task_work, which the kernel
side pairs with struct bpf_task_work_kern, containing a pointer to
struct bpf_task_work_ctx, that maintains metadata relevant for the
concrete callback scheduling.
A small state machine and refcounting scheme ensures safe reuse and
teardown. State transitions:
_______________________________
| |
v |
[standby] ---> [pending] --> [scheduling] --> [scheduled]
^ |________________|_________
| |
| v
| [running]
|_______________________________________________________|
All states may transition into FREED state:
[pending] [scheduling] [scheduled] [running] [standby] -> [freed]
A FREED terminal state coordinates with map-value
deletion (bpf_task_work_cancel_and_free()).
Scheduling itself is deferred via irq_work to keep the kfunc callable
from NMI context.
Lifetime is guarded with refcount_t + RCU Tasks Trace.
Main components:
* struct bpf_task_work_context – Metadata and state management per task
work.
* enum bpf_task_work_state – A state machine to serialize work
scheduling and execution.
* bpf_task_work_schedule() – The central helper that initiates
scheduling.
* bpf_task_work_acquire_ctx() - Attempts to take ownership of the context,
pointed by passed struct bpf_task_work, allocates new context if none
exists yet.
* bpf_task_work_callback() – Invoked when the actual task_work runs.
* bpf_task_work_irq() – An intermediate step (runs in softirq context)
to enqueue task work.
* bpf_task_work_cancel_and_free() – Cleanup for deleted BPF map entries.
Flow of successful task work scheduling
1) bpf_task_work_schedule_* is called from BPF code.
2) Transition state from STANDBY to PENDING, mark context as owned by
this task work scheduler
3) irq_work_queue() schedules bpf_task_work_irq().
4) Transition state from PENDING to SCHEDULING.
5) bpf_task_work_irq() attempts task_work_add(). If successful, state
transitions to SCHEDULED.
6) Task work calls bpf_task_work_callback(), which transition state to
RUNNING.
7) BPF callback is executed
8) Context is cleaned up, refcounts released, context state set back to
STANDBY.
Signed-off-by: Mykyta Yatsenko <[email protected]>
Reviewed-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Eduard Zingerman <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
|
Upstream branch: b13448d |
Introducing selftests that check BPF task work scheduling mechanism. Validate that verifier does not accepts incorrect calls to bpf_task_work_schedule kfunc. Signed-off-by: Mykyta Yatsenko <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
8823e9e to
261180b
Compare
Pull request for series with
subject: bpf: Introduce deferred task context execution
version: 4
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1002639