Skip to content

Split iterators and KF_FORBID_SLEEP v2#11092

Open
puranjaymohan wants to merge 5 commits intokernel-patches:bpf-next_basefrom
puranjaymohan:sleepable_in_non_sleepable_v2
Open

Split iterators and KF_FORBID_SLEEP v2#11092
puranjaymohan wants to merge 5 commits intokernel-patches:bpf-next_basefrom
puranjaymohan:sleepable_in_non_sleepable_v2

Conversation

@puranjaymohan
Copy link
Contributor

No description provided.

@puranjaymohan puranjaymohan force-pushed the sleepable_in_non_sleepable_v2 branch 2 times, most recently from 80cb6f2 to b54fe0a Compare February 17, 2026 23:43
@puranjaymohan puranjaymohan changed the title Sleepable in non sleepable v2 Split iterators and KF_FORBID_SLEEP v2 Feb 18, 2026
Some iterators hold resources (like mmap_lock in task_vma) that prevent
sleeping. To allow BPF programs to release such resources mid-iteration
and call sleepable helpers, the verifier needs to track acquire/release
semantics on iterator _next pointers.

Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
of the pointer returned by _next when the kfunc is annotated with
KF_ACQUIRE. This is safe because st->id is initialized to 0 by
__mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
in stacksafe() for STACK_ITER slots.

The lifecycle is:

  _next (KF_ACQUIRE):
    - auto-release old ref if st->id != 0
    - acquire new ref, store ref_obj_id in st->id
    - DRAINED branch: release via st->id, set st->id = 0
    - ACTIVE branch: keeps ref, st->id tracks it

  _release (KF_RELEASE + __iter arg):
    - read st->id, release_reference(), set st->id = 0

  _destroy:
    - release st->id if non-zero before releasing iterator's own ref

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
With KF_ACQUIRE support for iterators in place, we need a way to tell
the verifier that holding a particular acquired reference forbids
sleeping. For example, task_vma's _next holds mmap_lock, so sleeping
between _next and _release must be rejected.

Add a KF_FORBID_SLEEP flag (1 << 17) that can be combined with
KF_ACQUIRE. When acquire_reference() is called for such a kfunc, the
reference is tagged with forbid_sleep=true and a per-state
forbid_sleep_count counter is incremented. When the reference is
released through release_reference_nomark(), the counter is decremented
in the same loop that already scans the refs array.

The counter is checked wherever the verifier decides if sleeping is
allowed.

This is generic and works for both iterator and non-iterator kfuncs.
For iterators, the auto-release and explicit _release from the previous
commit handle the counter decrement automatically via
release_reference().

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
The current implementation of task_vma iterator takes the mmap_lock in
the _new() function and holds it for the entire duration of the
iterator. The next commits will allow releasing the lock in the middle
of the iteration and it would mean that the _next() call should re-take
the mmap_lock.

Move the mmap_lock setup to bpf_iter_task_vma_next()

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Wire up the task_vma iterator to use the KF_ACQUIRE/KF_RELEASE and
KF_FORBID_SLEEP infrastructure from the previous commits.

Annotate bpf_iter_task_vma_next with KF_ACQUIRE | KF_FORBID_SLEEP so
each returned VMA pointer becomes a tracked reference that prevents
sleeping. Add bpf_iter_task_vma_release(it__iter) with KF_RELEASE that
releases the mmap read lock, drops the acquired reference, and
re-enables sleeping. The next call to _next will re-acquire the lock
via mmap_read_trylock().

This enables a split iteration pattern:

  bpf_for_each(task_vma, vma, task, 0) {
      u64 start = vma->vm_start;
      /* vma access allowed, sleeping forbidden */
      bpf_iter_task_vma_release(&___it);
      /* vma access forbidden, sleeping allowed */
      bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
  }

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Add runtime and verifier tests for the split task_vma iteration model.

The runtime test iter_task_vma_release_and_copy iterates VMAs, saves
vm_start, releases mmap_lock via bpf_iter_task_vma_release(), then
calls bpf_copy_from_user() on the saved address.

The verifier tests cover:
- nosleep_iter_release_then_sleep: release then sleep is accepted
- nosleep_iter_sleep_without_release: sleep without release is rejected
  with "in nosleep region"
- nosleep_iter_vma_access_after_release: VMA access after release is
  rejected with "invalid mem access 'scalar'"

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
@puranjaymohan puranjaymohan force-pushed the sleepable_in_non_sleepable_v2 branch from b54fe0a to 129c7e1 Compare February 18, 2026 17:52
@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 2 times, most recently from 291a805 to 44a817a Compare February 19, 2026 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments