Split iterators and KF_FORBID_SLEEP v2#11092
Open
puranjaymohan wants to merge 5 commits intokernel-patches:bpf-next_basefrom
Open
Split iterators and KF_FORBID_SLEEP v2#11092puranjaymohan wants to merge 5 commits intokernel-patches:bpf-next_basefrom
puranjaymohan wants to merge 5 commits intokernel-patches:bpf-next_basefrom
Conversation
80cb6f2 to
b54fe0a
Compare
Some iterators hold resources (like mmap_lock in task_vma) that prevent
sleeping. To allow BPF programs to release such resources mid-iteration
and call sleepable helpers, the verifier needs to track acquire/release
semantics on iterator _next pointers.
Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
of the pointer returned by _next when the kfunc is annotated with
KF_ACQUIRE. This is safe because st->id is initialized to 0 by
__mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
in stacksafe() for STACK_ITER slots.
The lifecycle is:
_next (KF_ACQUIRE):
- auto-release old ref if st->id != 0
- acquire new ref, store ref_obj_id in st->id
- DRAINED branch: release via st->id, set st->id = 0
- ACTIVE branch: keeps ref, st->id tracks it
_release (KF_RELEASE + __iter arg):
- read st->id, release_reference(), set st->id = 0
_destroy:
- release st->id if non-zero before releasing iterator's own ref
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
With KF_ACQUIRE support for iterators in place, we need a way to tell the verifier that holding a particular acquired reference forbids sleeping. For example, task_vma's _next holds mmap_lock, so sleeping between _next and _release must be rejected. Add a KF_FORBID_SLEEP flag (1 << 17) that can be combined with KF_ACQUIRE. When acquire_reference() is called for such a kfunc, the reference is tagged with forbid_sleep=true and a per-state forbid_sleep_count counter is incremented. When the reference is released through release_reference_nomark(), the counter is decremented in the same loop that already scans the refs array. The counter is checked wherever the verifier decides if sleeping is allowed. This is generic and works for both iterator and non-iterator kfuncs. For iterators, the auto-release and explicit _release from the previous commit handle the counter decrement automatically via release_reference(). Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
The current implementation of task_vma iterator takes the mmap_lock in the _new() function and holds it for the entire duration of the iterator. The next commits will allow releasing the lock in the middle of the iteration and it would mean that the _next() call should re-take the mmap_lock. Move the mmap_lock setup to bpf_iter_task_vma_next() Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Wire up the task_vma iterator to use the KF_ACQUIRE/KF_RELEASE and
KF_FORBID_SLEEP infrastructure from the previous commits.
Annotate bpf_iter_task_vma_next with KF_ACQUIRE | KF_FORBID_SLEEP so
each returned VMA pointer becomes a tracked reference that prevents
sleeping. Add bpf_iter_task_vma_release(it__iter) with KF_RELEASE that
releases the mmap read lock, drops the acquired reference, and
re-enables sleeping. The next call to _next will re-acquire the lock
via mmap_read_trylock().
This enables a split iteration pattern:
bpf_for_each(task_vma, vma, task, 0) {
u64 start = vma->vm_start;
/* vma access allowed, sleeping forbidden */
bpf_iter_task_vma_release(&___it);
/* vma access forbidden, sleeping allowed */
bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
}
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Add runtime and verifier tests for the split task_vma iteration model. The runtime test iter_task_vma_release_and_copy iterates VMAs, saves vm_start, releases mmap_lock via bpf_iter_task_vma_release(), then calls bpf_copy_from_user() on the saved address. The verifier tests cover: - nosleep_iter_release_then_sleep: release then sleep is accepted - nosleep_iter_sleep_without_release: sleep without release is rejected with "in nosleep region" - nosleep_iter_vma_access_after_release: VMA access after release is rejected with "invalid mem access 'scalar'" Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
b54fe0a to
129c7e1
Compare
291a805 to
44a817a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.