
bpf: Add support for sleepable raw tracepoint programs #11100

Closed
kernel-patches-daemon-bpf[bot] wants to merge 6 commits into bpf-next_base from series/1055256=>bpf-next

Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: bpf: Add support for sleepable raw tracepoint programs
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1055256

Add BPF_TRACE_RAW_TP to the set of tracing program attach types
that can be loaded as sleepable in can_be_sleepable(). The actual
enforcement that the target tracepoint supports sleepable execution
(i.e., is faultable) is deferred to attach time, since the target
tracepoint is not known at program load time.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add an attach-time check in bpf_raw_tp_link_attach() to ensure that
sleepable BPF programs can only attach to faultable tracepoints.
Faultable tracepoints (e.g., sys_enter, sys_exit) are guaranteed to
run in a context where sleeping is safe, using rcu_tasks_trace for
protection. Non-faultable tracepoints may run in NMI or other
non-sleepable contexts.

This complements the verifier-side change that allows BPF_TRACE_RAW_TP
programs to be loaded as sleepable.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
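
A minimal sketch of the attach-time rule described in the commit above, assuming a helper along the lines of tracepoint_is_faultable() is available to query the tracepoint's faultable flag; the helper name, the placement inside bpf_raw_tp_link_attach(), and the error code are assumptions, not taken from the patch:

```c
/* Sketch only: reject sleepable raw_tp programs on non-faultable
 * tracepoints at attach time. tracepoint_is_faultable() and -EINVAL
 * are assumptions; the patch may use different names or codes. */
static int check_sleepable_raw_tp(const struct bpf_prog *prog,
				  struct tracepoint *tp)
{
	/* Faultable tracepoints (e.g. sys_enter/sys_exit) run under
	 * rcu_tasks_trace and may sleep; other tracepoints may fire
	 * from NMI or other non-sleepable contexts. */
	if (prog->sleepable && !tracepoint_is_faultable(tp))
		return -EINVAL;
	return 0;
}
```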
Remove preempt_disable_notrace()/preempt_enable_notrace() from
__BPF_DECLARE_TRACE_SYSCALL, the BPF probe callback wrapper for
faultable (syscall) tracepoints.

The preemption management is now handled inside __bpf_trace_run()
on a per-program basis: migrate_disable() for sleepable programs,
rcu_read_lock() (which implies preempt-off in non-PREEMPT_RCU
configs) for non-sleepable programs. This allows sleepable BPF
programs to actually sleep when attached to faultable tracepoints.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Modify __bpf_trace_run() to support both sleepable and non-sleepable
BPF programs. When the program is sleepable:

- Skip cant_sleep() and instead call might_fault() to annotate
  the faultable context
- Use migrate_disable()/migrate_enable() instead of
  rcu_read_lock()/rcu_read_unlock() to allow sleeping while
  still protecting percpu data access
- The outer rcu_tasks_trace lock is already held by the faultable
  tracepoint callback (__DECLARE_TRACE_SYSCALL), providing lifetime
  protection for the BPF program

For non-sleepable programs, behavior is unchanged: cant_sleep() check,
rcu_read_lock() protection.

This allows multiple BPF programs with different sleepable settings
to coexist on the same faultable tracepoint, since __bpf_trace_run()
is invoked per-link.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
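
A minimal sketch of the per-program control flow this commit describes; recursion accounting and stats are elided, and the exact shape of the final function may differ:

```c
/* Sketch only: sleepable vs. non-sleepable paths in __bpf_trace_run().
 * Recursion protection and stats handling are omitted. */
static __always_inline void
__bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
{
	struct bpf_prog *prog = link->link.prog;

	if (prog->sleepable) {
		/* The faultable tracepoint caller already holds the
		 * rcu_tasks_trace read section, keeping the program
		 * alive; only pin the task to a CPU so per-CPU data
		 * stays consistent. */
		might_fault();
		migrate_disable();
		(void) bpf_prog_run(prog, args);
		migrate_enable();
	} else {
		cant_sleep();
		rcu_read_lock();
		(void) bpf_prog_run(prog, args);
		rcu_read_unlock();
	}
}
```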
Add SEC_DEF for "tp_btf.s+" section prefix, enabling userspace BPF
programs to use SEC("tp_btf.s/<tracepoint>") to load sleepable raw
tracepoint programs. This follows the existing pattern used for
fentry.s, fexit.s, fmod_ret.s, and lsm.s section definitions.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
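
A minimal sketch of the new entry in libbpf's section_defs[] table, modeled on the existing sleepable variants; the exact flags and placement are an assumption based on the pattern the commit message cites:

```c
/* tools/lib/bpf/libbpf.c, section_defs[] (sketch) */
SEC_DEF("tp_btf+",   TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace),
SEC_DEF("tp_btf.s+", TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
```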
Add two subtests:
- success: Attach a sleepable BPF program to the faultable sys_enter
  tracepoint (tp_btf.s/sys_enter). Verify the program is triggered by
  a syscall.
- reject_non_faultable: Attempt to attach a sleepable BPF program to
  a non-faultable tracepoint (tp_btf.s/sched_switch). Verify that
  attachment is rejected.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
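
A minimal sketch of the BPF side of the "success" subtest, with hypothetical file and function names (not taken from the series):

```c
// SPDX-License-Identifier: GPL-2.0
/* Hypothetical selftest program: attach a sleepable raw tracepoint
 * program to the faultable sys_enter tracepoint and count hits. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

int triggered;

SEC("tp_btf.s/sys_enter")
int BPF_PROG(on_sys_enter, struct pt_regs *regs, long id)
{
	/* Sleepable context: helpers such as bpf_copy_from_user() are
	 * legal here; the test only needs to observe the trigger. */
	__sync_fetch_and_add(&triggered, 1);
	return 0;
}
```

The reject_non_faultable subtest would presumably reuse the same body with SEC("tp_btf.s/sched_switch") and expect the attach call to fail.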
@kernel-patches-daemon-bpf

Upstream branch: 4c51f90
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1055256
version: 1

@kernel-patches-review-bot

commit b88562063414f80fcc75d40ae238c759b916ed26
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Remove preempt_disable from faultable tracepoint BPF callbacks

This commit removes the preempt_disable_notrace()/preempt_enable_notrace()
pair from __BPF_DECLARE_TRACE_SYSCALL, the BPF callback wrapper for
faultable tracepoints, stating that preemption is now managed inside
__bpf_trace_run().

> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index 9391d54d3f12..d1de8f9aa07f 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -58,9 +58,7 @@ static notrace void							\
>  __bpf_trace_##call(void *__data, proto)					\
>  {									\
>  	might_fault();							\
> -	preempt_disable_notrace();					\
>  	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
> -	preempt_enable_notrace();					\
>  }

At this commit, __bpf_trace_run() still uses cant_sleep() and
rcu_read_lock() without any migrate_disable():

kernel/trace/bpf_trace.c:__bpf_trace_run() {
    cant_sleep();
    if (unlikely(!bpf_prog_get_recursion_context(prog))) {
        ...
        goto out;
    }
    ...
    rcu_read_lock();
    (void) bpf_prog_run(prog, args);
    rcu_read_unlock();
    ...
out:
    bpf_prog_put_recursion_context(prog);
}

The per-cpu recursion counter in bpf_prog_get_recursion_context() uses
this_cpu_inc_return(), and bpf_prog_put_recursion_context() uses
this_cpu_dec(). Both operations are outside the rcu_read_lock() region.

Faultable tracepoint callbacks run under guard(rcu_tasks_trace) which
does not disable preemption. With the preempt_disable_notrace() removed
and no migrate_disable() added yet, can the task be preempted and
migrated between the per-cpu increment and decrement on PREEMPT_RCU
configs, corrupting the recursion counter?
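
To make the window concrete, a hypothetical interleaving on a PREEMPT_RCU kernel at this point in the series (counter names are illustrative):

```c
/*
 * Task running a faultable tracepoint probe, preempt_disable removed,
 * migrate_disable not yet added:
 *
 *   CPU0: this_cpu_inc_return(...)   recursion counter on CPU0 -> 1
 *         <task preempted and migrated to CPU1>
 *   CPU1: bpf_prog_run(prog, args)
 *   CPU1: this_cpu_dec(...)          counter on CPU1 -> -1, CPU0 stays 1
 *
 * Later invocations on either CPU see an unexpected counter value,
 * are treated as recursion, and the program is silently skipped.
 */
```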

The next commit in the series (3c61c093bde8 "bpf: Add sleepable
execution path for raw tracepoint programs") adds migrate_disable()
to __bpf_trace_run(), which would address this. Should these two
patches be reordered, or combined, so that the protection is not
removed before its replacement is added?

> The preemption management is now handled inside __bpf_trace_run()
> on a per-program basis: migrate_disable() for sleepable programs,
> rcu_read_lock() (which implies preempt-off in non-PREEMPT_RCU
> configs) for non-sleepable programs.

This description does not match __bpf_trace_run() at this commit.
The function has no migrate_disable() and does not differentiate
between sleepable and non-sleepable programs. Should the commit
message describe the state at this commit rather than after the
subsequent patch?



AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Remove preempt_disable from faultable tracepoint BPF callbacks
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22147696553

AI-authorship-score: medium
AI-authorship-explanation: The commit message describes per-program preemption handling (migrate_disable/rcu_read_lock) that does not exist at this commit, reading like an auto-generated summary of the series rather than this specific patch; the code change itself is a simple two-line deletion consistent with human authorship.
issues-found: 2
issue-severity-score: medium
issue-severity-explanation: Patch ordering leaves per-cpu recursion counters unprotected from migration on PREEMPT_RCU configs between this commit and the next, which could silently disable BPF programs on affected CPUs; the issue is fixed by the subsequent commit in the series.

@kernel-patches-review-bot

commit 545eecd80a2a1cfd08cc5af1187eb8e91647fbee
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Allow sleepable programs for BPF_TRACE_RAW_TP attach type

This patch adds BPF_TRACE_RAW_TP to the set of tracing program attach
types that can be loaded as sleepable. The commit message states that
enforcement of faultable tracepoint requirements is deferred to attach
time.

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0162f946032f..1973184306e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -25186,6 +25186,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
>  		case BPF_MODIFY_RETURN:
>  		case BPF_TRACE_ITER:
>  		case BPF_TRACE_FSESSION:
> +		case BPF_TRACE_RAW_TP:
>  			return true;
>  		default:
>  			return false;

With this change, a sleepable BPF_TRACE_RAW_TP program can be loaded
and then attached to any raw tracepoint, including non-faultable ones,
because bpf_raw_tp_link_attach() does not check whether the target
tracepoint is faultable when prog->sleepable is set.

The verifier sets in_sleepable based on prog->sleepable, so sleepable
helpers like bpf_copy_from_user() would be allowed. At runtime, the
dispatch goes through __bpf_trace_run() in kernel/trace/bpf_trace.c:

    cant_sleep();
    ...
    rcu_read_lock();
    (void) bpf_prog_run(prog, args);
    rcu_read_unlock();

If the program calls a sleepable helper here, would that result in
sleeping in atomic context?
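
For illustration, a hypothetical program that the load-time change alone would permit (names are illustrative, not from the series):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Loads as sleepable (BPF_TRACE_RAW_TP is now in can_be_sleepable()),
 * attaches to a non-faultable tracepoint, then calls a sleepable
 * helper from inside __bpf_trace_run()'s rcu_read_lock() section. */
SEC("tp_btf.s/sched_switch")
int BPF_PROG(sleep_in_atomic)
{
	char buf[8];

	/* bpf_copy_from_user() may fault and sleep */
	bpf_copy_from_user(buf, sizeof(buf), NULL);
	return 0;
}
```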

This appears to be addressed by patch 2/6 of this series on lore
("bpf: Reject sleepable raw_tp programs on non-faultable tracepoints")
which adds the missing attach-time check, but that patch is not present
in this tree.

> @@ -25215,7 +25216,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)

[ ... ]



AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Allow sleepable programs for BPF_TRACE_RAW_TP attach type
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22147696553

AI-authorship-score: low
AI-authorship-explanation: Small, targeted change with natural technical language consistent with domain expertise; no markers of AI generation.
issues-found: 1
issue-severity-score: medium
issue-severity-explanation: Sleepable BPF program can attach to non-faultable tracepoint and sleep in atomic context due to missing attach-time enforcement; fix exists in series patch 2/6 but is not present in this tree.

@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1055256 expired. Closing PR.
