
@kernel-patches-daemon-bpf

Pull request for series with
subject: bpf: Tighten conditions when timer/wq can be called synchronously
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1050571

Alexei Starovoitov added 4 commits February 3, 2026 20:12
Though hrtimer_start/cancel() inline all of the smaller helpers in
hrtimer.c and only call timerqueue_add/del() from lib/timerqueue.c, where
nothing is traceable or kprobe-able (because files in lib/ are not
traceable), there are tracepoints within hrtimer that are called with
locks held. Therefore prevent the deadlock by tightening the conditions
under which timer/wq can be called synchronously.
hrtimer/wq use raw_spin_lock_irqsave(), so checking irqs_disabled() is enough.

Fixes: 1bfbc26 ("bpf: Enable bpf_timer and bpf_wq in any context")
Signed-off-by: Alexei Starovoitov <[email protected]>
Add a testcase that checks that deadlock avoidance is working
as expected.

Signed-off-by: Alexei Starovoitov <[email protected]>
Do not schedule a timer/wq operation on a cpu that is in the irq_work
callback that is processing the async_cmds queue.
Otherwise the following loop is possible:
bpf_timer_start() -> bpf_async_schedule_op() -> irq_work_queue().
irqrestore -> bpf_async_irq_worker() -> tracepoint -> bpf_timer_start().

Fixes: 1bfbc26 ("bpf: Enable bpf_timer and bpf_wq in any context")
Signed-off-by: Alexei Starovoitov <[email protected]>
Strengthen the timer_start_deadlock test so that it also checks for recursion.

Signed-off-by: Alexei Starovoitov <[email protected]>
@kernel-patches-daemon-bpf

Upstream branch: b28dac3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1050571
version: 2

@kernel-patches-review-bot

commit b291642856eae07d97b701836aab6c85d97bedc7
Author: Alexei Starovoitov <[email protected]>

bpf: Add a recursion counter to prevent loops in bpf_timer

This commit adds a per-CPU counter to prevent recursive timer/wq operations
when the CPU is already processing the async_cmds queue in the irq_work
callback.
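The loop described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `running_model`, `pending_model`, `start_model()`, `tracepoint_model()` and `irq_worker_model()` are hypothetical stand-ins, not kernel symbols, and the assumption that the worker raises the counter around queue processing is inferred from the description above rather than taken from the elided part of the diff.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-CPU async_cb_running counter. */
static int running_model;
/* Hypothetical stand-in for the number of queued async commands. */
static int pending_model;
/* Result of the nested start attempt made from inside the worker. */
static int nested_result_model;

/* Models bpf_timer_start() -> bpf_async_schedule_op(): queue one command. */
static int start_model(void)
{
	/* Refuse to queue while this "cpu" is draining the queue. */
	if (running_model)
		return -EDEADLK;
	pending_model++;
	return 0;
}

/* Models a BPF program attached to a tracepoint hit inside the worker. */
static void tracepoint_model(void)
{
	nested_result_model = start_model();
}

/*
 * Models the irq_work callback. Assumption: the counter is raised for
 * the whole time the queue is being processed.
 */
static void irq_worker_model(void)
{
	running_model++;
	while (pending_model > 0) {
		pending_model--;
		tracepoint_model();
	}
	running_model--;
}
```

In this model, calling `start_model()` and then `irq_worker_model()` drains the queue once: the nested attempt is rejected with `-EDEADLK` instead of queuing another command. Without the check in `start_model()`, every pass through the loop would queue a new command and the worker would never finish.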

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 0517e9a8f..0f681a89b 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1427,9 +1427,21 @@ static int bpf_async_update_prog_callback(struct bpf_async_cb *cb,
>  	return 0;
>  }
>
> +static DEFINE_PER_CPU(int, async_cb_running);
> +
>  static int bpf_async_schedule_op(struct bpf_async_cb *cb, enum bpf_async_op op,
>  				 u64 nsec, u32 timer_mode)
>  {
> +	/*
> +	 * Do not schedule another operation on this cpu if it's in irq_work
> +	 * callback that is processing async_cmds queue. Otherwise the following
> +	 * loop is possible:
> +	 * bpf_timer_start() -> bpf_async_schedule_op() -> irq_work_queue().
> +	 * irqrestore -> bpf_async_irq_worker() -> tracepoint -> bpf_timer_start().
> +	 */
> +	if (this_cpu_read(async_cb_running))
> +		return -EDEADLK;

Does this early return leak a reference? Looking at the callers:

bpf_timer_start() does:

    if (!refcount_inc_not_zero(&t->cb.refcnt))
        return -ENOENT;
    ...
    return bpf_async_schedule_op(&t->cb, BPF_ASYNC_START, nsecs, mode);

The refcount is incremented before calling bpf_async_schedule_op(). The
existing -ENOMEM path in bpf_async_schedule_op() releases this refcount:

    if (!cmd) {
        bpf_async_refcount_put(cb);
        return -ENOMEM;
    }

But the new -EDEADLK path returns without calling bpf_async_refcount_put(cb).

The same issue affects bpf_wq_start() and bpf_timer_cancel_async(), which
also increment the refcount before calling bpf_async_schedule_op().
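The imbalance can be shown with a minimal user-space model. This is a sketch, not the kernel code: `struct cb_model`, `cb_put()`, `schedule_op_model()`, `timer_start_model()` and the `put_on_deadlk` switch are hypothetical names introduced for illustration, and dropping the reference on the `-EDEADLK` path is one possible fix that mirrors the existing `-ENOMEM` path, not necessarily the one the author will choose.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_async_cb; only the refcount is modeled. */
struct cb_model {
	int refcnt;
};

/* Hypothetical stand-in for the per-CPU async_cb_running flag. */
static bool async_cb_running_model;

/* Models bpf_async_refcount_put(): drop one reference. */
static void cb_put(struct cb_model *cb)
{
	cb->refcnt--;
}

/*
 * Models bpf_async_schedule_op(). With put_on_deadlk == false the early
 * return keeps the caller's reference (the reported leak); with true it
 * drops the reference, as the existing -ENOMEM path does.
 */
static int schedule_op_model(struct cb_model *cb, bool put_on_deadlk)
{
	if (async_cb_running_model) {
		if (put_on_deadlk)
			cb_put(cb);
		return -EDEADLK;
	}
	/* Success: the queued command now owns the reference. */
	return 0;
}

/* Models bpf_timer_start(): take a reference, then schedule the op. */
static int timer_start_model(struct cb_model *cb, bool put_on_deadlk)
{
	if (cb->refcnt == 0)
		return -ENOENT; /* models refcount_inc_not_zero() failing */
	cb->refcnt++;
	return schedule_op_model(cb, put_on_deadlk);
}
```

With `async_cb_running_model` set and a callback starting at one reference, `timer_start_model(&cb, false)` returns `-EDEADLK` and leaves the count at two, while `timer_start_model(&cb, true)` returns the same error with the count back at one.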

> +
>  	struct bpf_async_cmd *cmd = kmalloc_nolock(sizeof(*cmd), 0, NUMA_NO_NODE);
>
>  	if (!cmd) {

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Add a recursion counter to prevent loops in bpf_timer
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21658239378

AI-authorship-score: low
AI-authorship-explanation: The commit follows established kernel conventions, comes from a known BPF maintainer, and addresses a real technical issue with code patterns typical of experienced kernel developers.
issues-found: 1
issue-severity-score: medium
issue-severity-explanation: The new -EDEADLK error path does not release the refcount taken by callers, leading to reference leaks that can cause memory exhaustion over time.

@kernel-patches-daemon-bpf

Forwarding comment 3845171225 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot changed the title from "bpf: Tighten conditions when timer/wq can be called synchronously" to "bpf: Fix conditions when timer/wq can be called" on Feb 4, 2026
@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1050571 is irrelevant now. Closing PR.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot deleted the series/1050571=>bpf-next branch February 4, 2026 06:07