Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: Remove task and cgroup local storage percpu counters
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1034704

@kernel-patches-daemon-bpf
Author

Upstream branch: ec439c3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1034704
version: 3

@kernel-patches-review-bot

commit 8ba48be3912399547ff846e66dbe976e09dad8e3
Author: Amery Hung <ameryhung@gmail.com>

bpf: Convert bpf_selem_link_map to failable

This patch prepares for the rqspinlock conversion by changing
bpf_selem_link_map() to return int instead of void. For now the
function always returns 0, so there is no functional change.

> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 850dd736ccd1..4f8e917f49d9 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -191,7 +191,9 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
>  		}
>
>  		if (new_sk_storage) {
> -			bpf_selem_link_map(smap, copy_selem);
> +			ret = bpf_selem_link_map(smap, copy_selem);
> +			if (ret)
> +				goto out;
>  			bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
>  		} else {
>  			ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);

When bpf_selem_link_map() fails, is there missing cleanup here? Looking
at the else branch below (bpf_local_storage_alloc failure), the error
handling includes:

  - bpf_selem_free(copy_selem, true)
  - atomic_sub(smap->elem_size, &newsk->sk_omem_alloc)
  - bpf_map_put(map)

The new error path for bpf_selem_link_map() failure goes directly to the
out label without any of this cleanup. At this point in the code:

  - copy_selem was allocated by bpf_sk_storage_clone_elem() and charged
    via bpf_selem_alloc()->mem_charge()
  - map reference was taken via bpf_map_inc_not_zero()
  - copy_selem has not yet been linked to new_sk_storage (that happens
    in bpf_selem_link_storage_nolock() after the failing call)

The comment at the out label says the caller is responsible for cleanup
via bpf_sk_storage_free(), but since copy_selem was never linked to the
storage list, bpf_local_storage_destroy() will not find it.
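
A hedged sketch of what the missing cleanup might look like, mirroring the three steps listed above from the bpf_local_storage_alloc() failure path (the bpf_selem_free() argument list here follows that summary and is an assumption):

    ret = bpf_selem_link_map(smap, copy_selem);
    if (ret) {
            /* copy_selem is charged and holds a map reference but is
             * not yet on new_sk_storage's list, so bpf_sk_storage_free()
             * cannot find it; undo everything here, as the
             * bpf_local_storage_alloc() failure path does.
             */
            bpf_selem_free(copy_selem, true);
            atomic_sub(smap->elem_size, &newsk->sk_omem_alloc);
            bpf_map_put(map);
            goto out;
    }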

Currently this is not triggerable since bpf_selem_link_map() always
returns 0, but commit 615477732ee8 ("bpf: Change local_storage->lock and
b->lock to rqspinlock") later in this series makes the function actually
return errors.



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Convert bpf_selem_link_map to failable
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20346662777

@kernel-patches-review-bot

commit 88fc7349625f8f5f10aba236ad6005c2b7f05227
Author: Amery Hung <ameryhung@gmail.com>

bpf: Convert bpf_selem_unlink to failable

This commit converts bpf_selem_unlink() from void to int return type
as preparation for switching to rqspinlock. The commit message states
"It still always succeeds and returns 0" and "No functional change."

> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 0e3fa5fbaaf3..fa629a180e9e 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -367,7 +367,7 @@ static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
>  	hlist_add_head_rcu(&selem->map_node, &b->list);
>  }
>
> -void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
> +int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
>  {
>  	struct bpf_local_storage *local_storage;
>  	bool free_local_storage = false;
>  	HLIST_HEAD(selem_free_list);
>  	unsigned long flags;
>  	int err;

The err variable is declared here but not initialized.

>  	if (unlikely(!selem_linked_to_storage_lockless(selem)))
>  		/* selem has already been unlinked from sk */
> -		return;
> +		return 0;

[ ... ]

>  	raw_spin_lock_irqsave(&local_storage->lock, flags);
>  	if (likely(selem_linked_to_storage(selem))) {

If this condition is false, the if-block is skipped and err is never
assigned. There is a race window between the lockless check above and
this locked check where another CPU could unlink the selem.

>  		err = bpf_selem_unlink_map(selem);
>  		if (err)
>  			goto out;
>
>  		free_local_storage = bpf_selem_unlink_storage_nolock(
>  			local_storage, selem, &selem_free_list);
>  	}

[ ... ]

> +	return err;
>  }

Could this return an uninitialized value when selem_linked_to_storage()
returns false? The commit message says "always succeeds and returns 0"
but err is not initialized to 0.

I note that this appears to be addressed by a later commit in this
series (615477732ee8 "bpf: Change local_storage->lock and b->lock to
rqspinlock"), where the lock function's return value is assigned to err
before the condition check. Should err be initialized to 0 here for
correctness in the intermediate state?
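
A minimal sketch of the interim fix, assuming initializing err to 0 is all the intermediate commit needs:

    -	int err;
    +	int err = 0;	/* report success when another CPU already unlinked selem */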



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Convert bpf_selem_unlink to failable
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20346662777

@kernel-patches-review-bot

commit e94f5418920de11eb2ecc2ac9ba09afad1770cd4
Author: Amery Hung <ameryhung@gmail.com>

bpf: Convert bpf_selem_unlink_map to failable

This commit prepares for changing bpf_local_storage_map_bucket::lock to
rqspinlock by converting bpf_selem_unlink_map() to return int. It also
introduces _nolock variants and modifies bpf_local_storage_update() to
hold bucket locks during the 3-step update operation.

> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index e2fe6c32822b..4e3f227fd634 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c

[ ... ]

> @@ -488,9 +516,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  {
>  	struct bpf_local_storage_data *old_sdata = NULL;
>  	struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
> +	struct bpf_local_storage_map_bucket *b, *old_b = NULL;
> +	unsigned long flags, b_flags, old_b_flags;
>  	struct bpf_local_storage *local_storage;
>  	HLIST_HEAD(old_selem_free_list);
> -	unsigned long flags;
>  	int err;

[ ... ]

> @@ -574,20 +603,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  		goto unlock;
>  	}
>
> +	b = select_bucket(smap, selem);
> +
> +	if (old_sdata) {
> +		old_b = select_bucket(smap, SELEM(old_sdata));
> +		old_b = old_b == b ? NULL : old_b;
> +	}
> +
> +	raw_spin_lock_irqsave(&b->lock, b_flags);
> +
> +	if (old_b)
> +		raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Could this introduce an ABBA deadlock between two bucket locks? If two
threads concurrently update storage on the same map with different owners,
their new and old selems could hash to the same two buckets in opposite
orders:

  Thread 1: new_selem -> bucket_A, old_selem -> bucket_B
            acquires b->lock (A), then waits for old_b->lock (B)

  Thread 2: new_selem -> bucket_B, old_selem -> bucket_A
            acquires b->lock (B), then waits for old_b->lock (A)

Since the locks are still regular raw_spinlocks (not rqspinlocks yet),
both threads would block indefinitely in this scenario.

Note: I see that commit 615477732ee8 ("bpf: Change local_storage->lock
and b->lock to rqspinlock") later in this series changes to rqspinlock
which has deadlock detection and can return -EDEADLK instead of blocking.
However, in the current commit the locks are still raw_spinlocks.
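
For comparison, the classic way to rule out ABBA with plain spinlocks is to acquire the two bucket locks in a fixed (e.g. address) order; a minimal sketch, not something this commit does:

    /* Hypothetical: take the two bucket locks in address order so two
     * updaters can never acquire the same pair in opposite orders; b and
     * old_b keep their meanings for the link/unlink calls that follow.
     */
    struct bpf_local_storage_map_bucket *lock1 = b, *lock2 = old_b;

    if (old_b && old_b < b) {
            lock1 = old_b;
            lock2 = b;
    }
    raw_spin_lock_irqsave(&lock1->lock, b_flags);
    if (lock2)
            raw_spin_lock_irqsave(&lock2->lock, old_b_flags);

The unlock path would have to mirror this ordering. With the later rqspinlock conversion the concern disappears, since a lock cycle returns -EDEADLK instead of hanging.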

> +
>  	alloc_selem = NULL;
>  	/* First, link the new selem to the map */
> -	bpf_selem_link_map(smap, selem);
> +	bpf_selem_link_map_nolock(smap, selem, b);
>
>  	/* Second, link (and publish) the new selem to local_storage */
>  	bpf_selem_link_storage_nolock(local_storage, selem);
>
>  	/* Third, remove old selem, SELEM(old_sdata) */
>  	if (old_sdata) {
> -		bpf_selem_unlink_map(SELEM(old_sdata));
> +		bpf_selem_unlink_map_nolock(SELEM(old_sdata));
>  		bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
>  						&old_selem_free_list);
>  	}
>
> +	if (old_b)
> +		raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
> +
> +	raw_spin_unlock_irqrestore(&b->lock, b_flags);
> +
>  unlock:
>  	raw_spin_unlock_irqrestore(&local_storage->lock, flags);



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Convert bpf_selem_unlink_map to failable
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20346662777

@kernel-patches-review-bot

commit 615477732ee820965d38b2c135c37bd39bb14f71
Author: Amery Hung <ameryhung@gmail.com>

bpf: Change local_storage->lock and b->lock to rqspinlock

This commit converts raw_spinlock_t to rqspinlock_t for BPF local storage
locks. The rqspinlock can fail to acquire the lock, returning an error
instead of spinning forever.

The commit message notes: "In bpf_local_storage_destroy(), WARN_ON for now.
A later patch will handle this properly."

Note: the regressions described below are fixed later in the series by
commit 1ddcc5cdf114 ("bpf: Switch to bpf_selem_unlink_lockless in
bpf_local_storage_{map_free, destroy}").

> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index fa629a180e9e..1d21ec11c80e 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c

[ ... ]

> @@ -382,7 +390,10 @@ int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
>  	local_storage = rcu_dereference_check(selem->local_storage,
>  					      bpf_rcu_lock_held());
>
> -	raw_spin_lock_irqsave(&local_storage->lock, flags);
> +	err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
> +	if (err)
> +		return err;

This change makes bpf_selem_unlink() failable, but bpf_local_storage_map_free()
calls it in a while loop that continues iterating while hlist_first_rcu()
returns elements:

    while ((selem = hlist_entry_safe(
            rcu_dereference_raw(hlist_first_rcu(&b->list)), ...))) {
        WARN_ON(bpf_selem_unlink(selem, true));
    }

If bpf_selem_unlink() fails (returns error), the selem is not removed from
b->list, so hlist_first_rcu() returns the same selem again. Would this cause
an infinite loop printing warnings until the system becomes unresponsive?
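
If the failure really cannot be handled at this point, even just bounding the loop would avoid the livelock; a hedged sketch of that idea, which leaks the remaining selems instead of spinning forever:

    while ((selem = hlist_entry_safe(
            rcu_dereference_raw(hlist_first_rcu(&b->list)), ...))) {
            if (WARN_ON(bpf_selem_unlink(selem, true)))
                    break;	/* hypothetical: stop re-fetching the same selem */
    }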

[ ... ]

> @@ -719,7 +743,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
>  	 * when unlinking elem from the local_storage->list and
>  	 * the map's bucket->list.
>  	 */
> -	raw_spin_lock_irqsave(&local_storage->lock, flags);
> +	WARN_ON(raw_res_spin_lock_irqsave(&local_storage->lock, flags));
>  	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
>  		/* Always unlink from map before unlinking from
>  		 * local_storage.
>  		 */
> -		bpf_selem_unlink_map(selem);
> +		WARN_ON(bpf_selem_unlink_map(selem));

If raw_res_spin_lock_irqsave() fails, WARN_ON() prints a warning but does
not prevent execution from continuing. The code then proceeds to iterate
local_storage->list without holding local_storage->lock, and eventually
calls raw_res_spin_unlock_irqrestore() at the end without having acquired
the lock. Does this cause issues with the rqspinlock held-lock tracking,
which unconditionally decrements its counter in res_spin_unlock()?

Additionally, if bpf_selem_unlink_map() fails, the selem remains linked to
the map's bucket list, but execution continues and
bpf_selem_unlink_storage_nolock() removes it from local_storage->list and
adds it to the free list. When the selem is later freed, would the map's
bucket still hold a dangling reference to it?
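
A hedged sketch of keeping the lock/unlock calls paired in the failure case (the series ultimately resolves this differently, per the note above):

    if (WARN_ON(raw_res_spin_lock_irqsave(&local_storage->lock, flags)))
            return;	/* hypothetical: skip the walk and the unpaired unlock */

This would leak the storage instead of corrupting the held-lock accounting, so it is only a mitigation sketch.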

> @@ -734,7 +758,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
>  		free_storage = bpf_selem_unlink_storage_nolock(
>  			local_storage, selem, &free_selem_list);
>  	}
> -	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> +	raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Change local_storage->lock and b->lock to rqspinlock
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20346662777

@kernel-patches-daemon-bpf
Author

Upstream branch: 3d60306
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1034704
version: 3

@kernel-patches-daemon-bpf
Author

Upstream branch: d2749ae
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1034704
version: 3

@kernel-patches-daemon-bpf
Author

Upstream branch: f785a31
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1034704
version: 3

@kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 13 times, most recently from 5626e08 to 14ee21c on January 19, 2026 at 18:23
@kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 17 times, most recently from 1682daf to dc9fd0a on January 25, 2026 at 03:06