Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: Remove task and cgroup local storage percpu counters
version: 5
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499

@kernel-patches-daemon-bpf

Upstream branch: 4bebb99
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

@kernel-patches-review-bot

commit 5ca113c57e19a4ef0ccfc390cc9274cc9ff66abc
Author: Amery Hung <[email protected]>

bpf: Support lockless unlink when freeing map or local storage

This commit introduces bpf_selem_unlink_nofail() to handle errors from
rqspinlock during map or local storage freeing. It uses a link_cnt refcount
to coordinate between map_free() and destroy() when lock acquisition fails.

> Under normal circumstances, bpf_selem_unlink_nofail() will be able to
> grab locks and unlink a selem from map and local storage in sequeunce,

This isn't a bug, but there is a typo: "sequeunce" should be "sequence".

> Then, only when both map and
> local storage are gone, an selem can be free by the last caller that

This isn't a bug, but there is a grammar issue: "can be free by" should be
"can be freed by".



AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Support lockless unlink when freeing map or local storage
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21567682421

AI-authorship-score: low
AI-authorship-explanation: The commit shows deep understanding of kernel synchronization primitives and BPF local storage internals with natural technical writing style typical of experienced kernel developers.
issues-found: 2
issue-severity-score: low
issue-severity-explanation: Minor typo and grammar errors in commit message only, no code issues found.

@kernel-patches-daemon-bpf

Forwarding comment 3831655465 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf

Upstream branch: 6b95cc5
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

@kernel-patches-daemon-bpf

Upstream branch: d95d76a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

@kernel-patches-daemon-bpf

Upstream branch: f941479
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

@kernel-patches-daemon-bpf

Upstream branch: f11f7cf
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

A later bpf_local_storage refactor will acquire all locks before
performing any update. To reduce the number of locks that need to be
taken in bpf_local_storage_map_update(), determine the bucket based on
the local_storage an selem belongs to instead of the selem pointer.

Currently, when a new selem needs to be created to replace the old selem
in bpf_local_storage_map_update(), the locks of both buckets need to be
acquired to prevent racing. This can be simplified if the two selems
belong to the same bucket, so that only one bucket needs to be locked.
Therefore, instead of hashing the selem pointer, hash the local_storage
pointer the selem belongs to.

Performance-wise, this is slightly better, as an update now only needs
to lock one bucket. It should not change the level of contention on any
one bucket, as the pointers to the local storages of the selems in a map
are just as unique as the pointers to the selems themselves.
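
A hedged sketch of the change, assuming the series keeps the shape of the
existing select_bucket() helper; only the pointer being hashed changes:

  static struct bpf_local_storage_map_bucket *
  select_bucket(struct bpf_local_storage_map *smap,
                struct bpf_local_storage *local_storage)
  {
          /* Hash the owning local_storage instead of the selem so the
           * old and new selem of an update always share one bucket. */
          return &smap->buckets[hash_ptr(local_storage, smap->bucket_log)];
  }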

Signed-off-by: Amery Hung <[email protected]>

To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_unlink_map() to be failable. For now, it still always
succeeds and returns 0.

Since some operations that update local storage cannot fail in the
middle, open-code bpf_selem_unlink_map() and take b->lock before the
operation. There are two such locations:

- bpf_local_storage_alloc()

  The first selem will be unlinked from smap if the cmpxchg of
  owner_storage_ptr fails, and this unlink must not fail. Therefore,
  hold b->lock from linking until the allocation completes. Helpers that
  assume b->lock is held by the caller are introduced (see the sketch
  after this list): bpf_selem_link_map_nolock() and
  bpf_selem_unlink_map_nolock().

- bpf_local_storage_update()

  The three-step update process of link_map(new_selem),
  link_storage(new_selem), and unlink_map(old_selem) must not fail in
  the middle.
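
A minimal sketch of the two _nolock helpers named above, assuming b->lock
is already held by the caller; the exact signatures are assumptions:

  /* Callers of both helpers must hold b->lock. */
  static void bpf_selem_link_map_nolock(struct bpf_local_storage_map_bucket *b,
                                        struct bpf_local_storage_elem *selem)
  {
          hlist_add_head_rcu(&selem->map_node, &b->list);
  }

  static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
  {
          hlist_del_init_rcu(&selem->map_node);
  }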

In bpf_selem_unlink(), bpf_selem_unlink_map() and
bpf_selem_unlink_storage() should either all succeed or all fail instead
of failing in the middle, so return early if unlink_map() fails, as
shown in the sketch below. Remove the selem_linked_to_map_lockless()
check: an selem in the common paths (i.e., not
bpf_local_storage_map_free() or bpf_local_storage_destroy()) will be
unlinked under b->lock and local_storage->lock, so no other thread can
unlink the selem from the map at the same time.
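
A sketch of the resulting control flow in bpf_selem_unlink(), with names
and arguments simplified:

  static int bpf_selem_unlink(struct bpf_local_storage_elem *selem)
  {
          int err;

          /* All-or-nothing: if the map-side unlink cannot take its
           * lock, bail out before touching the storage side. */
          err = bpf_selem_unlink_map(selem);
          if (err)
                  return err;

          bpf_selem_unlink_storage(selem);
          return 0;
  }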

In bpf_local_storage_destroy(), ignore the return of
bpf_selem_unlink_map() for now. A later patch will allow
bpf_local_storage_destroy() to unlink selems even when failing to
acquire locks.

Note that while this patch removes all callers of selem_linked_to_map(),
a later patch that introduces bpf_selem_unlink_nofail() will use it
again.

Signed-off-by: Amery Hung <[email protected]>

To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_link_map() to be failable. It still always succeeds and
returns 0 until the change happens. No functional change.
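
A hedged sketch of the interface change; the parameter list and body are
simplified assumptions, not the exact kernel code:

  int bpf_selem_link_map(struct bpf_local_storage_map *smap,
                         struct bpf_local_storage *local_storage,
                         struct bpf_local_storage_elem *selem)
  {
          struct bpf_local_storage_map_bucket *b =
                  select_bucket(smap, local_storage);
          unsigned long flags;

          raw_spin_lock_irqsave(&b->lock, flags);
          RCU_INIT_POINTER(SDATA(selem)->smap, smap);
          hlist_add_head_rcu(&selem->map_node, &b->list);
          raw_spin_unlock_irqrestore(&b->lock, flags);

          return 0;  /* can only fail once b->lock becomes an rqspinlock */
  }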

Signed-off-by: Amery Hung <[email protected]>

To prepare for changing both bpf_local_storage_map_bucket::lock and
bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to be
failable. It still always succeeds and returns 0 until the change
happens. No functional change.

Open code bpf_selem_unlink_storage() in the only caller,
bpf_selem_unlink(), since unlink_map and unlink_storage must be done
together after all the necessary locks are acquired.

For bpf_local_storage_map_free(), ignore the return from
bpf_selem_unlink() for now. A later patch will allow it to unlink selems
even when failing to acquire locks.

Signed-off-by: Amery Hung <[email protected]>

Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock
from raw_spin_lock to rqspinlock.

Finally, propagate errors from raw_res_spin_lock_irqsave() to the
syscall or BPF helper return value.

In bpf_local_storage_destroy(), ignore return from
raw_res_spin_lock_irqsave() for now. A later patch will allow
bpf_local_storage_destroy() to unlink selems even when failing to
acquire locks.

For __bpf_local_storage_map_cache(), instead of handling the error,
skip updating the cache.
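
A hedged sketch of the new pattern; the surrounding function is
hypothetical:

  static int example_update_under_lock(struct bpf_local_storage *local_storage)
  {
          unsigned long flags;
          int err;

          /* With rqspinlock, the lock attempt itself can fail. */
          err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
          if (err)
                  return err;  /* -EDEADLK/-ETIMEDOUT reaches the caller */

          /* ... modify the local storage under the lock ... */

          raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
          return 0;
  }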

Signed-off-by: Amery Hung <[email protected]>

The percpu counter in task local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.

Since the percpu counter is removed, merge bpf_task_storage_get() and
bpf_task_storage_get_recur() back together. This allows the bpf syscalls
and helpers to run concurrently on the same CPU, removing the spurious
-EBUSY error. bpf_task_storage_get(..., F_CREATE) will now always
succeed, given enough free memory, unless it is called recursively.
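
A hedged sketch of the merged path; task_storage_create() is a
hypothetical stand-in for the update step:

  static void *task_storage_get(struct bpf_map *map, struct task_struct *task,
                                u64 flags, gfp_t gfp)
  {
          struct bpf_local_storage_data *sdata;

          /* Lockless lookup first; no trylock, no percpu counter. */
          sdata = task_storage_lookup(task, map, true);
          if (sdata || !(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
                  return sdata ? sdata->data : NULL;

          /* May fail with -EDEADLK/-ETIMEDOUT on a real deadlock,
           * replacing the old spurious -EBUSY. */
          return task_storage_create(map, task, gfp);
  }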

Signed-off-by: Amery Hung <[email protected]>

The percpu counter in cgroup local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.

Signed-off-by: Amery Hung <[email protected]>

Percpu locks have been removed from cgroup and task local storage. Now
that no local storage uses percpu variables as locks to prevent
recursion, there is no need to pass them to bpf_local_storage_map_free().
Remove the argument from the function.

Signed-off-by: Amery Hung <[email protected]>

The next patch will introduce bpf_selem_unlink_nofail() to handle
rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be
partially unlinked from the map or local storage. Save the memory
allocation method in the selem so that it can later be freed correctly
even when SDATA(selem)->smap is NULL.

In addition, keep track of the memory charged to the owner in the local
storage so that bpf_selem_unlink_nofail() can later return the correct
memory charge to the owner. Updates to selems_size are protected by
local_storage->lock.
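
A sketch of the bookkeeping, assuming the selems_size field described
above and a hypothetical helper name:

  /* Runs under local_storage->lock, so plain arithmetic suffices;
   * selems_size mirrors what has been charged to the owner. */
  static void selems_size_account(struct bpf_local_storage *local_storage,
                                  struct bpf_local_storage_map *smap,
                                  bool link)
  {
          if (link)
                  local_storage->selems_size += smap->elem_size;
          else
                  local_storage->selems_size -= smap->elem_size;
  }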

Signed-off-by: Amery Hung <[email protected]>

Introduce bpf_selem_unlink_nofail() to properly handle errors returned
from rqspinlock in bpf_local_storage_map_free() and
bpf_local_storage_destroy(), where the operation must succeed.

The idea of bpf_selem_unlink_nofail() is to allow a selem to be
partially linked and to use a refcount to determine when and by whom the
selem can be freed if any unlink under lock fails. A selem is initially
fully linked to a map and a local storage, so selem->link_cnt is set
to 2. Under normal circumstances, bpf_selem_unlink_nofail() will be able
to grab the locks and unlink a selem from the map and local storage in
sequence, just like bpf_selem_unlink(), and then free it after an RCU
grace period. However, if any of the lock attempts fails, it will only
clear SDATA(selem)->smap or selem->local_storage, depending on the
caller, and decrement link_cnt to signal that the corresponding data
structure holding a reference to the selem is gone. The selem can then
be freed only once both the map and the local storage are gone, by the
last caller, i.e., the one that brings link_cnt to 0.

To make sure bpf_obj_free_fields() is done only once, and while the map
is still present, it is called when unlinking an selem from b->list
under b->lock.

To make sure uncharging memory is done only when the owner is still
present in map_free(), block destroy() from returning until there is no
pending map_free().

Later bpf_local_storage_destroy() will return the remaining amount of
memory charge tracked by selems_size to the owner.

Finally, accesses to selem, SDATA(selem)->smap and selem->local_storage
are racy. Callers will protect these fields with RCU.
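
A hedged sketch of the link_cnt protocol; the *_locked helper names are
assumptions standing in for the unlink-under-lock attempts:

  static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
                                      bool from_map_free)
  {
          int err;

          /* Try the normal unlink under lock for this caller's side. */
          err = from_map_free ? selem_unlink_map_locked(selem) :
                                selem_unlink_storage_locked(selem);
          if (err) {
                  /* Lock failed: only sever this caller's side. */
                  if (from_map_free)
                          RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
                  else
                          RCU_INIT_POINTER(selem->local_storage, NULL);
          }

          /* link_cnt starts at 2 (map + local storage); whoever brings
           * it to 0 frees the selem after an RCU grace period. */
          if (refcount_dec_and_test(&selem->link_cnt))
                  call_rcu(&selem->rcu, bpf_selem_free_rcu);
  }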

Co-developed-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Amery Hung <[email protected]>

…, destroy}

Take care of rqspinlock errors in bpf_local_storage_{map_free, destroy}()
properly by switching to bpf_selem_unlink_nofail().

Both functions iterate their own RCU-protected list of selems and call
bpf_selem_unlink_nofail(). In map_free(), to prevent an infinite loop
when both map_free() and destroy() fail to remove a selem from b->list
(extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(),
also switch to hlist_for_each_entry_rcu(), since we no longer iterate
local_storage->list under local_storage->lock. In addition, defer the
work to a workqueue, as sleeping may not always be possible in
destroy().

Since selem, SDATA(selem)->smap and selem->local_storage may be seen by
map_free() and destroy() at the same time, protect them with RCU. This
means passing reuse_now == false to bpf_selem_free() and
bpf_local_storage_free(). The local storage map is already protected as
bpf_local_storage_map_free() waits for an RCU grace period after
iterating b->list and before freeing itself.
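
A sketch of the map_free() side; the function name and structure are
simplified:

  static void map_free_selems(struct bpf_local_storage_map *smap)
  {
          struct bpf_local_storage_elem *selem;
          unsigned int i;

          for (i = 0; i < (1U << smap->bucket_log); i++) {
                  struct bpf_local_storage_map_bucket *b = &smap->buckets[i];

                  /* RCU iteration cannot spin forever on a selem that
                   * stays on b->list because both unlinks failed. */
                  rcu_read_lock();
                  hlist_for_each_entry_rcu(selem, &b->list, map_node)
                          bpf_selem_unlink_nofail(selem, true);
                  rcu_read_unlock();
          }
  }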

bpf_selem_unlink() now becomes dedicated to the helper and syscall
paths, so reuse_now should always be false. Remove it from the arguments
and hardcode it.

Co-developed-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Amery Hung <[email protected]>

Check sk_omem_alloc when the caller of bpf_local_storage_destroy()
returns. bpf_local_storage_destroy() now returns the amount of memory to
uncharge to the caller instead of uncharging it directly. Therefore, in
sk_storage_omem_uncharge, check sk_omem_alloc when bpf_sk_storage_free()
returns instead of when bpf_local_storage_destroy() returns.
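
A hedged sketch of the caller-side uncharge; the function name is
hypothetical:

  static void sk_storage_uncharge_on_free(struct sock *sk,
                                          struct bpf_local_storage *sk_storage)
  {
          u32 uncharge;

          /* destroy() now reports how much to uncharge rather than
           * uncharging the owner itself. */
          uncharge = bpf_local_storage_destroy(sk_storage);
          atomic_sub(uncharge, &sk->sk_omem_alloc);
  }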

Signed-off-by: Amery Hung <[email protected]>

Update the expected result of the selftest, as recursion of the task
local storage syscalls and helpers has been relaxed. Now that the percpu
counter is removed, the task local storage helpers bpf_task_storage_get()
and bpf_task_storage_delete() can run on the same CPU at the same time
unless they would cause a deadlock.

Note that since there is no percpu counter preventing recursion in the
task local storage helpers, bpf_trampoline now catches the recursion of
on_update, as reported by recursion_misses.

on_enter: tp_btf/sys_enter
on_update: fentry/bpf_local_storage_update

           Old behavior                         New behavior
           ____________                         ____________
on_enter                             on_enter
  bpf_task_storage_get(&map_a)         bpf_task_storage_get(&map_a)
    bpf_task_storage_trylock succeed     bpf_local_storage_update(&map_a)
    bpf_local_storage_update(&map_a)

    on_update                            on_update
      bpf_task_storage_get(&map_a)         bpf_task_storage_get(&map_a)
        bpf_task_storage_trylock fail        on_update::misses++ (1)
        return NULL                        create and return map_a::ptr

                                           map_a::ptr += 1 (1)

                                           bpf_task_storage_delete(&map_a)
                                             return 0

      bpf_task_storage_get(&map_b)         bpf_task_storage_get(&map_b)
        bpf_task_storage_trylock fail        on_update::misses++ (2)
        return NULL                        create and return map_b::ptr

                                           map_b::ptr += 1 (1)

    create and return map_a::ptr         create and return map_a::ptr
  map_a::ptr = 200                     map_a::ptr = 200

  bpf_task_storage_get(&map_b)         bpf_task_storage_get(&map_b)
    bpf_task_storage_trylock succeed     lockless lookup succeed
    bpf_local_storage_update(&map_b)     return map_b::ptr

    on_update
      bpf_task_storage_get(&map_a)
        bpf_task_storage_trylock fail
        lockless lookup succeed
        return map_a::ptr

      map_a::ptr += 1 (201)

      bpf_task_storage_delete(&map_a)
        bpf_task_storage_trylock fail
        return -EBUSY
      nr_del_errs++ (1)

      bpf_task_storage_get(&map_b)
        bpf_task_storage_trylock fail
        return NULL

    create and return ptr

  map_b::ptr = 100

Expected result:

map_a::ptr = 201                          map_a::ptr = 200
map_b::ptr = 100                          map_b::ptr = 1
nr_del_err = 1                            nr_del_err = 0
on_update::recursion_misses = 0           on_update::recursion_misses = 2
on_enter::recursion_misses = 0           on_enter::recursion_misses = 0

Signed-off-by: Amery Hung <[email protected]>

Adjust the error code we are checking against, as
bpf_task_storage_delete() now returns -EDEADLK or -ETIMEDOUT when a
deadlock happens.
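
A sketch of the adjusted check, assuming the usual test_progs assertion
helpers and a hypothetical err variable:

  /* Old: ASSERT_EQ(err, -EBUSY, ...) while the percpu counter was
   * held. New: the deadlock is reported by rqspinlock instead. */
  ASSERT_TRUE(err == -EDEADLK || err == -ETIMEDOUT,
              "bpf_task_storage_delete under deadlock");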

Signed-off-by: Amery Hung <[email protected]>

Remove a test in test_maps that checks whether updating the percpu
counter in the task local storage map is preemption-safe, as the percpu
counter has now been removed.

Signed-off-by: Amery Hung <[email protected]>

bpf_cgrp_storage_busy has been removed. Use bpf_bprintf_nest_level
instead. This percpu variable is also in the bpf subsystem, so that if it
is removed in the future, BPF CI will catch this type of CI-breaking
change.

Signed-off-by: Amery Hung <[email protected]>
@kernel-patches-daemon-bpf

Upstream branch: b28dac3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499
version: 5

@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1049499 expired. Closing PR.
