Commit 76b8c14
bpf/helpers: Use __GFP_HIGH instead of GFP_ATOMIC in __bpf_async_init()
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking [1] the memcg accounting code a bit to make
a bpf_timer_init() call much more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that if try_charge_memcg() raises
an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and skip calling cgroup_file_notify(), in order to
avoid the locking issues described above.
Depends on mm patch "memcg: skip cgroup_file_notify if spinning is not
allowed".

Tested with vmtest.sh (llvm-18, x86-64):
$ ./test_progs -a '*timer*' -a '*wq*'
...
Summary: 7/12 PASSED, 0 SKIPPED, 0 FAILED
[1] Making bpf_timer_init() much more likely to raise an MEMCG_MAX event
(gist-only, for brevity):
kernel/bpf/helpers.c:__bpf_async_init():
- cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
+ cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC | __GFP_HACK,
+ map->numa_node);
mm/memcontrol.c:try_charge_memcg():
if (!do_memsw_account() ||
- page_counter_try_charge(&memcg->memsw, batch, &counter)) {
- if (page_counter_try_charge(&memcg->memory, batch, &counter))
+ page_counter_try_charge_hack(&memcg->memsw, batch, &counter,
+ gfp_mask & __GFP_HACK)) {
+ if (page_counter_try_charge_hack(&memcg->memory, batch,
+ &counter,
+ gfp_mask & __GFP_HACK))
goto done_restock;
mm/page_counter.c:page_counter_try_charge():
-bool page_counter_try_charge(struct page_counter *counter,
- unsigned long nr_pages,
- struct page_counter **fail)
+bool page_counter_try_charge_hack(struct page_counter *counter,
+ unsigned long nr_pages,
+ struct page_counter **fail, bool hack)
{
...
- if (new > c->max) {
+ if (hack || new > c->max) { // goto failed;
atomic_long_sub(nr_pages, &c->usage);
Fixes: b00628b ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>

1 file changed, 8 insertions(+), 2 deletions(-)