Commit 4612115
bpf/helpers: Skip memcg accounting in __bpf_async_init()
Calling bpf_map_kmalloc_node() from __bpf_async_init() can cause various
locking issues; see the following stack trace (edited for style) as one
example:
...
[10.011566]  do_raw_spin_lock.cold
[10.011570]  try_to_wake_up                (5) double-acquiring the same
[10.011575]  kick_pool                         rq_lock, causing a hardlockup
[10.011579]  __queue_work
[10.011582]  queue_work_on
[10.011585]  kernfs_notify
[10.011589]  cgroup_file_notify
[10.011593]  try_charge_memcg              (4) memcg accounting raises an
[10.011597]  obj_cgroup_charge_pages           MEMCG_MAX event
[10.011599]  obj_cgroup_charge_account
[10.011600]  __memcg_slab_post_alloc_hook
[10.011603]  __kmalloc_node_noprof
...
[10.011611]  bpf_map_kmalloc_node
[10.011612]  __bpf_async_init
[10.011615]  bpf_timer_init                (3) BPF calls bpf_timer_init()
[10.011617]  bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619]  bpf__sched_ext_ops_runnable
[10.011620]  enqueue_task_scx              (2) BPF runs with rq_lock held
[10.011622]  enqueue_task
[10.011626]  ttwu_do_activate
[10.011629]  sched_ttwu_pending            (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and by hacking the memcg accounting code a bit [1] so
that a bpf_timer_init() call is much more likely to raise an MEMCG_MAX
event.
We have also run into other similar variants both internally (without
applying the [1] hack) and on bpf-next, including:
* run_timer_softirq() -> cgroup_file_notify()
  (grabs cgroup_file_kn_lock) -> try_to_wake_up() ->
  BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
  try_charge_memcg() raises MEMCG_MAX ->
  cgroup_file_notify() (tries to grab cgroup_file_kn_lock again)

* __queue_work() (grabs worker_pool::lock) -> try_to_wake_up() ->
  BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
  try_charge_memcg() raises MEMCG_MAX -> cgroup_file_notify() ->
  __queue_work() (tries to grab the same worker_pool::lock)
...
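
All of the traces above share one shape: a non-recursive lock is held, a BPF
callback runs under it, the callback allocates with memcg accounting, and the
resulting event notification tries to take the same lock again. The userspace
sketch below models only that shape. Every name in it (model_lock,
notify_event(), accounted_alloc(), run_callback_with_lock_held()) is
hypothetical and is not kernel API; a real spinlock would spin forever on
re-entry, whereas this model returns an error code so the problem is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a non-recursive lock (think rq_lock or
 * worker_pool::lock). It reports re-entry instead of spinning. */
struct model_lock {
	bool held;
};

enum { LOCK_OK = 0, LOCK_WOULD_DEADLOCK = -1 };

static int model_lock_acquire(struct model_lock *lock)
{
	if (lock->held)
		return LOCK_WOULD_DEADLOCK;
	lock->held = true;
	return LOCK_OK;
}

static void model_lock_release(struct model_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Stand-in for the notification path, which needs the lock itself. */
static int notify_event(struct model_lock *lock)
{
	int ret = model_lock_acquire(lock);

	if (ret == LOCK_OK)
		model_lock_release(lock);
	return ret;
}

/* Stand-in for an accounted allocation: hitting the limit raises an event. */
static int accounted_alloc(struct model_lock *lock, bool hits_limit)
{
	return hits_limit ? notify_event(lock) : LOCK_OK;
}

/* Stand-in for a BPF callback that allocates while the lock is held. */
static int run_callback_with_lock_held(bool hits_limit)
{
	struct model_lock lock = { .held = false };
	int ret;

	ret = model_lock_acquire(&lock);
	assert(ret == LOCK_OK);
	ret = accounted_alloc(&lock, hits_limit);
	model_lock_release(&lock);
	return ret;
}
```

In this model, run_callback_with_lock_held(false) returns LOCK_OK, while
run_callback_with_lock_held(true) returns LOCK_WOULD_DEADLOCK, which
corresponds to step (5) in the trace.
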
As pointed out by Kumar, we can use bpf_mem_alloc() and friends for
bpf_hrtimer and bpf_work, to skip memcg accounting.
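
The property this fix relies on can be illustrated with a userspace sketch: an
allocation served from memory that was set aside in advance never enters the
accounting hook, so it cannot raise an event from a locked context. This is
not how bpf_mem_alloc() is implemented; the free-list pool, charge_hook(), and
all other names below are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_OBJS 4

/* One pool slot: a free-list link while unused, payload while in use. */
union pool_obj {
	union pool_obj *next;
	unsigned char payload[64];
};

static union pool_obj pool_storage[POOL_OBJS];
static union pool_obj *pool_free_list;

/* Counts how often the hypothetical accounting hook ran. */
static unsigned int charge_hook_calls;

/* Stand-in for memcg charging, the step that may raise an event. */
static void charge_hook(void)
{
	charge_hook_calls++;
}

/* Accounted path: every allocation goes through the hook. */
static void *accounted_alloc(size_t size)
{
	charge_hook();
	return malloc(size);
}

/* Thread all slots onto the free list; done once, outside any hot path. */
static void pool_init(void)
{
	pool_free_list = NULL;
	for (size_t i = 0; i < POOL_OBJS; i++) {
		pool_storage[i].next = pool_free_list;
		pool_free_list = &pool_storage[i];
	}
}

/* Pool path: pops a preallocated slot and never calls charge_hook(). */
static void *pool_alloc(void)
{
	union pool_obj *obj = pool_free_list;

	if (obj)
		pool_free_list = obj->next;
	return obj;
}

static void pool_free(void *ptr)
{
	union pool_obj *obj = ptr;

	obj->next = pool_free_list;
	pool_free_list = obj;
}
```

After pool_init(), a pool_alloc() call leaves charge_hook_calls unchanged,
whereas each accounted_alloc() call increments it. The trade-off in this sketch
is that the pool can run dry: pool_alloc() returns NULL once all POOL_OBJS
slots are in use.
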
Tested with vmtest.sh (llvm-18, x86-64):
$ ./test_progs -a '*timer*' -a '*wq*'
...
Summary: 7/12 PASSED, 0 SKIPPED, 0 FAILED
[1] Making a bpf_timer_init() call (much more likely) to raise an
MEMCG_MAX event (gist-only, for brevity):

kernel/bpf/helpers.c:__bpf_async_init():
         /* allocate hrtimer via map_kmalloc to use memcg accounting */
 -       cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
 +       cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC | __GFP_HACK,
 +                                 map->numa_node);

mm/memcontrol.c:try_charge_memcg():
         if (!do_memsw_account() ||
 -           page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 -               if (page_counter_try_charge(&memcg->memory, batch, &counter))
 +           page_counter_try_charge_hack(&memcg->memsw, batch, &counter,
 +                                        gfp_mask & __GFP_HACK)) {
 +               if (page_counter_try_charge_hack(&memcg->memory, batch,
 +                                                &counter,
 +                                                gfp_mask & __GFP_HACK))
                         goto done_restock;

mm/page_counter.c:page_counter_try_charge():
 -bool page_counter_try_charge(struct page_counter *counter,
 -                             unsigned long nr_pages,
 -                             struct page_counter **fail)
 +bool page_counter_try_charge_hack(struct page_counter *counter,
 +                                  unsigned long nr_pages,
 +                                  struct page_counter **fail, bool hack)
  {
  ...
 -               if (new > c->max) {
 +               if (hack || new > c->max) {     // goto failed;
                         atomic_long_sub(nr_pages, &c->usage);
                         /*
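
The effect of the `hack ||` condition in [1] can be tried out in a userspace
model. The sketch below is a simplified, single-level counter with no parent
hierarchy and no failure reporting; struct model_counter, model_try_charge(),
and the force_fail parameter are hypothetical names, not the kernel's
page_counter API. It only shows that a forced failure takes the same undo path
as a genuine over-limit charge.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of a usage counter with a hard limit. */
struct model_counter {
	atomic_long usage;
	long max;
};

/*
 * Charge nr_pages against the counter. Returns true on success.
 * On failure, either because the limit would be exceeded or because
 * force_fail is set, the speculative charge is rolled back.
 */
static bool model_try_charge(struct model_counter *c, long nr_pages,
			     bool force_fail)
{
	long new_usage = atomic_fetch_add(&c->usage, nr_pages) + nr_pages;

	if (force_fail || new_usage > c->max) {
		atomic_fetch_sub(&c->usage, nr_pages);
		return false;
	}
	return true;
}
```

With max set to 10 and usage at 4, model_try_charge(&c, 4, true) fails and
leaves usage at 4, just as model_try_charge(&c, 7, false) does.
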
Fixes: b00628b ("bpf: Introduce bpf timers.")
Suggested-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>

1 parent 83390c0 commit 4612115
1 file changed, 8 insertions(+), 12 deletions(-)