You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
net: fix premature exit from NAPI state polling in napi_disable()
Commit 719c571 ("net: make napi_disable() symmetric with
enable") accidentally introduced a bug sometimes leading to a kernel
BUG when bringing an iface up/down under heavy traffic load.
Prior to this commit, napi_disable() was polling n->state until
none of (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC) is set and then
always flip them. Now there's a possibility to get away with the
NAPIF_STATE_SCHE unset as 'continue' drops us to the cmpxchg()
call with an uninitialized variable, rather than straight to
another round of the state check.
Error path looks like:
napi_disable():
unsigned long val, new; /* new is uninitialized */
do {
val = READ_ONCE(n->state); /* NAPIF_STATE_NPSVC and/or
NAPIF_STATE_SCHED is set */
if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { /* true */
usleep_range(20, 200);
continue; /* go straight to the condition check */
}
new = val | <...>
} while (cmpxchg(&n->state, val, new) != val); /* state == val, cmpxchg()
writes garbage */
napi_enable():
do {
val = READ_ONCE(n->state);
BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); /* 50/50 boom */
<...>
while the typical BUG splat is like:
[ 172.652461] ------------[ cut here ]------------
[ 172.652462] kernel BUG at net/core/dev.c:6937!
[ 172.656914] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 172.661966] CPU: 36 PID: 2829 Comm: xdp_redirect_cp Tainted: G I 5.15.0 #42
[ 172.670222] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ 172.680646] RIP: 0010:napi_enable+0x5a/0xd0
[ 172.684832] Code: 07 49 81 cc 00 01 00 00 4c 89 e2 48 89 d8 80 e6 fb f0 48 0f b1 55 10 48 39 c3 74 10 48 8b 5d 10 f6 c7 04 75 3d f6 c3 01 75 b4 <0f> 0b 5b 5d 41 5c c3 65 ff 05 b8 e5 61 53 48 c7 c6 c0 f3 34 ad 48
[ 172.703578] RSP: 0018:ffffa3c9497477a8 EFLAGS: 00010246
[ 172.708803] RAX: ffffa3c96615a014 RBX: 0000000000000000 RCX: ffff8a4b575301a0
< snip >
[ 172.782403] Call Trace:
[ 172.784857] <TASK>
[ 172.786963] ice_up_complete+0x6f/0x210 [ice]
[ 172.791349] ice_xdp+0x136/0x320 [ice]
[ 172.795108] ? ice_change_mtu+0x180/0x180 [ice]
[ 172.799648] dev_xdp_install+0x61/0xe0
[ 172.803401] dev_xdp_attach+0x1e0/0x550
[ 172.807240] dev_change_xdp_fd+0x1e6/0x220
[ 172.811338] do_setlink+0xee8/0x1010
[ 172.814917] rtnl_setlink+0xe5/0x170
[ 172.818499] ? bpf_lsm_binder_set_context_mgr+0x10/0x10
[ 172.823732] ? security_capable+0x36/0x50
< snip >
Fix this by replacing 'do { } while (cmpxchg())' with an "infinite"
for-loop with an explicit break.
From v1 [0]:
- just use a for-loop to simplify both the fix and the existing
code (Eric).
[0] https://lore.kernel.org/netdev/[email protected]
Fixes: 719c571 ("net: make napi_disable() symmetric with enable")
Suggested-by: Eric Dumazet <[email protected]> # for-loop
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
0 commit comments