Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: xdp: fix page_pool leaks
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012904

@kernel-patches-daemon-bpf

Upstream branch: f6fddc6
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012904
version: 2

@kernel-patches-daemon-bpf

Upstream branch: a1e83d4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012904
version: 2

Currently, the generic XDP hook uses xdp_rxq_info from netstack Rx queues,
which do not have their XDP memory model registered. There is a case when
an XDP program calls the bpf_xdp_adjust_tail() BPF helper, which in turn
releases the underlying memory. This happens when the helper consumes
enough bytes and the XDP buffer has fragments. For this action, knowledge
of the memory model is crucial, so that the core can call the function
suited to freeing/recycling the page.
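
For reference, here is a simplified sketch of the dispatch that
__xdp_return() performs on the mem type (paraphrased from net/core/xdp.c;
field and helper names differ across kernel versions, so this is not a
quote of the upstream code):

#include <linux/mm.h>
#include <net/page_pool/helpers.h>
#include <net/xdp.h>

/* Paraphrased dispatch: the mem type decides which release path the
 * underlying page takes.
 */
static void __xdp_return_sketch(void *data, enum xdp_mem_type type,
				bool napi_direct)
{
	struct page *page;

	switch (type) {
	case MEM_TYPE_PAGE_POOL:
		page = virt_to_head_page(data);
		/* recycle or return the page via its owning page_pool */
		page_pool_put_full_page(page->pp, page, napi_direct);
		break;
	case MEM_TYPE_PAGE_SHARED:
		/* plain page-frag refcounting, no page_pool awareness */
		page_frag_free(data);
		break;
	case MEM_TYPE_PAGE_ORDER0:
		page = virt_to_page(data);
		__page_frag_cache_drain(page, 1);
		break;
	default:
		WARN(1, "incorrect XDP memory type (%d)", type);
		break;
	}
}

Releasing a page_pool-backed page through the MEM_TYPE_PAGE_SHARED branch
is exactly the mismatch this series addresses.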

For netstack queues the mem type defaults to MEM_TYPE_PAGE_SHARED (0) due
to the lack of mem model registration. The problem fixed here occurs when
the kernel has copied the skb into a new buffer backed by the system's
page_pool and the XDP buffer is built around it. When
bpf_xdp_adjust_tail() then calls __xdp_return(), it acts incorrectly
because the mem type is not set to MEM_TYPE_PAGE_POOL, which causes a
page leak.

Pull the existing code that initializes/prepares the xdp_buff out of
bpf_prog_run_generic_xdp() into a new helper, xdp_convert_skb_to_buff(),
and embed there the initialization of the rxq mem_type that is assigned
to the xdp_buff.
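
The helper body is part of the patch itself; purely as an illustration,
its shape might look roughly like the sketch below. The provenance
predicate page_pool_page_is_pp() (the check mm itself uses to detect
page_pool leaks) and the exact field layout are assumptions here, not a
quote of the upstream code:

#include <linux/skbuff.h>
#include <linux/mm.h>
#include <net/xdp.h>

/* Illustrative sketch: mirrors the xdp_buff setup previously open-coded
 * in bpf_prog_run_generic_xdp() and adds the per-buffer mem type
 * derivation described above.
 */
static void xdp_convert_skb_to_buff_sketch(struct sk_buff *skb,
					   struct xdp_buff *xdp,
					   struct xdp_rxq_info *rxq)
{
	u32 mac_len = skb->data - skb_mac_header(skb);
	void *hard_start = skb->head;
	u32 frame_sz;

	/* frame_sz covers headroom, data, tailroom and skb_shared_info */
	frame_sz = (void *)skb_end_pointer(skb) - hard_start;
	frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

	xdp_init_buff(xdp, frame_sz, rxq);
	xdp_prepare_buff(xdp, hard_start, skb_headroom(skb) - mac_len,
			 skb_headlen(skb) + mac_len, true);

	/* Per-buffer mem type: if the linear page came from a page_pool,
	 * let __xdp_return() take the page_pool path.
	 */
	rxq->mem.type = page_pool_page_is_pp(virt_to_head_page(skb->data)) ?
			MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
}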

This problem was triggered by syzbot as well as by the AF_XDP test suite,
which is about to be integrated into BPF CI.

Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Fixes: e6d5dbd ("xdp: add multi-buff support for xdp running in generic mode")
Tested-by: Ihor Solodrai <[email protected]>
Co-developed-by: Octavian Purdila <[email protected]>
Signed-off-by: Octavian Purdila <[email protected]> # whole analysis, testing, initiating a fix
Signed-off-by: Maciej Fijalkowski <[email protected]> # commit msg and proposed more robust fix
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>

Veth calls skb_pp_cow_data(), which makes the underlying memory originate
from the system page_pool. With CONFIG_DEBUG_VM=y and an XDP program that
uses bpf_xdp_adjust_tail(), the following splat was observed:

[   32.204881] BUG: Bad page state in process test_progs  pfn:11c98b
[   32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11c98b
[   32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
[   32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000 0000000000000000
[   32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
[   32.220900] page dumped because: page_pool leak
[   32.222636] Modules linked in: bpf_testmod(O) bpf_preload
[   32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G O        6.17.0-rc5-gfec474d29325 #6969 PREEMPT
[   32.224638] Tainted: [O]=OOT_MODULE
[   32.224639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   32.224641] Call Trace:
[   32.224644]  <IRQ>
[   32.224646]  dump_stack_lvl+0x4b/0x70
[   32.224653]  bad_page.cold+0xbd/0xe0
[   32.224657]  __free_frozen_pages+0x838/0x10b0
[   32.224660]  ? skb_pp_cow_data+0x782/0xc30
[   32.224665]  bpf_xdp_shrink_data+0x221/0x530
[   32.224668]  ? skb_pp_cow_data+0x6d1/0xc30
[   32.224671]  bpf_xdp_adjust_tail+0x598/0x810
[   32.224673]  ? xsk_destruct_skb+0x321/0x800
[   32.224678]  bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
[   32.224681]  veth_xdp_rcv_skb+0x45d/0x15a0
[   32.224684]  ? get_stack_info_noinstr+0x16/0xe0
[   32.224688]  ? veth_set_channels+0x920/0x920
[   32.224691]  ? get_stack_info+0x2f/0x80
[   32.224693]  ? unwind_next_frame+0x3af/0x1df0
[   32.224697]  veth_xdp_rcv.constprop.0+0x38a/0xbe0
[   32.224700]  ? common_startup_64+0x13e/0x148
[   32.224703]  ? veth_xdp_rcv_one+0xcd0/0xcd0
[   32.224706]  ? stack_trace_save+0x84/0xa0
[   32.224709]  ? stack_depot_save_flags+0x28/0x820
[   32.224713]  ? __resched_curr.constprop.0+0x332/0x3b0
[   32.224716]  ? timerqueue_add+0x217/0x320
[   32.224719]  veth_poll+0x115/0x5e0
[   32.224722]  ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
[   32.224726]  ? update_load_avg+0x1cb/0x12d0
[   32.224730]  ? update_cfs_group+0x121/0x2c0
[   32.224733]  __napi_poll+0xa0/0x420
[   32.224736]  net_rx_action+0x901/0xe90
[   32.224740]  ? run_backlog_napi+0x50/0x50
[   32.224743]  ? clockevents_program_event+0x1cc/0x280
[   32.224746]  ? hrtimer_interrupt+0x31e/0x7c0
[   32.224749]  handle_softirqs+0x151/0x430
[   32.224752]  do_softirq+0x3f/0x60
[   32.224755]  </IRQ>

This is because an xdp_rxq whose mem model was set to
MEM_TYPE_PAGE_SHARED was used when initializing the xdp_buff.

Fix this by using the new helper xdp_convert_skb_to_buff() which, besides
initializing/preparing the xdp_buff, checks whether the page used for the
linear part of the xdp_buff comes from a page_pool. We assume that the
linear data and the frags have the same memory provider, as the current
XDP API does not provide a way to distinguish them (the mem model is
registered for the *whole* Rx queue, whereas here we speak about
single-buffer granularity).

In order to meet the skb layout expected by the new helper, pull the mac
header before converting the skb to an xdp_buff.
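
In veth_xdp_rcv_skb(), the call sequence could then look roughly as
follows (illustrative only; it assumes an untagged Ethernet header and
the hypothetical helper signature sketched above):

	/* Make skb->data point just past the Ethernet header, the layout
	 * the generic XDP hook (and hence the new helper) operates on,
	 * then build the xdp_buff around the skb.
	 */
	if (skb->data == skb_mac_header(skb))
		__skb_pull(skb, ETH_HLEN);	/* assumes plain Ethernet */
	xdp_convert_skb_to_buff(skb, xdp, &rq->xdp_rxq);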

However, that is not enough: before the xdp_buff is released out of veth
via XDP_{TX,REDIRECT}, the mem type on the xdp_rxq associated with the
xdp_buff has to be restored to its original model. We need to respect the
previous setting at least until the buff is converted to a frame, as the
frame carries the mem_type. Add a page_pool variant of veth_xdp_get() so
that we avoid a refcount underflow when draining a page frag.
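
A sketch of what such a page_pool-aware counterpart to veth_xdp_get()
might look like (the _pp name and the choice of page_pool_ref_page() are
assumptions, not the upstream code):

#include <net/page_pool/helpers.h>
#include <net/xdp.h>

/* Take page_pool references instead of get_page()/__skb_frag_ref() so
 * that the later release through the page_pool path does not underflow
 * the fragment refcount.
 */
static void veth_xdp_get_pp(struct xdp_buff *xdp)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	int i;

	page_pool_ref_page(virt_to_head_page(xdp->data));
	if (likely(!xdp_buff_has_frags(xdp)))
		return;

	for (i = 0; i < sinfo->nr_frags; i++)
		page_pool_ref_page(skb_frag_page(&sinfo->frags[i]));
}

The mem type restore would then bracket the program run: save the rxq's
mem type, set it per buffer, and put the original value back only once
xdp_convert_buff_to_frame() has captured the mem type into the frame.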

Fixes: 0ebab78 ("net: veth: add page_pool for page recycling")
Reported-by: Alexei Starovoitov <[email protected]>
Closes: https://lore.kernel.org/bpf/CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com/
Signed-off-by: Maciej Fijalkowski <[email protected]>

@kernel-patches-daemon-bpf

Upstream branch: 1c64efc
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012904
version: 2
