forked from huangbibo/kernel
-
Notifications
You must be signed in to change notification settings - Fork 1
Linux stable update 6.6.74 rc1 #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
opsiff
wants to merge
86
commits into
linux-stable-update-6.6.73
Choose a base branch
from
linux-stable-update-6.6.74-rc1
base: linux-stable-update-6.6.73
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Linux stable update 6.6.74 rc1 #2
opsiff
wants to merge
86
commits into
linux-stable-update-6.6.73
from
linux-stable-update-6.6.74-rc1
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
update leapioraid driver version Signed-off-by: haodongdong <[email protected]>
mainline inclusion from mainline-v6.8-rc1 category: feature Add support for client processors starting from Kaby Lake. Signed-off-by: Srinivas Pandruvada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]> (cherry picked from commit 27f2b08)
mainline inclusion from mainline-v6.13-rc1 category: performance alloc_fd() has a sanity check inside to make sure the struct file mapping to the allocated fd is NULL. Remove this sanity check since it can be assured by exisitng zero initilization and NULL set when recycling fd. Meanwhile, add likely/unlikely and expand_file() call avoidance to reduce the work under file_lock. Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Tim Chen <[email protected]> Signed-off-by: Yu Ma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]> (cherry picked from commit 52732bb)
mainline inclusion from mainline-v6.13-rc1 category: performance 64 bits in open_fds are mapped to a common bit in full_fds_bits. It is very likely that a bit in full_fds_bits has been cleared before in __clear_open_fds()'s operation. Check the clear bit in full_fds_bits before clearing to avoid unnecessary write and cache bouncing. See commit fc90888 ("vfs: conditionally clear close-on-exec flag") for a similar optimization. take stock kernel with patch 1 as baseline, it improves pts/blogbench-1.1.0 read for 13%, and write for 5% on Intel ICX 160 cores configuration with v6.10-rc7. Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Tim Chen <[email protected]> Signed-off-by: Yu Ma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]> (cherry picked from commit c9a3019)
mainline inclusion from mainline-v6.13-rc1 category: performance Skip 2-levels searching via find_next_zero_bit() when there is free slot in the word contains next_fd, as: (1) next_fd indicates the lower bound for the first free fd. (2) There is fast path inside of find_next_zero_bit() when size<=64 to speed up searching. (3) After fdt is expanded (the bitmap size doubled for each time of expansion), it would never be shrunk. The search size increases but there are few open fds available here. This fast path is proposed by Mateusz Guzik <[email protected]>, and agreed by Jan Kara <[email protected]>, which is more generic and scalable than previous versions. And on top of patch 1 and 2, it improves pts/blogbench-1.1.0 read by 8% and write by 4% on Intel ICX 160 cores configuration with v6.10-rc7. Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Tim Chen <[email protected]> Signed-off-by: Yu Ma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]> (cherry picked from commit 0c40bf4)
This reverts commit a1a541f which is commit c45beeb upstream. It is reported to part of a series that causes problems in the 6.6.y tree, so revert it at this point in time and it can come back later if still needed. Reported-by: Ignat Korchagin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Dmitry Safonov <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 950b604)
…ntry" This reverts commit a3f8a2b which is commit 07aeefa upstream. It is reported to part of a series that causes problems in the 6.6.y tree, so revert it at this point in time and it can come back later if still needed. Reported-by: Ignat Korchagin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Dmitry Safonov <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit d1c53de)
This reverts commit 26423e1 which is commit 5b02bfc upstream. It is reported to part of a series that causes problems in the 6.6.y tree, so revert it at this point in time and it can come back later if still needed. Reported-by: Ignat Korchagin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Dmitry Safonov <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 1795ca6)
Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 3b4299f)
Enable compressed SquashFS and OverlayFS support as required by the installation media. Also tune SquashFS defaults to optimise performance. Signed-off-by: Mingcong Bai <[email protected]>
[ Upstream commit 67dba2c ] Use usb_autopm_get_interface() and usb_autopm_put_interface() in btmtk_usb_shutdown(), it could send func ctrl after enabling autosuspend. Bluetooth: btmtk_usb_hci_wmt_sync() hci0: Execution of wmt command timed out Bluetooth: btmtk_usb_shutdown() hci0: Failed to send wmt func ctrl (-110) Fixes: 5c5e8c5 ("Bluetooth: btmtk: move btusb_mtk_[setup, shutdown] to btmtk.c") Signed-off-by: Chris Lu <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit cd4522b)
next inclusion from next-20250122 category: performance Current code prefetches cache lines for the received frame first, and then dma_sync_single_for_cpu() against this frame, this is wrong. Cache prefetch should be triggered after dma_sync_single_for_cpu(). This patch brings ~2.8% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, 2.84 Gbits/sec increased to 2.92 Gbits/sec. Signed-off-by: Furong Xu <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Reviewed-by: Yanteng Si <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
[ Upstream commit 03d120f ] CPSW ALE has 75-bit ALE entries stored across three 32-bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions support ALE field entries spanning up to two words at the most. The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as expected when ALE field spanned across word1 and word2, but fails when ALE field spanned across word2 and word3. For example, while reading the ALE field spanned across word2 and word3 (i.e. bits 62 to 64), the word3 data shifted to an incorrect position due to the index becoming zero while flipping. The same issue occurred when setting an ALE entry. This issue has not been seen in practice but will be an issue in the future if the driver supports accessing ALE fields spanning word2 and word3 Fix the methods to handle getting/setting fields spanning up to two words. Fixes: b685f1a ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()") Signed-off-by: Sudheer Kumar Doredla <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Roger Quadros <[email protected]> Reviewed-by: Siddharth Vadapalli <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 97ab30ee4b56df69f31d7e5aefb21c16ff25ece9)
[ Upstream commit b3af609 ] As pointed out in the original comment, lookup in sockmap can return a TCP ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb does not imply a non-refcounted socket. Drop sk's reference in both error paths. unreferenced object 0xffff888101911800 (size 2048): comm "test_progs", pid 44109, jiffies 4297131437 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 9336483b): __kmalloc_noprof+0x3bf/0x560 __reuseport_alloc+0x1d/0x40 reuseport_alloc+0xca/0x150 reuseport_attach_prog+0x87/0x140 sk_reuseport_attach_bpf+0xc8/0x100 sk_setsockopt+0x1181/0x1990 do_sock_setsockopt+0x12b/0x160 __sys_setsockopt+0x7b/0xc0 __x64_sys_setsockopt+0x1b/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 64d8529 ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH") Signed-off-by: Michal Luczaj <[email protected]> Reviewed-by: Martin KaFai Lau <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 29e6db12e92e393fa7b448bdf73b4a967fdb7e81)
[ Upstream commit 47e55e4 ] Commit in a fixes tag attempted to fix the issue in the following sequence of calls: do_output -> ovs_vport_send -> dev_queue_xmit -> __dev_queue_xmit -> netdev_core_pick_tx -> skb_tx_hash When device is unregistering, the 'dev->real_num_tx_queues' goes to zero and the 'while (unlikely(hash >= qcount))' loop inside the 'skb_tx_hash' becomes infinite, locking up the core forever. But unfortunately, checking just the carrier status is not enough to fix the issue, because some devices may still be in unregistering state while reporting carrier status OK. One example of such device is a net/dummy. It sets carrier ON on start, but it doesn't implement .ndo_stop to set the carrier off. And it makes sense, because dummy doesn't really have a carrier. Therefore, while this device is unregistering, it's still easy to hit the infinite loop in the skb_tx_hash() from the OVS datapath. There might be other drivers that do the same, but dummy by itself is important for the OVS ecosystem, because it is frequently used as a packet sink for tcpdump while debugging OVS deployments. And when the issue is hit, the only way to recover is to reboot. Fix that by also checking if the device is running. The running state is handled by the net core during unregistering, so it covers unregistering case better, and we don't really need to send packets to devices that are not running anyway. While only checking the running state might be enough, the carrier check is preserved. The running and the carrier states seem disjoined throughout the code and different drivers. And other core functions like __dev_direct_xmit() check both before attempting to transmit a packet. So, it seems safer to check both flags in OVS as well. Fixes: 066b867 ("net: openvswitch: fix race on port output") Reported-by: Friedrich Weber <[email protected]> Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html Signed-off-by: Ilya Maximets <[email protected]> Tested-by: Friedrich Weber <[email protected]> Reviewed-by: Aaron Conole <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 3a82fc742457ba2b7a29baba5c16b1ef2f5cd8f7)
[ Upstream commit 76201b5 ] Passing a sufficient amount of imix entries leads to invalid access to the pkt_dev->imix_entries array because of the incorrect boundary check. UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24 index 20 is out of range for type 'imix_pkt [20]' CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 deepin-community#121 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl lib/dump_stack.c:117 __ubsan_handle_out_of_bounds lib/ubsan.c:429 get_imix_entries net/core/pktgen.c:874 pktgen_if_write net/core/pktgen.c:1063 pde_write fs/proc/inode.c:334 proc_reg_write fs/proc/inode.c:346 vfs_write fs/read_write.c:593 ksys_write fs/read_write.c:644 do_syscall_64 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130 Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 52a62f8 ("pktgen: Parse internet mix (imix) input") Signed-off-by: Artem Chernyshev <[email protected]> [ fp: allow to fill the array completely; minor changelog cleanup ] Signed-off-by: Fedor Pchelkin <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 3ced6b39e3c6be6b9a25b857834c6565914b793c)
[ Upstream commit fd4f101 ] Many (struct pernet_operations)->exit_batch() methods have to acquire rtnl. In presence of rtnl mutex pressure, this makes cleanup_net() very slow. This patch adds a new exit_batch_rtnl() method to reduce number of rtnl acquisitions from cleanup_net(). exit_batch_rtnl() handlers are called while rtnl is locked, and devices to be killed can be queued in a list provided as their second argument. A single unregister_netdevice_many() is called right before rtnl is released. exit_batch_rtnl() handlers are called before ->exit() and ->exit_batch() handlers. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Antoine Tenart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Stable-dep-of: 46841c7 ("gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().") Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit fddc755b3b916b54c96e72bf7d063ecc0e711990)
[ Upstream commit 6eedda0 ] exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair per netns and one unregister_netdevice_many() call per netns. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Antoine Tenart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Stable-dep-of: 46841c7 ("gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().") Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 0db04db13c3a5e7b82d51bc98950f7cf977745f1)
[ Upstream commit 46841c7 ] gtp_newlink() links the gtp device to a list in dev_net(dev). However, even after the gtp device is moved to another netns, it stays on the list but should be invisible. Let's use for_each_netdev_rcu() for netdev traversal in gtp_genl_dump_pdp(). Note that gtp_dev_list is no longer used under RCU, so list helpers are converted to the non-RCU variant. Fixes: 459aa66 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang <[email protected]> Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_=PmFPA@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit a568959c28fb7213bebe4d32b6f40e85943bdca2)
[ Upstream commit eb28fd7 ] gtp_newlink() links the device to a list in dev_net(dev) instead of src_net, where a udp tunnel socket is created. Even when src_net is removed, the device stays alive on dev_net(dev). Then, removing src_net triggers the splat below. [0] In this example, gtp0 is created in ns2, and the udp socket is created in ns1. ip netns add ns1 ip netns add ns2 ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn ip netns del ns1 Let's link the device to the socket's netns instead. Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove all gtp devices in the netns. [0]: ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1558) udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18) gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423) gtp_create_sockets (drivers/net/gtp.c:1447) gtp_newlink (drivers/net/gtp.c:1507) rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012) rtnetlink_rcv_msg (net/core/rtnetlink.c:6922) netlink_rcv_skb (net/netlink/af_netlink.c:2542) netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347) netlink_sendmsg (net/netlink/af_netlink.c:1891) ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583) ___sys_sendmsg (net/socket.c:2639) __sys_sendmsg (net/socket.c:2669) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) Modules linked in: CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d deepin-community#5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179) Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89 RSP: 0018:ff11000009a07b60 EFLAGS: 00010286 RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0 R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn (kernel/panic.c:748) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158) ? kfree (mm/slub.c:4613 mm/slub.c:4761) net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467) cleanup_net (net/core/net_namespace.c:664 (discriminator 3)) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> Fixes: 459aa66 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit ab1992964fd74ee0f9153b0b5b6bfdc0dd608853)
[ Upstream commit 16ebb6f ] The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could potentially have an integer wrapping bug on 32bit systems. Check for this and return an error. Fixes: 9816dd3 ("nfp: bpf: perf event output helpers support") Signed-off-by: Dan Carpenter <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 6fb280cace31024137fa79e5b2f2d64f4e352ddb)
[ Upstream commit c17ff47 ] If coalesce_count is greater than 255 it will not fit in the register and will overflow. This can be reproduced by running # ethtool -C ethX rx-frames 256 which will result in a timeout of 0us instead. Fix this by checking for invalid values and reporting an error. Fixes: 8a3b7a2 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Radhey Shyam Pandey <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit c010a1f18020eea93dd9e63b0a66575f96839f4e)
[ Upstream commit 001ba09 ] The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did not handle the case when it returned NULL. There was a WARN_ON(!new_page) but it would still proceed to use the NULL pointer and then crash. This case does seem somewhat rare but when the system is under memory pressure it can happen. One case where I can duplicate this with some frequency is when writing over a smbd share to a SATA HDD attached to an imx6q. Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve the problem for my test case. But it still seems wrong that the fec driver ignores the memory allocation error and can crash. This commit handles the allocation error by dropping the current packet. Fixes: 95698ff ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Wei Fang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 341f43053ff9bf50fd27b0b1eebad64659476b7d)
[ Upstream commit c08d3e6 ] User added steering rules at RDMA_TX were being added to the first prio, which is the counters prio. Fix that so that they are correctly added to the BYPASS_PRIO instead. Fixes: 24670b1 ("net/mlx5: Add support for RDMA TX steering") Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit ecd06d66f86d4fe9e3f08ae16edef4cd953ff904)
[ Upstream commit 5641e82 ] Clear the port select structure on error so no stale values left after definers are destroyed. That's because the mlx5_lag_destroy_definers() always try to destroy all lag definers in the tt_map, so in the flow below lag definers get double-destroyed and cause kernel crash: mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 1 mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 0 mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)] CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] sp : ffff800085fafb00 x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000 x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000 x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000 x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350 x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0 x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190 x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000 x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000 Call trace: mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core] mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core] mlx5_activate_lag+0x60c/0x6f8 [mlx5_core] mlx5_do_bond_work+0x284/0x5c8 [mlx5_core] process_one_work+0x170/0x3e0 worker_thread+0x2d8/0x3e0 kthread+0x11c/0x128 ret_from_fork+0x10/0x20 Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400) ---[ end trace 0000000000000000 ]--- Fixes: dc48516 ("net/mlx5: Lag, add support to create definers for LAG") Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 4a7f17cc5d50a7c29c76e8137385edde70db7fad)
[ Upstream commit 2c36880 ] Attempt to enable IPsec packet offload in tunnel mode in debug kernel generates the following kernel panic, which is happening due to two issues: 1. In SA add section, the should be _bh() variant when marking SA mode. 2. There is not needed flush_workqueue in SA delete routine. It is not needed as at this stage as it is removed from SADB and the running work will be canceled later in SA free. ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 6.12.0+ #4 Not tainted ----------------------------------------------------- charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] and this task is already holding: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30 which would create a new lock dependency: (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3} but this new dependency connects a SOFTIRQ-irq-safe lock: (&x->lock){+.-.}-{3:3} ... which became SOFTIRQ-irq-safe at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138 to a SOFTIRQ-irq-unsafe lock: (&xa->xa_lock#24){+.+.}-{3:3} ... which became SOFTIRQ-irq-unsafe at: ... lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&xa->xa_lock#24); local_irq_disable(); lock(&x->lock); lock(&xa->xa_lock#24); <Interrupt> lock(&x->lock); *** DEADLOCK *** 2 locks held by charon/1337: #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90 #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30 the dependencies between SOFTIRQ-irq-safe lock and the holding lock: -> (&x->lock){+.-.}-{3:3} ops: 29 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 IN-SOFTIRQ-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffff87f9cd20>] __key.18+0x0/0x40 the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 SOFTIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core] ... acquired at: __lock_acquire+0x30a0/0x5040 lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 stack backtrace: CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x74/0xd0 check_irq_usage+0x12e8/0x1d90 ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0 ? check_chain_key+0x1bb/0x4c0 ? __lockdep_reset_lock+0x180/0x180 ? check_path.constprop.0+0x24/0x50 ? mark_lock+0x108/0x2fb0 ? print_circular_bug+0x9b0/0x9b0 ? mark_lock+0x108/0x2fb0 ? print_usage_bug.part.0+0x670/0x670 ? check_prev_add+0x1c4/0x2310 check_prev_add+0x1c4/0x2310 __lock_acquire+0x30a0/0x5040 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 lock_acquire+0x1be/0x520 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? __xfrm_state_delete+0x5f0/0xae0 ? lock_downgrade+0x6b0/0x6b0 _raw_spin_lock_bh+0x34/0x40 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 ? xfrm_get_sa+0x250/0x250 ? check_chain_key+0x1bb/0x4c0 xfrm_user_rcv_msg+0x493/0x880 ? copy_sec_ctx+0x270/0x270 ? check_chain_key+0x1bb/0x4c0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 netlink_rcv_skb+0x12e/0x380 ? copy_sec_ctx+0x270/0x270 ? netlink_ack+0xd90/0xd90 ? netlink_deliver_tap+0xcd/0xb60 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 ? netlink_attachskb+0x730/0x730 ? lock_acquire+0x1be/0x520 netlink_sendmsg+0x745/0xbe0 ? netlink_unicast+0x740/0x740 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x740/0x740 __sock_sendmsg+0xc5/0x190 ? fdget+0x163/0x1d0 __sys_sendto+0x1fe/0x2c0 ? __x64_sys_getpeername+0xb0/0xb0 ? do_user_addr_fault+0x856/0xe30 ? lock_acquire+0x1be/0x520 ? __task_pid_nr_ns+0x117/0x410 ? lock_downgrade+0x6b0/0x6b0 __x64_sys_sendto+0xdc/0x1b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f7d31291ba4 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1 </TASK> Fixes: 4c24272 ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode") Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit bb61edaa7ff610e877672c3add012952a1f2bea3)
[ Upstream commit 25f2352 ] All packet offloads SAs have reqid in it to make sure they have corresponding policy. While it is not strictly needed for transparent mode, it is extremely important in tunnel mode. In that mode, policy and SAs have different match criteria. Policy catches the whole subnet addresses, and SA catches the tunnel gateways addresses. The source address of such tunnel is not known during egress packet traversal in flow steering as it is added only after successful encryption. As reqid is required for packet offload and it is unique for every SA, we can safely rely on it only. The output below shows the configured egress policy and SA by strongswan: [leonro@vm ~]$ sudo ip x s src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel replay-window 0 flag af-unspec esn aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 1, bitmap-length 1 00000000 crypto offload parameters: dev eth2 dir out mode packet [leonro@064 ~]$ sudo ip x p src 192.170.0.0/16 dst 192.170.0.0/16 dir out priority 383615 ptype main tmpl src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel crypto offload parameters: dev eth2 mode packet Fixes: b3beba1 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 345de9405d6601a3dfcd9c2e02da67a30e92d790)
[ Upstream commit 7f95b02 ] According to RFC4303, section "3.3.3. Sequence Number Generation", the first packet sent using a given SA will contain a sequence number of 1. This is applicable to both ESN and non-ESN mode, which was not covered in commit mentioned in Fixes line. Fixes: 3d42c8c ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1") Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 8188414ea0a4c98d387854e9a07608b1c3540d30)
[ Upstream commit b7d4062 ] Adds a new BO param that keeps the reservation locked after creation. This removes the need to re-reserve the BO after creation which is a waste of cycles. This also fixes a bug in vmw_prime_import_sg_table where the imported reservation is unlocked twice. Signed-off-by: Ian Forbes <[email protected]> Fixes: b32233a ("drm/vmwgfx: Fix prime import/export") Reviewed-by: Zack Rusin <[email protected]> Signed-off-by: Zack Rusin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit f014e8c49c816c9cc298fc939153567f5f016c64)
[ Upstream commit e4b5ccd ] After a job completes, the corresponding pointer in the device must be set to NULL. Failing to do so triggers a warning when unloading the driver, as it appears the job is still active. To prevent this, assign the job pointer to NULL after completing the job, indicating the job has finished. Fixes: 14d1d19 ("drm/v3d: Remove the bad signaled() implementation.") Signed-off-by: Maíra Canal <[email protected]> Reviewed-by: Jose Maria Casanova Crespo <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 13b6febefb041b71d07609ef4c952d1202321638)
opsiff
pushed a commit
that referenced
this pull request
Jul 3, 2025
[ Upstream commit 5d3bc9e5e725aa36cca9b794e340057feb6880b4 ] This patch fixes an issue seen in a large-scale deployment under heavy incoming pkts where the aRFS flow wrongly matches a flow and reprograms the NIC with wrong settings. That mis-steering causes RX-path latency spikes and noisy neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections). set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by hashing the skb sized by the per rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver steer function modifies the wrong flow to use this rx-queue, e.g.: Flow#1 is first added: Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10 Later when a new flow needs to be added: Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20 The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This results in both flows getting un-optimized - packets for Flow#1 goes to q#20, and then reprogrammed back to q#10 later and so on; and Flow #2 programming is never done as Flow#1 is matched first for all misses. Many flows may wrongly share the same hash and reprogram rules of the original flow each with their own q#. Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS, enable aRFS (global value is 144 * rps_flow_cnt). Test notes about results from ice_rx_flow_steer(): --------------------------------------------------- 1. "Skip:" counter increments here: if (fltr_info->q_index == rxq_idx || arfs_entry->fltr_state != ICE_ARFS_ACTIVE) goto out; 2. "Add:" counter increments here: ret = arfs_entry->fltr_info.fltr_id; INIT_HLIST_NODE(&arfs_entry->list_entry); 3. "Update:" counter increments here: /* update the queue to forward to on an already existing flow */ Runtime comparison: original code vs with the patch for different rps_flow_cnt values. +-------------------------------+--------------+--------------+ | rps_flow_cnt | 512 | 2048 | +-------------------------------+--------------+--------------+ | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K | | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K | | CPU User | 216 vs 183 | 216 vs 206 | | CPU System | 1441 vs 1171 | 1447 vs 1320 | | CPU Softirq | 1245 vs 920 | 1238 vs 961 | | CPU Total | 29 vs 22.7 | 29 vs 24.9 | | aRFS Update | 533K vs 59 | 521K vs 32 | | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M | +-------------------------------+--------------+--------------+ A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections showed no performance degradation. Some points on the patch/aRFS behavior: 1. Enabling full tuple matching ensures flows are always correctly matched, even with smaller hash sizes. 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs and fewer calls to driver for programming on misses. 3. Larger hash tables reduces mis-steering due to more unique flow hashes, but still has clashes. However, with larger per-device rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot be added if h/w limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt pkts have been processed by this cpu that are not part of the flow). Fixes: 28bf267 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit f4c19a8e51eeaf64bc0072e348c6858c4b28c296)
opsiff
pushed a commit
that referenced
this pull request
Jul 3, 2025
[ Upstream commit 10876da918fa1aec0227fb4c67647513447f53a9 ] syzkaller reported a null-ptr-deref in sock_omalloc() while allocating a CALIPSO option. [0] The NULL is of struct sock, which was fetched by sk_to_full_sk() in calipso_req_setattr(). Since commit a1a5344 ("tcp: avoid two atomic ops for syncookies"), reqsk->rsk_listener could be NULL when SYN Cookie is returned to its client, as hinted by the leading SYN Cookie log. Here are 3 options to fix the bug: 1) Return 0 in calipso_req_setattr() 2) Return an error in calipso_req_setattr() 3) Alaways set rsk_listener 1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie for CALIPSO. 3) is also no go as there have been many efforts to reduce atomic ops and make TCP robust against DDoS. See also commit 3b24d85 ("tcp/dccp: do not touch listener sk_refcnt under synflood"). As of the blamed commit, SYN Cookie already did not need refcounting, and no one has stumbled on the bug for 9 years, so no CALIPSO user will care about SYN Cookie. Let's return an error in calipso_req_setattr() and calipso_req_delattr() in the SYN Cookie case. This can be reproduced by [1] on Fedora and now connect() of nc times out. [0]: TCP: request_sock_TCPv6: Possible SYN flooding on port [::]:20002. Sending cookies. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 3 UID: 0 PID: 12262 Comm: syz.1.2611 Not tainted 6.14.0 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_pnet include/net/net_namespace.h:406 [inline] RIP: 0010:sock_net include/net/sock.h:655 [inline] RIP: 0010:sock_kmalloc+0x35/0x170 net/core/sock.c:2806 Code: 89 d5 41 54 55 89 f5 53 48 89 fb e8 25 e3 c6 fd e8 f0 91 e3 00 48 8d 7b 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b RSP: 0018:ffff88811af89038 EFLAGS: 00010216 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff888105266400 RDX: 0000000000000006 RSI: ffff88800c890000 RDI: 0000000000000030 RBP: 0000000000000050 R08: 0000000000000000 R09: ffff88810526640e R10: ffffed1020a4cc81 R11: ffff88810526640f R12: 0000000000000000 R13: 0000000000000820 R14: ffff888105266400 R15: 0000000000000050 FS: 00007f0653a07640(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f863ba096f4 CR3: 00000000163c0005 CR4: 0000000000770ef0 PKRU: 80000000 Call Trace: <IRQ> ipv6_renew_options+0x279/0x950 net/ipv6/exthdrs.c:1288 calipso_req_setattr+0x181/0x340 net/ipv6/calipso.c:1204 calipso_req_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:597 netlbl_req_setattr+0x18a/0x440 net/netlabel/netlabel_kapi.c:1249 selinux_netlbl_inet_conn_request+0x1fb/0x320 security/selinux/netlabel.c:342 selinux_inet_conn_request+0x1eb/0x2c0 security/selinux/hooks.c:5551 security_inet_conn_request+0x50/0xa0 security/security.c:4945 tcp_v6_route_req+0x22c/0x550 net/ipv6/tcp_ipv6.c:825 tcp_conn_request+0xec8/0x2b70 net/ipv4/tcp_input.c:7275 tcp_v6_conn_request+0x1e3/0x440 net/ipv6/tcp_ipv6.c:1328 tcp_rcv_state_process+0xafa/0x52b0 net/ipv4/tcp_input.c:6781 tcp_v6_do_rcv+0x8a6/0x1a40 net/ipv6/tcp_ipv6.c:1667 tcp_v6_rcv+0x505e/0x5b50 net/ipv6/tcp_ipv6.c:1904 ip6_protocol_deliver_rcu+0x17c/0x1da0 net/ipv6/ip6_input.c:436 ip6_input_finish+0x103/0x180 net/ipv6/ip6_input.c:480 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0x13c/0x6b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:469 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] ip6_rcv_finish+0xb6/0x490 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0xf9/0x490 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core+0x12e/0x1f0 net/core/dev.c:5896 __netif_receive_skb+0x1d/0x170 net/core/dev.c:6009 process_backlog+0x41e/0x13b0 net/core/dev.c:6357 __napi_poll+0xbd/0x710 net/core/dev.c:7191 napi_poll net/core/dev.c:7260 [inline] net_rx_action+0x9de/0xde0 net/core/dev.c:7382 handle_softirqs+0x19a/0x770 kernel/softirq.c:561 do_softirq.part.0+0x36/0x70 kernel/softirq.c:462 </IRQ> <TASK> do_softirq arch/x86/include/asm/preempt.h:26 [inline] __local_bh_enable_ip+0xf1/0x110 kernel/softirq.c:389 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0xc2a/0x3c40 net/core/dev.c:4679 dev_queue_xmit include/linux/netdevice.h:3313 [inline] neigh_hh_output include/net/neighbour.h:523 [inline] neigh_output include/net/neighbour.h:537 [inline] ip6_finish_output2+0xd69/0x1f80 net/ipv6/ip6_output.c:141 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline] ip6_finish_output+0x5dc/0xd60 net/ipv6/ip6_output.c:226 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x24b/0x8d0 net/ipv6/ip6_output.c:247 dst_output include/net/dst.h:459 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_xmit+0xbbc/0x20d0 net/ipv6/ip6_output.c:366 inet6_csk_xmit+0x39a/0x720 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x1a7b/0x3b40 net/ipv4/tcp_output.c:1471 tcp_transmit_skb net/ipv4/tcp_output.c:1489 [inline] tcp_send_syn_data net/ipv4/tcp_output.c:4059 [inline] tcp_connect+0x1c0c/0x4510 net/ipv4/tcp_output.c:4148 tcp_v6_connect+0x156c/0x2080 net/ipv6/tcp_ipv6.c:333 __inet_stream_connect+0x3a7/0xed0 net/ipv4/af_inet.c:677 tcp_sendmsg_fastopen+0x3e2/0x710 net/ipv4/tcp.c:1039 tcp_sendmsg_locked+0x1e82/0x3570 net/ipv4/tcp.c:1091 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1358 inet6_sendmsg+0xb9/0x150 net/ipv6/af_inet6.c:659 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0xf4/0x2a0 net/socket.c:733 __sys_sendto+0x29a/0x390 net/socket.c:2187 __do_sys_sendto net/socket.c:2194 [inline] __se_sys_sendto net/socket.c:2190 [inline] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2190 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc3/0x1d0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f06553c47ed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f0653a06fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f0655605fa0 RCX: 00007f06553c47ed RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b RBP: 00007f065545db38 R08: 0000200000000140 R09: 000000000000001c R10: f7384d4ea84b01bd R11: 0000000000000246 R12: 0000000000000000 R13: 00007f0655605fac R14: 00007f0655606038 R15: 00007f06539e7000 </TASK> Modules linked in: [1]: dnf install -y selinux-policy-targeted policycoreutils netlabel_tools procps-ng nmap-ncat mount -t selinuxfs none /sys/fs/selinux load_policy netlabelctl calipso add pass doi:1 netlabelctl map del default netlabelctl map add default address:::1 protocol:calipso,1 sysctl net.ipv4.tcp_syncookies=2 nc -l ::1 80 & nc ::1 80 Fixes: e1adea9 ("calipso: Allow request sockets to be relabelled by the lsm.") Reported-by: syzkaller <[email protected]> Reported-by: John Cheung <[email protected]> Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Paul Moore <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit bde8833eb075ba8e8674de88e32de6b669966451)
opsiff
pushed a commit
that referenced
this pull request
Jul 3, 2025
[ Upstream commit 6aba0cb5bba6141158d5449f2cf53187b7f755f9 ] As-per the SBI specification, an SBI remote fence operation applies to the entire address space if either: 1) start_addr and size are both 0 2) size is equal to 2^XLEN-1 >From the above, only #1 is checked by SBI SFENCE calls so fix the size parameter check in SBI SFENCE calls to cover #2 as well. Fixes: 13acfec ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests") Reviewed-by: Atish Patra <[email protected]> Signed-off-by: Anup Patel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Anup Patel <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit e20f0f44ec5942b88c26fd284fc15b78d3d57706)
opsiff
pushed a commit
that referenced
this pull request
Jul 8, 2025
[ Upstream commit 387602d8a75574fafb451b7a8215e78dfd67ee63 ] Don't set WDM_READ flag in wdm_in_callback() for ZLP-s, otherwise when userspace tries to poll for available data, it might - incorrectly - believe there is something available, and when it tries to non-blocking read it, it might get stuck in the read loop. For example this is what glib does for non-blocking read (briefly): 1. poll() 2. if poll returns with non-zero, starts a read data loop: a. loop on poll() (EINTR disabled) b. if revents was set, reads data I. if read returns with EINTR or EAGAIN, goto 2.a. II. otherwise return with data So if ZLP sets WDM_READ (#1), we expect data, and try to read it (#2). But as that was a ZLP, and we are doing non-blocking read, wdm_read() returns with EAGAIN (#2.b.I), so loop again, and try to read again (#2.a.). With glib, we might stuck in this loop forever, as EINTR is disabled (#2.a). Signed-off-by: Robert Hodaszi <[email protected]> Acked-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 00626325dba74c24bf81dd95bd447fc09902a2f1)
opsiff
pushed a commit
that referenced
this pull request
Jul 8, 2025
[ Upstream commit 711741f94ac3cf9f4e3aa73aa171e76d188c0819 ] Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: [email protected] Reported-by: David Howells <[email protected]> Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <[email protected]> Tested-by: David Howells <[email protected]> Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit c82c7041258d96e3286f6790ab700e4edd3cc9e3)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 387602d8a75574fafb451b7a8215e78dfd67ee63 ] Don't set WDM_READ flag in wdm_in_callback() for ZLP-s, otherwise when userspace tries to poll for available data, it might - incorrectly - believe there is something available, and when it tries to non-blocking read it, it might get stuck in the read loop. For example this is what glib does for non-blocking read (briefly): 1. poll() 2. if poll returns with non-zero, starts a read data loop: a. loop on poll() (EINTR disabled) b. if revents was set, reads data I. if read returns with EINTR or EAGAIN, goto 2.a. II. otherwise return with data So if ZLP sets WDM_READ (#1), we expect data, and try to read it (#2). But as that was a ZLP, and we are doing non-blocking read, wdm_read() returns with EAGAIN (#2.b.I), so loop again, and try to read again (#2.a.). With glib, we might stuck in this loop forever, as EINTR is disabled (#2.a). Signed-off-by: Robert Hodaszi <[email protected]> Acked-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 00626325dba74c24bf81dd95bd447fc09902a2f1)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 711741f94ac3cf9f4e3aa73aa171e76d188c0819 ] Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: [email protected] Reported-by: David Howells <[email protected]> Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <[email protected]> Tested-by: David Howells <[email protected]> Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit c82c7041258d96e3286f6790ab700e4edd3cc9e3)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 387602d8a75574fafb451b7a8215e78dfd67ee63 ] Don't set WDM_READ flag in wdm_in_callback() for ZLP-s, otherwise when userspace tries to poll for available data, it might - incorrectly - believe there is something available, and when it tries to non-blocking read it, it might get stuck in the read loop. For example this is what glib does for non-blocking read (briefly): 1. poll() 2. if poll returns with non-zero, starts a read data loop: a. loop on poll() (EINTR disabled) b. if revents was set, reads data I. if read returns with EINTR or EAGAIN, goto 2.a. II. otherwise return with data So if ZLP sets WDM_READ (#1), we expect data, and try to read it (#2). But as that was a ZLP, and we are doing non-blocking read, wdm_read() returns with EAGAIN (#2.b.I), so loop again, and try to read again (#2.a.). With glib, we might stuck in this loop forever, as EINTR is disabled (#2.a). Signed-off-by: Robert Hodaszi <[email protected]> Acked-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 43ea23645b16b41095939344f28f3c6077014c8a)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 711741f94ac3cf9f4e3aa73aa171e76d188c0819 ] Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: [email protected] Reported-by: David Howells <[email protected]> Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <[email protected]> Tested-by: David Howells <[email protected]> Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 7f3ead8ebc0ef65b6c89a13912b4e80218425629)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 387602d8a75574fafb451b7a8215e78dfd67ee63 ] Don't set WDM_READ flag in wdm_in_callback() for ZLP-s, otherwise when userspace tries to poll for available data, it might - incorrectly - believe there is something available, and when it tries to non-blocking read it, it might get stuck in the read loop. For example this is what glib does for non-blocking read (briefly): 1. poll() 2. if poll returns with non-zero, starts a read data loop: a. loop on poll() (EINTR disabled) b. if revents was set, reads data I. if read returns with EINTR or EAGAIN, goto 2.a. II. otherwise return with data So if ZLP sets WDM_READ (#1), we expect data, and try to read it (#2). But as that was a ZLP, and we are doing non-blocking read, wdm_read() returns with EAGAIN (#2.b.I), so loop again, and try to read again (#2.a.). With glib, we might stuck in this loop forever, as EINTR is disabled (#2.a). Signed-off-by: Robert Hodaszi <[email protected]> Acked-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 43ea23645b16b41095939344f28f3c6077014c8a)
opsiff
pushed a commit
that referenced
this pull request
Jul 9, 2025
[ Upstream commit 711741f94ac3cf9f4e3aa73aa171e76d188c0819 ] Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: [email protected] Reported-by: David Howells <[email protected]> Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <[email protected]> Tested-by: David Howells <[email protected]> Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 7f3ead8ebc0ef65b6c89a13912b4e80218425629)
opsiff
pushed a commit
that referenced
this pull request
Jul 11, 2025
[ Upstream commit b2beb5bb2cd90d7939e470ed4da468683f41baa3 ] With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated on module load: [ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 [ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager [ 324.701689] preempt_count: 201, expected: 0 [ 324.701693] RCU nest depth: 0, expected: 0 [ 324.701697] 2 locks held by NetworkManager/1582: [ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0 [ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870 [ 324.701749] Preemption disabled at: [ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870 [ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary) [ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 [ 324.701774] Call Trace: [ 324.701777] <TASK> [ 324.701779] dump_stack_lvl+0x5d/0x80 [ 324.701788] ? __dev_open+0x3dd/0x870 [ 324.701793] __might_resched.cold+0x1ef/0x23d <..> [ 324.701818] __mutex_lock+0x113/0x1b80 <..> [ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf] [ 324.701935] ? kasan_save_track+0x14/0x30 [ 324.701941] idpf_mb_clean+0x143/0x380 [idpf] <..> [ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf] [ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf] [ 324.702021] ? rcu_is_watching+0x12/0xc0 [ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf] <..> [ 324.702122] __hw_addr_sync_dev+0x1cf/0x300 [ 324.702126] ? find_held_lock+0x32/0x90 [ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf] [ 324.702152] __dev_open+0x3f8/0x870 [ 324.702159] ? __pfx___dev_open+0x10/0x10 [ 324.702174] __dev_change_flags+0x443/0x650 <..> [ 324.702208] netif_change_flags+0x80/0x160 [ 324.702218] do_setlink.isra.0+0x16a0/0x3960 <..> [ 324.702349] rtnl_newlink+0x12fd/0x21e0 The sequence is as follows: rtnl_newlink()-> __dev_change_flags()-> __dev_open()-> dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock" idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON __dev_uc_sync() -> idpf_add_mac_filter -> idpf_add_del_mac_filters -> idpf_send_mb_msg() -> idpf_mb_clean() -> idpf_ctlq_clean_sq() # mutex_lock(cq_lock) Fix by converting cq_lock to a spinlock. All operations under the new lock are safe except freeing the DMA memory, which may use vunmap(). Fix by requesting a contiguous physical memory for the DMA mapping. Fixes: a251eee ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Ahmed Zaki <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 9a36715cd6bc6a6f16230e19a7f947bab34b3fe5)
opsiff
pushed a commit
that referenced
this pull request
Jul 11, 2025
[ Upstream commit 2ed25aa7f7711f508b6120e336f05cd9d49943c0 ] The issue arises when kzalloc() is invoked while holding umem_mutex or any other lock acquired under umem_mutex. This is problematic because kzalloc() can trigger fs_reclaim_aqcuire(), which may, in turn, invoke mmu_notifier_invalidate_range_start(). This function can lead to mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again, resulting in a deadlock. The problematic flow: CPU0 | CPU1 ---------------------------------------|------------------------------------------------ mlx5_ib_dereg_mr() | → revoke_mr() | → mutex_lock(&umem_odp->umem_mutex) | | mlx5_mkey_cache_init() | → mutex_lock(&dev->cache.rb_lock) | → mlx5r_cache_create_ent_locked() | → kzalloc(GFP_KERNEL) | → fs_reclaim() | → mmu_notifier_invalidate_range_start() | → mlx5_ib_invalidate_range() | → mutex_lock(&umem_odp->umem_mutex) → cache_ent_find_and_store() | → mutex_lock(&dev->cache.rb_lock) | Additionally, when kzalloc() is called from within cache_ent_find_and_store(), we encounter the same deadlock due to re-acquisition of umem_mutex. Solve by releasing umem_mutex in dereg_mr() after umr_revoke_mr() and before acquiring rb_lock. This ensures that we don't hold umem_mutex while performing memory allocations that could trigger the reclaim path. This change prevents the deadlock by ensuring proper lock ordering and avoiding holding locks during memory allocation operations that could trigger the reclaim path. The following lockdep warning demonstrates the deadlock: python3/20557 is trying to acquire lock: ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at: mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] but task is already holding lock: ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x7b/0x1a0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x60/0xd0 mem_cgroup_css_alloc+0x6f/0x9b0 cgroup_init_subsys+0xa4/0x240 cgroup_init+0x1c8/0x510 start_kernel+0x747/0x760 x86_64_start_reservations+0x25/0x30 x86_64_start_kernel+0x73/0x80 common_startup_64+0x129/0x138 -> #2 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x91/0xd0 __kmalloc_cache_noprof+0x4d/0x4c0 mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib] mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib] mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib] __mlx5_ib_add+0x4b/0x190 [mlx5_ib] mlx5r_probe+0xd9/0x320 [mlx5_ib] auxiliary_bus_probe+0x42/0x70 really_probe+0xdb/0x360 __driver_probe_device+0x8f/0x130 driver_probe_device+0x1f/0xb0 __driver_attach+0xd4/0x1f0 bus_for_each_dev+0x79/0xd0 bus_add_driver+0xf0/0x200 driver_register+0x6e/0xc0 __auxiliary_driver_register+0x6a/0xc0 do_one_initcall+0x5e/0x390 do_init_module+0x88/0x240 init_module_from_file+0x85/0xc0 idempotent_init_module+0x104/0x300 __x64_sys_finit_module+0x68/0xc0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (&dev->cache.rb_lock){+.+.}-{4:4}: __mutex_lock+0x98/0xf10 __mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib] mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib] ib_dereg_mr_user+0x85/0x1f0 [ib_core] uverbs_free_mr+0x19/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x21/0x80 [ib_uverbs] uverbs_destroy_uobject+0x60/0x3d0 [ib_uverbs] uobj_destroy+0x57/0xa0 [ib_uverbs] ib_uverbs_cmd_verbs+0x4d5/0x1210 [ib_uverbs] ib_uverbs_ioctl+0x129/0x230 [ib_uverbs] __x64_sys_ioctl+0x596/0xaa0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&umem_odp->umem_mutex){+.+.}-{4:4}: __lock_acquire+0x1826/0x2f00 lock_acquire+0xd3/0x2e0 __mutex_lock+0x98/0xf10 mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x18e/0x1f0 unmap_vmas+0x182/0x1a0 exit_mmap+0xf3/0x4a0 mmput+0x3a/0x100 do_exit+0x2b9/0xa90 do_group_exit+0x32/0xa0 get_signal+0xc32/0xcb0 arch_do_signal_or_restart+0x29/0x1d0 syscall_exit_to_user_mode+0x105/0x1d0 do_syscall_64+0x79/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Chain exists of: &dev->cache.rb_lock --> mmu_notifier_invalidate_range_start --> &umem_odp->umem_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&umem_odp->umem_mutex); lock(mmu_notifier_invalidate_range_start); lock(&umem_odp->umem_mutex); lock(&dev->cache.rb_lock); *** DEADLOCK *** Fixes: abb604a ("RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error") Signed-off-by: Or Har-Toov <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/3c8f225a8a9fade647d19b014df1172544643e4a.1750061612.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit beb89ada5715e7bd1518c58863eedce89ec051a7)
opsiff
pushed a commit
that referenced
this pull request
Jul 18, 2025
…-flight commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration") Cc: [email protected] Cc: James Houghton <[email protected]> Cc: Peter Gonda <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Tested-by: Liam Merwick <[email protected]> Reviewed-by: James Houghton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 8c8e8d4d7544bb783e15078eda8ba2580e192246)
opsiff
pushed a commit
that referenced
this pull request
Jul 18, 2025
…void Priority Inversion in SRIOV commit dc0297f upstream. RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 deepin-community#14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 deepin-community#14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: e864180 ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <[email protected]> Cc: Jingwen Chen <[email protected]> Cc: Victor Skvortsov <[email protected]> Cc: Zhigang Luo <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> [ Minor context change fixed. ] Signed-off-by: Wenshan Lan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 07ed75bfa7ede8bfcfa303fd6efc85db1c8684c7)
opsiff
pushed a commit
that referenced
this pull request
Jul 18, 2025
…-flight commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration") Cc: [email protected] Cc: James Houghton <[email protected]> Cc: Peter Gonda <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Tested-by: Liam Merwick <[email protected]> Reviewed-by: James Houghton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit fd044c99d831e9f837518816c7c366b04014d405)
opsiff
pushed a commit
that referenced
this pull request
Jul 22, 2025
…void Priority Inversion in SRIOV commit dc0297f upstream. RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 deepin-community#14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 deepin-community#14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: e864180 ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <[email protected]> Cc: Jingwen Chen <[email protected]> Cc: Victor Skvortsov <[email protected]> Cc: Zhigang Luo <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> [ Minor context change fixed. ] Signed-off-by: Wenshan Lan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 07ed75bfa7ede8bfcfa303fd6efc85db1c8684c7)
opsiff
pushed a commit
that referenced
this pull request
Jul 22, 2025
…-flight commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration") Cc: [email protected] Cc: James Houghton <[email protected]> Cc: Peter Gonda <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Tested-by: Liam Merwick <[email protected]> Reviewed-by: James Houghton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit fd044c99d831e9f837518816c7c366b04014d405)
opsiff
pushed a commit
that referenced
this pull request
Jul 22, 2025
hygon inclusion category: bugfix CVE: NA --------------------------- When psp_do_cmd try lock mutex timeout, the following error may occur: [ 544.294573] watchdog: BUG: soft lockup - CPU#30 stuck for 26s! [hwrng:1063] [ 544.295235] Modules linked in: hct nf_conntrack_netlink nfnetlink xt_addrtype xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 dm_mod ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter ip_tables rfkill overlay sunrpc vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif joydev kvm sg pcspkr acpi_ipmi ipmi_si rapl k10temp i2c_piix4 i2c_designware_platform acpi_cpufreq ipmi_devintf tpm_hygon ipmi_msghandler i2c_designware_core ext4 mbcache jbd2 sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 crct10dif_pclmul ahci crc32_pclmul libahci crc32c_intel ast polyval_clmulni drm_shmem_helper drm_kms_helper polyval_generic ixgbe ghash_clmulni_intel igb mdio ccp sha512_ssse3 i2c_algo_bit dca drm libata mdev sp5100_tco sm4 vhost_net tun tap vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio fuse [ 544.295390] br_netfilter bridge stp llc [ 544.300301] CPU: 30 PID: 1063 Comm: hwrng Kdump: loaded Not tainted 6.6.7-hygon #2 [ 544.300828] Hardware name: HYGON Hygon35N16/35N16, BIOS A0273036 05/31/2024 [ 544.301367] RIP: 0010:psp_do_cmd+0x57/0x100 [ccp] [ 544.301936] Code: 48 8b 05 fc 8c a6 f2 48 8d 88 60 ea 00 00 48 8b 05 d6 d7 00 00 48 8b 50 08 48 c7 44 24 08 01 00 00 00 48 8b 44 24 08 48 87 02 <48> 89 44 24 08 48 8b 44 24 08 48 85 c0 74 4b 48 8b 05 c3 8c a6 f2 [ 544.303086] RSP: 0018:ffffc9000920fd30 EFLAGS: 00000283 [ 544.303674] RAX: 0000000000000001 RBX: 0000000000000100 RCX: 0000000100043a60 [ 544.304279] RDX: ffff888180a04000 RSI: ffff888144422028 RDI: 0000000000000100 [ 544.304891] RBP: ffff888144422028 R08: ffffc9000920fdd0 R09: ffff8890a1dfd006 [ 544.305508] R10: 40007b0100000c00 R11: 00000c0000000180 R12: ffffc9000920fd64 [ 544.306133] R13: ffff8890a1dfd000 R14: 000000007b010000 R15: ffff888146c75000 [ 544.306756] FS: 0000000000000000(0000) GS:ffff88985e380000(0000) knlGS:0000000000000000 [ 544.307397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 544.308042] CR2: 00007f9392a6f830 CR3: 000000011441c000 CR4: 00000000003506e0 [ 544.308696] Call Trace: [ 544.309361] <IRQ> [ 544.310020] ? watchdog_timer_fn+0x1b4/0x220 [ 544.310693] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 544.311365] ? __hrtimer_run_queues+0x112/0x2b0 [ 544.312046] ? hrtimer_interrupt+0xf4/0x230 [ 544.312732] ? __sysvec_apic_timer_interrupt+0x4c/0x140 [ 544.313421] ? sysvec_apic_timer_interrupt+0x69/0x90 [ 544.314115] </IRQ> [ 544.314752] <TASK> [ 544.315384] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 544.316036] ? psp_do_cmd+0x57/0x100 [ccp] [ 544.316699] tpm_c_send+0x46/0x90 [tpm_hygon] [ 544.317345] tpm_try_transmit+0x62/0x2a0 [ 544.317992] tpm_transmit+0x8e/0x290 [ 544.318623] tpm_transmit_cmd+0x25/0xa0 [ 544.319247] tpm2_get_random+0xea/0x1e0 [ 544.319872] ? __pfx_hwrng_fillfn+0x10/0x10 [ 544.320496] tpm_get_random+0x5b/0x70 [ 544.321115] hwrng_fillfn+0xde/0x1e0 [ 544.321731] kthread+0xe4/0x110 [ 544.322346] ? __pfx_kthread+0x10/0x10 [ 544.322961] ret_from_fork+0x30/0x50 [ 544.323570] ? __pfx_kthread+0x10/0x10 [ 544.324141] ret_from_fork_asm+0x1b/0x30 [ 544.324711] </TASK> Fixes: 75f7390 ("drivers/crypto/ccp: concurrent psp access support between user and kernel space") Signed-off-by: xiongmengbiao <[email protected]>
opsiff
pushed a commit
that referenced
this pull request
Jul 22, 2025
…-flight commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration") Cc: [email protected] Cc: James Houghton <[email protected]> Cc: Peter Gonda <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Tested-by: Liam Merwick <[email protected]> Reviewed-by: James Houghton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 8c8e8d4d7544bb783e15078eda8ba2580e192246)
opsiff
pushed a commit
that referenced
this pull request
Jul 22, 2025
…-flight commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration") Cc: [email protected] Cc: James Houghton <[email protected]> Cc: Peter Gonda <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Tested-by: Liam Merwick <[email protected]> Reviewed-by: James Houghton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 8c8e8d4d7544bb783e15078eda8ba2580e192246)
opsiff
pushed a commit
that referenced
this pull request
Jul 27, 2025
commit b1bf1a782fdf5c482215c0c661b5da98b8e75773 upstream. If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP is enabled for dm-bufio. However, when bufio tries to evict buffers, there is a chance to trigger scheduling in spin_lock_bh, the following warning is hit: BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2 preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 4 locks held by kworker/2:2/123: #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 deepin-community#305 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: dm_bufio_cache do_global_cleanup Call Trace: <TASK> dump_stack_lvl+0x53/0x70 __might_resched+0x360/0x4e0 do_global_cleanup+0x2f5/0x710 process_one_work+0x7db/0x1970 worker_thread+0x518/0xea0 kthread+0x359/0x690 ret_from_fork+0xf3/0x1b0 ret_from_fork_asm+0x1a/0x30 </TASK> That can be reproduced by: veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb SIZE=$(blockdev --getsz /dev/vda) dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet" mount /dev/dm-0 /mnt -o ro echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes [read files in /mnt] Cc: [email protected] # v6.4+ Fixes: 450e8de ("dm bufio: improve concurrent IO performance") Signed-off-by: Wang Shuai <[email protected]> Signed-off-by: Sheng Yong <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 469a39a33a9934af157299bf11c58f6e6cb53f85)
opsiff
pushed a commit
that referenced
this pull request
Jul 27, 2025
commit b1bf1a782fdf5c482215c0c661b5da98b8e75773 upstream. If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP is enabled for dm-bufio. However, when bufio tries to evict buffers, there is a chance to trigger scheduling in spin_lock_bh, the following warning is hit: BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2 preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 4 locks held by kworker/2:2/123: #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 deepin-community#305 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: dm_bufio_cache do_global_cleanup Call Trace: <TASK> dump_stack_lvl+0x53/0x70 __might_resched+0x360/0x4e0 do_global_cleanup+0x2f5/0x710 process_one_work+0x7db/0x1970 worker_thread+0x518/0xea0 kthread+0x359/0x690 ret_from_fork+0xf3/0x1b0 ret_from_fork_asm+0x1a/0x30 </TASK> That can be reproduced by: veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb SIZE=$(blockdev --getsz /dev/vda) dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet" mount /dev/dm-0 /mnt -o ro echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes [read files in /mnt] Cc: [email protected] # v6.4+ Fixes: 450e8de ("dm bufio: improve concurrent IO performance") Signed-off-by: Wang Shuai <[email protected]> Signed-off-by: Sheng Yong <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 68860d1ade385eef9fcdbf6552f061283091fdb8)
opsiff
pushed a commit
that referenced
this pull request
Jul 28, 2025
commit b1bf1a782fdf5c482215c0c661b5da98b8e75773 upstream. If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP is enabled for dm-bufio. However, when bufio tries to evict buffers, there is a chance to trigger scheduling in spin_lock_bh, the following warning is hit: BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2 preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 4 locks held by kworker/2:2/123: #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 deepin-community#305 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: dm_bufio_cache do_global_cleanup Call Trace: <TASK> dump_stack_lvl+0x53/0x70 __might_resched+0x360/0x4e0 do_global_cleanup+0x2f5/0x710 process_one_work+0x7db/0x1970 worker_thread+0x518/0xea0 kthread+0x359/0x690 ret_from_fork+0xf3/0x1b0 ret_from_fork_asm+0x1a/0x30 </TASK> That can be reproduced by: veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb SIZE=$(blockdev --getsz /dev/vda) dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet" mount /dev/dm-0 /mnt -o ro echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes [read files in /mnt] Cc: [email protected] # v6.4+ Fixes: 450e8de ("dm bufio: improve concurrent IO performance") Signed-off-by: Wang Shuai <[email protected]> Signed-off-by: Sheng Yong <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 469a39a33a9934af157299bf11c58f6e6cb53f85)
opsiff
pushed a commit
that referenced
this pull request
Aug 25, 2025
[ Upstream commit 17ce3e5949bc37557305ad46316f41c7875d6366 ] syzbot reported that the netfilter bpf prog can be called without migration disabled in xmit path. Then the assertion in __bpf_prog_run() fails, triggering the splat below. [0] Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf(). [0]: BUG: assuming non migratable context at ./include/linux/filter.h:703 in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session 3 locks held by sshd-session/5829: #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline] #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241 CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 __cant_migrate kernel/sched/core.c:8860 [inline] __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834 __bpf_prog_run include/linux/filter.h:703 [inline] bpf_prog_run include/linux/filter.h:725 [inline] nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline] nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623 nf_hook+0x370/0x680 include/linux/netfilter.h:272 NF_HOOK_COND include/linux/netfilter.h:305 [inline] ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:459 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline] tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021 tcp_push+0x225/0x700 net/ipv4/tcp.c:759 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] sock_write_iter+0x4aa/0x5b0 net/socket.c:1131 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x6c7/0x1150 fs/read_write.c:686 ksys_write+0x1f8/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe7d365d407 Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: Fixes: fd9c663 ("bpf: minimal support for programs hooked into netfilter framework") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Tested-by: [email protected] Acked-by: Florian Westphal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit ee2502485702e4398cd74dbfb288bfa111d25e62)
opsiff
pushed a commit
that referenced
this pull request
Aug 25, 2025
[ Upstream commit 4668619092554e1b95c9a5ac2941ca47ba6d548a ] When the root of a nested PCIe bridge configuration is unplugged, the pnv_php driver leaked the allocated IRQ resources for the child bridges' hotplug event notifications, resulting in a panic. Fix this by walking all child buses and deallocating all its IRQ resources before calling pci_hp_remove_devices(). Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so that it is only destroyed in pnv_php_free_slot(), instead of pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will now be called by workers triggered by hot unplug interrupts, so the workqueue needs to stay allocated. The abridged kernel panic that occurs without this patch is as follows: WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2 Call Trace: msi_device_data_release+0x34/0x9c (unreliable) release_nodes+0x64/0x13c devres_release_all+0xc0/0x140 device_del+0x2d4/0x46c pci_destroy_dev+0x5c/0x194 pci_hp_remove_devices+0x90/0x128 pci_hp_remove_devices+0x44/0x128 pnv_php_disable_slot+0x54/0xd4 power_write_file+0xf8/0x18c pci_slot_attr_store+0x40/0x5c sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x3bc/0x50c ksys_write+0x84/0x140 system_call_exception+0x124/0x230 system_call_vectored_common+0x15c/0x2ec Signed-off-by: Shawn Anastasio <[email protected]> Signed-off-by: Timothy Pearson <[email protected]> [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 32173edf3fe2d447e14e5e3b299387c6f9602a88)
opsiff
pushed a commit
that referenced
this pull request
Aug 29, 2025
[ Upstream commit 17ce3e5949bc37557305ad46316f41c7875d6366 ] syzbot reported that the netfilter bpf prog can be called without migration disabled in xmit path. Then the assertion in __bpf_prog_run() fails, triggering the splat below. [0] Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf(). [0]: BUG: assuming non migratable context at ./include/linux/filter.h:703 in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session 3 locks held by sshd-session/5829: #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline] #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241 CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 __cant_migrate kernel/sched/core.c:8860 [inline] __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834 __bpf_prog_run include/linux/filter.h:703 [inline] bpf_prog_run include/linux/filter.h:725 [inline] nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline] nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623 nf_hook+0x370/0x680 include/linux/netfilter.h:272 NF_HOOK_COND include/linux/netfilter.h:305 [inline] ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:459 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline] tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021 tcp_push+0x225/0x700 net/ipv4/tcp.c:759 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] sock_write_iter+0x4aa/0x5b0 net/socket.c:1131 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x6c7/0x1150 fs/read_write.c:686 ksys_write+0x1f8/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe7d365d407 Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: Fixes: fd9c663 ("bpf: minimal support for programs hooked into netfilter framework") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Tested-by: [email protected] Acked-by: Florian Westphal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit ee2502485702e4398cd74dbfb288bfa111d25e62)
opsiff
pushed a commit
that referenced
this pull request
Aug 29, 2025
[ Upstream commit 4668619092554e1b95c9a5ac2941ca47ba6d548a ] When the root of a nested PCIe bridge configuration is unplugged, the pnv_php driver leaked the allocated IRQ resources for the child bridges' hotplug event notifications, resulting in a panic. Fix this by walking all child buses and deallocating all its IRQ resources before calling pci_hp_remove_devices(). Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so that it is only destroyed in pnv_php_free_slot(), instead of pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will now be called by workers triggered by hot unplug interrupts, so the workqueue needs to stay allocated. The abridged kernel panic that occurs without this patch is as follows: WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2 Call Trace: msi_device_data_release+0x34/0x9c (unreliable) release_nodes+0x64/0x13c devres_release_all+0xc0/0x140 device_del+0x2d4/0x46c pci_destroy_dev+0x5c/0x194 pci_hp_remove_devices+0x90/0x128 pci_hp_remove_devices+0x44/0x128 pnv_php_disable_slot+0x54/0xd4 power_write_file+0xf8/0x18c pci_slot_attr_store+0x40/0x5c sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x3bc/0x50c ksys_write+0x84/0x140 system_call_exception+0x124/0x230 system_call_vectored_common+0x15c/0x2ec Signed-off-by: Shawn Anastasio <[email protected]> Signed-off-by: Timothy Pearson <[email protected]> [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 32173edf3fe2d447e14e5e3b299387c6f9602a88)
opsiff
pushed a commit
that referenced
this pull request
Sep 1, 2025
[ Upstream commit 17ce3e5949bc37557305ad46316f41c7875d6366 ] syzbot reported that the netfilter bpf prog can be called without migration disabled in xmit path. Then the assertion in __bpf_prog_run() fails, triggering the splat below. [0] Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf(). [0]: BUG: assuming non migratable context at ./include/linux/filter.h:703 in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session 3 locks held by sshd-session/5829: #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline] #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241 CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 __cant_migrate kernel/sched/core.c:8860 [inline] __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834 __bpf_prog_run include/linux/filter.h:703 [inline] bpf_prog_run include/linux/filter.h:725 [inline] nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline] nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623 nf_hook+0x370/0x680 include/linux/netfilter.h:272 NF_HOOK_COND include/linux/netfilter.h:305 [inline] ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:459 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline] tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021 tcp_push+0x225/0x700 net/ipv4/tcp.c:759 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] sock_write_iter+0x4aa/0x5b0 net/socket.c:1131 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x6c7/0x1150 fs/read_write.c:686 ksys_write+0x1f8/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe7d365d407 Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: Fixes: fd9c663 ("bpf: minimal support for programs hooked into netfilter framework") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Tested-by: [email protected] Acked-by: Florian Westphal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 62f6175d145e00fc999fd2fcbffad3f59253c66a)
opsiff
pushed a commit
that referenced
this pull request
Sep 1, 2025
[ Upstream commit 4668619092554e1b95c9a5ac2941ca47ba6d548a ] When the root of a nested PCIe bridge configuration is unplugged, the pnv_php driver leaked the allocated IRQ resources for the child bridges' hotplug event notifications, resulting in a panic. Fix this by walking all child buses and deallocating all its IRQ resources before calling pci_hp_remove_devices(). Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so that it is only destroyed in pnv_php_free_slot(), instead of pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will now be called by workers triggered by hot unplug interrupts, so the workqueue needs to stay allocated. The abridged kernel panic that occurs without this patch is as follows: WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2 Call Trace: msi_device_data_release+0x34/0x9c (unreliable) release_nodes+0x64/0x13c devres_release_all+0xc0/0x140 device_del+0x2d4/0x46c pci_destroy_dev+0x5c/0x194 pci_hp_remove_devices+0x90/0x128 pci_hp_remove_devices+0x44/0x128 pnv_php_disable_slot+0x54/0xd4 power_write_file+0xf8/0x18c pci_slot_attr_store+0x40/0x5c sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x3bc/0x50c ksys_write+0x84/0x140 system_call_exception+0x124/0x230 system_call_vectored_common+0x15c/0x2ec Signed-off-by: Shawn Anastasio <[email protected]> Signed-off-by: Timothy Pearson <[email protected]> [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com Signed-off-by: Sasha Levin <[email protected]> (cherry picked from commit 28aa3cfce12487614219e7667ec84424e1f43227)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary by Sourcery
Refactor file handle management in overlayfs, simplifying origin tracking and handling cases where the lower layer does not support exports. Improve file descriptor allocation by optimizing bitmap search and expanding the file descriptor table more efficiently. Update amdgpu driver resume sequence and remove redundant DCE resume phase. Fix a bug in ss command usage in mptcp selftests to correctly handle disconnections. Fix a race condition in GTP device list handling during network namespace exit. Simplify origin verification and index handling in overlayfs. Fix a potential memory leak in vsock transport destruction. Use raw spinlocks for Xilinx GPIO driver. Fix sequence number initialization in mlx5 IPsec driver. Fix slave mode handling in Renesas R-Car I2C driver. Fix PCI host probing to avoid resource reassignment when probing only. Fix a bug in overlayfs index directory handling. Fix PHY driver quirks for AMD XGBE. Fix CBD update in Freescale FEC driver to avoid memory leaks. Fix a bug in nfsd file cache disposal. Fix a potential use-after-free in vsock stream data check. Fix a bit shift issue in TI CPSW ALE driver. Fix a security context handling bug in cachefiles. Fix a potential overflow in pktgen IMIX entries. Fix a race condition in MPTCP receive window handling. Fix a memory leak in vmwgfx dummy query BO creation. Fix a memory leak in vmwgfx compat shader addition. Fix a potential integer overflow in tmp513 driver calculations. Fix block queue initialization to avoid potential double initialization. Fix UFS driver power management level initialization. Fix mptcp epoll readiness check to avoid spurious wakeups. Fix a potential use-after-free in vsock BPF recvmsg. Fix a potential deadlock in ocfs2 read path. Fix a potential memory leak in btmtk USB shutdown. Fix a memory leak in vmwgfx prime import. Fix a potential crash in hfs superblock initialization. Fix inotify file handler encoding error. Fix potential buffer overflow in rfcomm sysfs attributes. Fix IPv6 TCP receive path to drop packets for closed sockets. Fix a potential use-after-free in mac802154 interface removal. Fix a potential bug in Open vSwitch output path. Fix a potential buffer overflow in nfp BPF event output. Fix a potential security vulnerability in cachefiles security context handling. Fix SMB server hostname memory leak. Fix bnxt_re query QP to report correct port number. Fix bnxt_re QP query to retrieve port ID. Fix mlx5 flow steering priority for RDMA TX.