Update QEMU to V10.0.3 #116

rmalmain · 2025-08-12T11:16:24Z

No description provided.

The model number was mistakenly set to 0x0b (11) in commit ff04bc1. The correct value is 0x5b. This mistake occurred because the extended model bits in cpuid[eax=0x1].eax were overlooked, and only the base model was used. Using the wrong model number can affect guest behavior. One known issue is that vPMU (which relies on the model number) may fail to operate correctly. This patch corrects the model field by introducing a new vCPU version. Fixes: ff04bc1 ("target/i386: Introduce Zhaoxin Yongfeng CPU model") Signed-off-by: Ewan Hai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit 280712b) Signed-off-by: Michael Tokarev <[email protected]>

Commit 3f2a05b ("target/i386: Reset TSCs of parked vCPUs too on VM reset") introduced a way to reset TSCs of parked vCPUs during VM reset to prevent them getting desynchronized with the online vCPUs and therefore causing the KVM PV clock to lose PVCLOCK_TSC_STABLE_BIT. The way this was done was by registering a parked vCPU-specific QEMU reset callback via qemu_register_reset(). However, it turns out that on particularly device-rich VMs QEMU reset callbacks can take a long time to execute (which isn't surprising, considering that they involve resetting all of VM devices). In particular, their total runtime can exceed the 1-second TSC synchronization window introduced in KVM commit 5d3cb0f6a8e3 ("KVM: Improve TSC offset matching"). Since the TSCs of online vCPUs are only reset from "synchronize_post_reset" AccelOps handler (which runs after all qemu_register_reset() handlers) this essentially makes that fix ineffective on these VMs. The easiest way to guarantee that these parked vCPUs are reset at the same time as the online ones (regardless how long it takes for VM devices to reset) is to piggyback on post-reset vCPU synchronization handler for one of online vCPUs - as there is no generic post-reset AccelOps handler that isn't per-vCPU. The first online vCPU was selected for that since it is easily available under "first_cpu" define. This does not create an ordering issue since the order of vCPU TSC resets does not matter. Fixes: 3f2a05b ("target/i386: Reset TSCs of parked vCPUs too on VM reset") Signed-off-by: Maciej S. Szmigiero <[email protected]> Link: https://lore.kernel.org/r/e8b85a5915f79aa177ca49eccf0e9b534470c1cd.1743099810.git.maciej.szmigiero@oracle.com Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit f6b5f71) Signed-off-by: Michael Tokarev <[email protected]>

Clear the flags before adding in the ones computed from lflags. Cc: Wei Liu <[email protected]> Cc: [email protected] Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit 94a159f) Signed-off-by: Michael Tokarev <[email protected]>

The comment about not being able to define a field with zero bits is out of date since 94597b6 ("decodetree: Allow !function with no input bits"). This fixes the missing load of imm in the disassembler. Cc: [email protected] Fixes: 9d8caa6 ("target/avr: Add support for disassembling via option '-d in_asm'") Reviewed-by: Pierrick Bouvier <[email protected]> Signed-off-by: Richard Henderson <[email protected]> (cherry picked from commit 6b661b7) Signed-off-by: Michael Tokarev <[email protected]>

Since commit 62b4a22 the default cpu type can come from the valid_cpu_types[] array. Call the machine_class_default_cpu_type() instead of accessing MachineClass::default_cpu_type field. Cc: [email protected] Fixes: 62b4a22 ("hw/core: Add machine_class_default_cpu_type()") Signed-off-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Pierrick Bouvier <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Message-Id: <[email protected]> (cherry picked from commit d5f2418) Signed-off-by: Michael Tokarev <[email protected]>

The documentation for the CPUClass::gdb_arch_name method claims that the returned string should be freed with g_free(). This is not correct: in commit a650683 we changed this method to instead return a simple constant string, but forgot to update the documentation. Make the documentation match the new semantics. Fixes: a650683 ("hw/core/cpu: Return static value with gdb_arch_name()") Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Alex Bennée <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit 56a9f0d) Signed-off-by: Michael Tokarev <[email protected]>

Fix a wrong conversion to gen_op_addr_addi(). The framesize should be added like it was done before. This bug broke booting OpenWrt MIPS32 BE malta Linux system images generated by OpenWrt. Cc: [email protected] Fixes: d0b24b7 ("target/mips: Use gen_op_addr_addi() when possible") Signed-off-by: Hauke Mehrtens <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit d4a785b) Signed-off-by: Michael Tokarev <[email protected]>

The use of gnu_source_prefix in the detection of getcpu() was ineffective because the header file that declares getcpu() when _GNU_SOURCE is defined was not included. Pass sched.h to has_header_symbol() so that the existence of the declaration will be properly checked. Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit 563cd69) Signed-off-by: Michael Tokarev <[email protected]>

CONFIG_STATX and CONFIG_STATX_MNT_ID are not used since commit e0dc263 ("virtiofsd: Remove source"). Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit 6804b89) Signed-off-by: Michael Tokarev <[email protected]>

gnu_source_prefix defines _GNU_SOURCE for compiler object functions. The definition is universally available in the code base. docs/devel/style.rst also says that the "qemu/osdep.h" header is always included, so files included in the file is also universally available in the code base. Rename gnu_source_prefix to osdep_prefix, and add #include directives that are referred by the users of gnu_source_prefix and contained in qemu/osdep.h to safely de-duplicate #include directives. Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit 797150d) Signed-off-by: Michael Tokarev <[email protected]>

macOS SDK may have the symbol of strchrnul(), but it is actually available only on macOS 15.4 or later and that fact is codified in string.h. Include the header file using osdep_prefix to check if the function is available on the deployment target. Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit a5b30be) Signed-off-by: Michael Tokarev <[email protected]>

When we changed decode_sleb128 from target_long to int64_t, we failed to adjust the shift limit. Cc: [email protected] Fixes: c9ad8d2 ("tcg: Widen gen_insn_data to uint64_t") Reviewed-by: Pierrick Bouvier <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Richard Henderson <[email protected]> (cherry picked from commit 9401f91) Signed-off-by: Michael Tokarev <[email protected]>

NPCM8XX SoC is the successor of the NPCM7XX. It features quad-core Cortex-A35 (Armv8, 64-bit) CPUs and some additional peripherals. Correct the `valid_cpu_types` setting to match the NPCM8XX SoC. Cc: [email protected] Fixes: 7e70eb3 ("hw/arm: Add NPCM845 Evaluation board") Signed-off-by: Tim Lee <[email protected]> Message-id: [email protected] Reviewed-by: Peter Maydell <[email protected]> Reviewed-by: Tyrone Ting <[email protected]> Signed-off-by: Peter Maydell <[email protected]> (cherry picked from commit 97cdd1b) Signed-off-by: Michael Tokarev <[email protected]>

If the guest code has an ISB or SB insn inside an IT block, we generate incorrect code which trips a TCG assertion: qemu-system-arm: ../tcg/tcg-op.c:3343: void tcg_gen_goto_tb(unsigned int): Assertion `(tcg_ctx->goto_tb_issue_mask & (1 << idx)) == 0' failed. This is because we call gen_goto_tb(dc, 1, ...) twice: brcond_i32 ZF,$0x0,ne,$L1 add_i32 pc,pc,$0x4 goto_tb $0x1 exit_tb $0x73d948001b81 set_label $L1 add_i32 pc,pc,$0x4 goto_tb $0x1 exit_tb $0x73d948001b81 Both calls are in arm_tr_tb_stop(), one for the DISAS_NEXT/DISAS_TOO_MANY handling, and one for the dc->condjump condition-failed codepath. The DISAS_NEXT handling doesn't have this problem because arm_post_translate_insn() does the handling of "emit the label for the condition-failed conditional execution" and so arm_tr_tb_stop() doesn't have dc->condjump set. But for DISAS_TOO_MANY we don't do that. Fix the bug by making arm_post_translate_insn() handle the DISAS_TOO_MANY case. This only affects the SB and ISB insns when used in Thumb mode inside an IT block: only these insns specifically set is_jmp to TOO_MANY, and their A32 encodings are unconditional. For the major TOO_MANY case (breaking the TB because it would cross a page boundary) we do that check and set is_jmp to TOO_MANY only after the call to arm_post_translate_insn(); so arm_post_translate_insn() sees is_jmp == DISAS_NEXT, and we emit the correct code for that situation. With this fix we generate the somewhat more sensible set of TCG ops: brcond_i32 ZF,$0x0,ne,$L1 set_label $L1 add_i32 pc,pc,$0x4 goto_tb $0x1 exit_tb $0x7c5434001b81 (NB: the TCG optimizer doesn't optimize out the jump-to-next, but we can't really avoid emitting it because we don't know at the point we're emitting the handling for the condexec check whether this insn is going to happen to be a nop for us or not.) Cc: [email protected] Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2942 Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Message-id: [email protected] (cherry picked from commit 8ed7c0b) Signed-off-by: Michael Tokarev <[email protected]>

Sphinx requires that labels within documents are unique across the whole manual. This is because the "create a hyperlink" directive specifies only the name of the label, not a filename+label. Some Sphinx versions will warn about duplicate labels, but even if there is no warning there is still an ambiguity and no guarantee that the hyperlink will be created to the right target. For QEMU this is awkward, because we have various .rst.inc fragments which we include into multiple .rst files. If you define a label in the .rst.inc file then it will be a duplicate label. We have mostly worked around this by not putting labels into those .rst.inc files, or by adding "insert a label" functionality into the hxtool extension (see commit 1eeb432 "doc/sphinx/hxtool.py: add optional label argument to SRST directive"). Unfortunately in commit 7f63144 ("docs/devel: add a codebase section") we accidentally added a duplicate label, because not all Sphinx versions warn about the mistake. In this case the link was only from the developer docs codebase summary, so as the simplest fix for the stable branch, we drop the link entirely. Cc: [email protected] Fixes: 1eeb432 "doc/sphinx/hxtool.py: add optional label argument to SRST directive" Reported-by: Dario Faggioli <[email protected]> Signed-off-by: Peter Maydell <[email protected]> Acked-by: Eric Blake <[email protected]> Reviewed-by: Pierrick Bouvier <[email protected]> Message-id: [email protected] (cherry picked from commit 82707dd) Signed-off-by: Michael Tokarev <[email protected]>

According to the i.MX 8M Plus reference manual, a GPIO pin is configured as an output when the corresponding bit in the GDIR register is set. The function imx_gpio_set_int_line() is intended to be a no-op if the pin is configured as an output, returning early in such cases. However, it inverts the condition. Fix this by returning early when the bit is set. cc: [email protected] Fixes: f442728 ("i.MX: Add GPIO device") Signed-off-by: Bernhard Beschow <[email protected]> Message-id: [email protected] Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Peter Maydell <[email protected]> (cherry picked from commit eba837a) Signed-off-by: Michael Tokarev <[email protected]>

Because LSS need not trigger an IRQ shadow, gen_movl_seg can't just use the destination register to decide whether to inhibit IRQs. Add an argument. Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit e54ef98) (back-ported to 10.0) Signed-off-by: Michael Tokarev <[email protected]>

STI will trigger a singlestep exception even if it has inhibit-IRQ behavior. Do not suppress single-step for all IRQ-inhibiting instructions, instead special case MOV SS and POP SS. Cc: [email protected] Fixes: f0f0136 ("target/i386: no single-step exception after MOV or POP SS", 2024-05-25) Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit 1e94ddc) Signed-off-by: Michael Tokarev <[email protected]>

If we have request without lock and hit unlocked or invalid entry during the search, we remap it immediately, even if we have matching entry in next entries in bucket. This leads to duplication of mappings of the same size, and to possibility of selecting the wrong element during invalidation and underflow it's entry->lock counter Signed-off-by: Aleksandr Partanen <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Edgar E. Iglesias <[email protected]> Signed-off-by: Edgar E. Iglesias <[email protected]> (cherry picked from commit a4b20f7) Signed-off-by: Michael Tokarev <[email protected]>

Today, we don't track write-abiliy in the cache, if a user requests a readable mapping followed by a writeable mapping on the same page, the second lookup will incorrectly hit the readable entry. Split mapcache_grants by ro and rw access. Grants will now have separate ways in the cache depending on writeability. Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: Edgar E. Iglesias <[email protected]> (cherry picked from commit 88fb705) Signed-off-by: Michael Tokarev <[email protected]>

…curs According to the i.MX 8M Plus reference manual, the status flag I2C_I2SR[IIF] continues to be set when an interrupt condition occurs even when I2C interrupts are disabled (I2C_I2CR[IIEN] is clear). However, the device model only sets the flag when I2C interrupts are enabled which causes U-Boot to loop forever. Fix the device model by always setting the flag and let I2C_I2CR[IIEN] guard I2C interrupts only. Also remove the comment in the code since it merely stated the obvious and would be outdated now. Cc: [email protected] Fixes: 20d0f9c ("i.MX: Add I2C controller emulator") Signed-off-by: Bernhard Beschow <[email protected]> Acked-by: Corey Minyard <[email protected]> Message-ID: <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit 54e54e5) Signed-off-by: Michael Tokarev <[email protected]>

Even though this function is serialized to be always called from main thread, v9fs_reclaim_fd() is dispatching the coroutine to a worker thread in between via its v9fs_co_*() calls, hence leading to the situation where v9fs_reclaim_fd() is effectively executed multiple times simultaniously, which renders its LRU algorithm useless and causes high latency. Fix this by adding a simple boolean variable to ensure this function is only called once at a time. No synchronization needed for this boolean variable as this function is only entered and returned on main thread. Fixes: 7a46274 ('hw/9pfs: Add file descriptor reclaim support') Signed-off-by: Christian Schoenebeck <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Message-Id: <5c622067efd66dd4ee5eca740dcf263f41db20b2.1741339452.git.qemu_oss@crudebyte.com> (cherry picked from commit 61da38d) Signed-off-by: Michael Tokarev <[email protected]>

This patch fixes two different bugs in v9fs_reclaim_fd(): 1. Reduce latency: This function calls v9fs_co_close() and v9fs_co_closedir() in a loop. Each one of the calls adds two thread hops (between main thread and a fs driver background thread). Each thread hop adds latency, which sums up in function's loop to a significant duration. Reduce overall latency by open coding what v9fs_co_close() and v9fs_co_closedir() do, executing those and the loop itself altogether in only one background thread block, hence reducing the total amount of thread hops to only two. 2. Fix file descriptor leak: The existing code called v9fs_co_close() and v9fs_co_closedir() to close file descriptors. Both functions check right at the beginning if the 9p request was cancelled: if (v9fs_request_cancelled(pdu)) { return -EINTR; } So if client sent a 'Tflush' message, v9fs_co_close() / v9fs_co_closedir() returned without having closed the file descriptor and v9fs_reclaim_fd() subsequently freed the FID without its file descriptor being closed, hence leaking those file descriptors. This 2nd bug is fixed by this patch as well by open coding v9fs_co_close() and v9fs_co_closedir() inside of v9fs_reclaim_fd() and not performing the v9fs_request_cancelled(pdu) check there. Fixes: 7a46274 ('hw/9pfs: Add file descriptor reclaim support') Fixes: bccacf6 ('hw/9pfs: Implement TFLUSH operation') Signed-off-by: Christian Schoenebeck <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Message-Id: <5747469d3f039c53147e850b456943a1d4b5485c.1741339452.git.qemu_oss@crudebyte.com> (cherry picked from commit 89f7b4d) Signed-off-by: Michael Tokarev <[email protected]>

ASAN spotted a leaking string in machine_set_loadparm(): Direct leak of 9 byte(s) in 1 object(s) allocated from: #0 0x560ffb5bb379 in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3 #1 0x7f1aca926518 in g_malloc ../glib/gmem.c:106 #2 0x7f1aca94113e in g_strdup ../glib/gstrfuncs.c:364 #3 0x560ffc8afbf9 in qobject_input_type_str ../qapi/qobject-input-visitor.c:542:12 #4 0x560ffc8a80ff in visit_type_str ../qapi/qapi-visit-core.c:349:10 #5 0x560ffbe6053a in machine_set_loadparm ../hw/s390x/s390-virtio-ccw.c:802:10 #6 0x560ffc0c5e52 in object_property_set ../qom/object.c:1450:5 #7 0x560ffc0d4175 in object_property_set_qobject ../qom/qom-qobject.c:28:10 #8 0x560ffc0c6004 in object_property_set_str ../qom/object.c:1458:15 #9 0x560ffbe2ae60 in update_machine_ipl_properties ../hw/s390x/ipl.c:569:9 #10 0x560ffbe2aa65 in s390_ipl_update_diag308 ../hw/s390x/ipl.c:594:5 #11 0x560ffbdee132 in handle_diag_308 ../target/s390x/diag.c:147:9 #12 0x560ffbebb956 in helper_diag ../target/s390x/tcg/misc_helper.c:137:9 #13 0x7f1a3c51c730 (/memfd:tcg-jit (deleted)+0x39730) Cc: [email protected] Signed-off-by: Fabiano Rosas <[email protected]> Message-ID: <[email protected]> Fixes: 1fd396e ("s390x: Register TYPE_S390_CCW_MACHINE properties as class properties") Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Thomas Huth <[email protected]> (cherry picked from commit bdf12f2) Signed-off-by: Michael Tokarev <[email protected]>

virtio-net expects set_features() will be called when the feature set used by the guest changes to update the number of virtqueues but it is not called during reset, which will clear all features, leaving the queues added for VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS. Not only these extra queues are visible to the guest, they will cause segmentation fault during migration. Call set_features() during reset to remove those queues for virtio-net as we call set_status(). It will also prevent similar bugs for virtio-net and other devices in the future. Fixes: f9d6dbf ("virtio-net: remove virtio queues if the guest doesn't support multiqueue") Buglink: https://issues.redhat.com/browse/RHEL-73842 Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 0caed25) Signed-off-by: Michael Tokarev <[email protected]>

Commit cd59f50 caused a regression on nvme hotplugging for devices with an implicit nvm subsystem. The nvme-subsys device was incorrectly left with being marked as non-hotpluggable. Fix this. Cc: [email protected] Reported-by: Stéphane Graber <[email protected]> Tested-by: Stéphane Graber <[email protected]> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2950 Fixes: cd59f50 ("hw/nvme: always initialize a subsystem") Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Klaus Jensen <[email protected]> (cherry picked from commit 0b1c23a) Signed-off-by: Michael Tokarev <[email protected]>

When Smepmp is supported, mseccfg.RLB allows bypassing locks when writing CSRs but should not affect interpretation of actual PMP rules. This is not the case with the current implementation where pmp_hart_has_privs calls pmp_is_locked which implements mseccfg.RLB bypass. This commit implements the correct behavior by removing mseccfg.RLB bypass from pmp_is_locked. RLB bypass when writing CSRs is implemented by adding a new pmp_is_readonly function that calls pmp_is_locked and check mseccfg.RLB. pmp_write_cfg and pmpaddr_csr_write are changed to use this new function. Signed-off-by: Loïc Lefort <[email protected]> Reviewed-by: Alistair Francis <[email protected]> Reviewed-by: Daniel Henrique Barboza <[email protected]> Reviewed-by: LIU Zhiwei <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alistair Francis <[email protected]> Cc: [email protected] (cherry picked from commit 4541d20) Signed-off-by: Michael Tokarev <[email protected]>

Signed-off-by: Loïc Lefort <[email protected]> Reviewed-by: Daniel Henrique Barboza <[email protected]> Reviewed-by: Alistair Francis <[email protected]> Reviewed-by: LIU Zhiwei <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alistair Francis <[email protected]> Cc: [email protected] (cherry picked from commit 915b203) Signed-off-by: Michael Tokarev <[email protected]>

With Machine Mode Lockdown (mseccfg.MML) set and RLB not set, checks on pmpcfg writes would match the wrong cases of Smepmp truth table. The existing code allows writes for the following cases: - L=1, X=0: cases 8, 10, 12, 14 - L=0, RWX!=WX: cases 0-2, 4-6 This leaves cases 3, 7, 9, 11, 13, 15 for which writes are ignored. From the Smepmp specification: "Adding a rule with executable privileges that either is M-mode-only or a locked Shared-Region is not possible (...)" This description matches cases 9-11, 13 of the truth table. This commit implements an explicit check for these cases by using pmp_get_epmp_operation to convert between PMP configuration and Smepmp truth table cases. Signed-off-by: Loïc Lefort <[email protected]> Reviewed-by: Daniel Henrique Barboza <[email protected]> Reviewed-by: LIU Zhiwei <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alistair Francis <[email protected]> Cc: [email protected] (cherry picked from commit 19cf1a7) Signed-off-by: Michael Tokarev <[email protected]>

qtest_set_command_cb passed to g_once should match GThreadFunc, which it does not. But using g_once is actually unnecessary, because the function is called by riscv_harts_realize() under the Big QEMU Lock. Reported-by: Kohei Tokunaga <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Alistair Francis <[email protected]> Reviewed-by: Kohei Tokunaga <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alistair Francis <[email protected]> Cc: [email protected] (cherry picked from commit 56cde18) Signed-off-by: Michael Tokarev <[email protected]>

s->pool has n_descs elements so maximum i should be n_descs - 1. Fix the upper bound. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: cb039ef ("net: add initial support for AF_XDP network backend") Cc: [email protected] Reviewed-by: Ilya Maximets <[email protected]> Signed-off-by: Anastasia Belova <[email protected]> Signed-off-by: Jason Wang <[email protected]> (cherry picked from commit 110d0fa) Signed-off-by: Michael Tokarev <[email protected]>

virtio_net_pre_load_queues() inspects vdev->guest_features to tell if VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features is set at the point and VIRTIO_NET_F_RSS uses bit 60 while VIRTIO_NET_F_MQ uses bit 22. Instead of inferring the required number of queues from vdev->guest_features, use the number loaded from the vm state. This change also has a nice side effect to remove a duplicate peer queue pair change by circumventing virtio_net_set_multiqueue(). Also update the comment in include/hw/virtio/virtio.h to prevent an implementation of pre_load_queues() from refering to any fields being loaded during migration by accident in the future. Fixes: 8c49756 ("virtio-net: Add only one queue pair when realizing") Tested-by: Lei Yang <[email protected]> Cc: [email protected] Signed-off-by: Akihiko Odaki <[email protected]> Signed-off-by: Jason Wang <[email protected]> (cherry picked from commit adda0ad) Signed-off-by: Michael Tokarev <[email protected]>

The definitions encoding the maximum Virtual, Physical, and Guest Virtual Address sizes supported by the IOMMU are using incorrect offsets i.e. the VASize and GVASize offsets are switched. The value in the GVAsize field is also modified, since it was incorrectly encoded. Cc: [email protected] Fixes: d29a09c ("hw/i386: Introduce AMD IOMMU") Co-developed-by: Ethan MILON <[email protected]> Signed-off-by: Ethan MILON <[email protected]> Signed-off-by: Alejandro Jimenez <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 091c7d7) Signed-off-by: Michael Tokarev <[email protected]>

The DeviceID bits are extracted using an incorrect offset in the call to amdvi_iotlb_remove_page(). This field is read (correctly) earlier, so use the value already retrieved for devid. Cc: [email protected] Fixes: d29a09c ("hw/i386: Introduce AMD IOMMU") Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit c63b8d1) Signed-off-by: Michael Tokarev <[email protected]>

The DTE validation method verifies that all bits in reserved DTE fields are unset. Update them according to the latest definition available in AMD I/O Virtualization Technology (IOMMU) Specification - Section 2.2.2.1 Device Table Entry Format. Remove the magic numbers and use a macro helper to generate bitmasks covering the specified ranges for better legibility. Note that some reserved fields specify that events are generated when they contain non-zero bits, or checks are skipped under certain configurations. This change only updates the reserved masks, checks for special conditions are not yet implemented. Cc: [email protected] Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit ff3dcb3) Signed-off-by: Michael Tokarev <[email protected]>

Address various issues with definitions of the MMIO registers e.g. for the Device Table Address Register, the size mask currently encompasses reserved bits [11:9], so change it to only extract the bits [8:0] encoding size. Convert masks to use GENMASK64 for consistency, and make unrelated definitions independent. Cc: [email protected] Fixes: d29a09c ("hw/i386: Introduce AMD IOMMU") Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 108e10f) Signed-off-by: Michael Tokarev <[email protected]>

Fix an off-by-one error in the definition of AMDVI_IR_PHYS_ADDR_MASK. The current definition masks off the most significant bit of the Interrupt Table Root ptr i.e. it only generates a mask with bits [50:6] set. See the AMD I/O Virtualization Technology (IOMMU) Specification for the Interrupt Table Root Pointer[51:6] field in the Device Table Entry format. Cc: [email protected] Fixes: b44159f ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled") Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 123cf4b) Signed-off-by: Michael Tokarev <[email protected]>

Correctly calculate the Device Table size using the format encoded in the Device Table Base Address Register (MMIO Offset 0000h). Cc: [email protected] Fixes: d29a09c ("hw/i386: Introduce AMD IOMMU") Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 67d3077) Signed-off-by: Michael Tokarev <[email protected]>

No functional change. Signed-off-by: Alejandro Jimenez <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 5959b64) Signed-off-by: Michael Tokarev <[email protected]>

The variable `oldval` was incorrectly declared as a 32-bit `uint32_t`. This could lead to truncation and incorrect behavior where the upper read-only 32 bits are significant. Fix the type of `oldval` to match the return type of `ldq_le_p()`. Cc: [email protected] Fixes: d29a09c ("hw/i386: Introduce AMD IOMMU") Signed-off-by: Ethan Milon <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 5788929) Signed-off-by: Michael Tokarev <[email protected]>

For aio=threads, we're currently not implementing REQ_FUA in any useful way, but just do a separate raw_co_flush_to_disk() call. This changes behaviour compared to the old state, which used bdrv_co_flush() with its optimisations. As a quick fix, call bdrv_co_flush() again like before. Eventually, we can use pwritev2() to make use of RWF_DSYNC if available, but we'll still have to keep this code path as a fallback, so this fix is required either way. While the fix itself is a one-liner, some new graph locking annotations are needed to convince TSA that the locking is correct. Cc: [email protected] Fixes: 984a32f ("file-posix: Support FUA writes") Buglink: https://issues.redhat.com/browse/RHEL-96854 Reported-by: Tingting Mao <[email protected]> Signed-off-by: Kevin Wolf <[email protected]> Message-ID: <[email protected]> Reviewed-by: Eric Blake <[email protected]> Signed-off-by: Kevin Wolf <[email protected]> (cherry picked from commit d402da1) Signed-off-by: Michael Tokarev <[email protected]>

This was fixed in c9d7752 for 9.2.0 but regressed in db34be3 in 10.0.0 When the bit is present, rpmbuild complains about missing ELF build-id Signed-off-by: Cole Robinson <[email protected]> Reviewed-by: Daniel P. Berrangé <[email protected]> Acked-by: Helge Deller <[email protected]> Message-ID: <52d0edfbb9b2f63a866f0065a721f3a95da6f8ba.1747590860.git.crobinso@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> (cherry picked from commit a598090) Signed-off-by: Michael Tokarev <[email protected]>

When we unplug a vhost device, we end up calling vhost_dev_cleanup() where we do a memory_listener_unregister(). This memory_listener_unregister() call will end up disconnecting the listener from the address space through listener_del_address_space(). In that process, we effectively communicate the removal of all memory regions from that listener, resulting in region_del() + commit() callbacks getting triggered. So in case of vhost, we end up calling vhost_commit() with no remaining memory slots (0). In vhost_commit() we end up overwriting the global variables used_memslots / used_shared_memslots, used for detecting the number of free memslots. With used_memslots / used_shared_memslots set to 0 by vhost_commit() during device removal, we'll later assume that the other vhost devices still have plenty of memslots left when calling vhost_get_free_memslots(). Let's fix it by simply removing the global variables and depending only on the actual per-device count. Easy to reproduce by adding two vhost-user devices to a VM and then hot-unplugging one of them. While at it, detect unexpected underflows in vhost_get_free_memslots() and issue a warning. Reported-by: yuanminghao <[email protected]> Link: https://lore.kernel.org/qemu-devel/[email protected]/ Fixes: 2ce68e4 ("vhost: add vhost_has_free_slot() interface") Cc: Igor Mammedov <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Stefano Garzarella <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Message-Id: <[email protected]> Reviewed-by: Igor Mammedov <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 9f74912) Signed-off-by: Michael Tokarev <[email protected]>

vnc_worker_thread_loop() copies z_stream stored in its local VncState to the persistent VncState, and the copied one is freed with deflateEnd() later. However, deflateEnd() refuses to operate with a copied z_stream and returns Z_STREAM_ERROR, leaking the allocated memory. Avoid copying the zlib state to fix the memory leak. Fixes: bd023f9 ("vnc: threaded VNC server") Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Marc-André Lureau <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-Id: <[email protected]> (cherry picked from commit aef2233) Signed-off-by: Michael Tokarev <[email protected]>

The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM Vol2: Bits 23-16: Maximum number of addressable IDs for logical processors in this physical package. When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is apic_id, 2) bits [23:16] get truncated. Specifically, if launching the VM with -smp 256, the value written to EBX[23:16] is 0 because of data overflow. If the guest only supports legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs, the return of the kernel invoking cpu_smt_allowed() is false and APs (application processors) will fail to bring up. Then only CPU 0 is online, and others are offline. For example, launch VM via: qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \ -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \ -drive file=guest.img,if=none,id=virtio-disk0,format=raw \ -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic The guest shows: CPU(s): 256 On-line CPU(s) list: 0 Off-line CPU(s) list: 1-255 To avoid this issue caused by overflow, limit the max value written to EBX[23:16] to 255 as the HW does. Cc: [email protected] Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Qian Wen <[email protected]> Signed-off-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit a62fef5) (Mjt: fixup for 10.0.x series due to missing v10.0.0-2217-gf985a1195b "i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]") Signed-off-by: Michael Tokarev <[email protected]>

According to SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of addressable IDs for processor cores in the physical package. If we launch over 64 cores VM, the 6-bit field will overflow, and the wrong core_id number will be reported. Since the HW reports 0x3f when the intel processor has over 64 cores, limit the max value written to EAX[31:26] to 63, so max num_cores should be 64. For EAX[14:25], though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9, modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. check and honor the maximum value for EAX[14:25] as well. In addition, for host-cache-info case, also apply the same checks and fixes. Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Qian Wen <[email protected]> Signed-off-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit 3e86124) Signed-off-by: Michael Tokarev <[email protected]>

CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of logical processors sharing this cache is the value of this field incremented by 1. Because of its width limitation, the maximum value currently supported is 4095. Though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9,modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. Check and honor the maximum value as CPUID.04H did. Cc: Babu Moger <[email protected]> Signed-off-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit 5d21ee4) Signed-off-by: Michael Tokarev <[email protected]>

KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific MSR and it makes no sense to emulate it on AMD. As a consequence, VMs created on AMD with qemu -cpu host and using KVM will advertise the ARCH_CAPABILITIES feature and provide the IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD) as the guest OS might not expect this MSR to exist on such cpus (the AMD documentation specifies that ARCH_CAPABILITIES feature and MSR are not defined on the AMD architecture). A fix was proposed in KVM code, however KVM maintainers don't want to change this behavior that exists for 6+ years and suggest changes to be done in QEMU instead. Therefore, hide the bit from "-cpu host": migration of -cpu host guests is only possible between identical host kernel and QEMU versions, therefore this is not a problematic breakage. If a future AMD machine does include the MSR, that would re-expose the Windows guest bug; but it would not be KVM/QEMU's problem at that point, as we'd be following a genuine physical CPU impl. Reported-by: Alexandre Chartre <[email protected]> Suggested-by: Daniel P. Berrangé <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> (cherry picked from commit d3a2413) Signed-off-by: Michael Tokarev <[email protected]>

The transmit loop in gmac_try_send_next_packet() is constructed in a way that means it will send incorrect data if it it sends more than one packet. The function assembles the outbound data in a dynamically allocated block of memory which is pointed to by tx_send_buffer. We track the first point in this block of memory which is not yet used with the prev_buf_size offset, initially zero. We track the size of the packet we're sending with the length variable, also initially zero. As we read chunks of data out of guest memory, we write them to tx_send_buffer[prev_buf_size], and then increment both prev_buf_size and length. (We might dynamically reallocate the buffer if needed.) When we send a packet, we checksum and send length bytes, starting at tx_send_buffer, and then we reset length to 0. This gives the right data for the first packet. But we don't reset prev_buf_size. This means that if we process more descriptors with further data for the next packet, that data will continue to accumulate at offset prev_buf_size, i.e. after the data for the first packet. But when we transmit that second packet, we send length bytes from tx_send_buffer, so we will send a packet which has the length of the second packet but the data of the first one. The fix for this is to also clear prev_buf_size after the packet has been sent -- we never need the data from packet one after we've sent it, so we can write packet two's data starting at the beginning of the buffer. Cc: [email protected] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Jason Wang <[email protected]> (cherry picked from commit 871a6e5) Signed-off-by: Michael Tokarev <[email protected]>

When a VNC client sends a "set pixel format" message, the 'client_endian' field will get initialized, however, it is valid to omit this message if the client wants to use the server's native pixel format. In the latter scenario nothing is initializing the 'client_endian' field, so it remains set to 0, matching neither G_LITTLE_ENDIAN nor G_BIG_ENDIAN. This then results in pixel format conversion routines taking the wrong code paths. This problem existed before the 'client_be' flag was changed into the 'client_endian' value, but the lack of initialization meant it semantically defaulted to little endian, so only big endian systems would potentially be exposed to incorrect pixel translation. The 'virt-viewer' / 'remote-viewer' apps always send a "set pixel format" message so aren't exposed to any problems, but the classical 'vncviewer' app will show the problem easily. Fixes: 7ed9671 Reported-by: Thomas Huth <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Marc-André Lureau <[email protected]> Signed-off-by: Daniel P. Berrangé <[email protected]> (cherry picked from commit 3ac6daa) Signed-off-by: Michael Tokarev <[email protected]>

We don't implement the Debug Communications Channel (DCC), but we do attempt to provide dummy versions of its system registers so that software that tries to access them doesn't fall over. However, we got the tx/rx register definitions wrong. These should be: AArch32: DBGDTRTX p14 0 c0 c5 0 (on writes) DBGDTRRX p14 0 c0 c5 0 (on reads) AArch64: DBGDTRTX_EL0 2 3 0 5 0 (on writes) DBGDTRRX_EL0 2 3 0 5 0 (on reads) DBGDTR_EL0 2 3 0 4 0 (reads and writes) where DBGDTRTX and DBGDTRRX are effectively different names for the same 32-bit register, which has tx behaviour on writes and rx behaviour on reads. The AArch64-only DBGDTR_EL0 is a 64-bit wide register whose top and bottom halves map to the DBGDTRRX and DBGDTRTX registers. Currently we have just one cpreg struct, which: * calls itself DBGDTR_EL0 * uses the DBGDTRTX_EL0/DBGDTRRX_EL0 encoding * is marked as ARM_CP_STATE_BOTH but has the wrong opc1 value for AArch32 * is implemented as RAZ/WI Correct the encoding so: * we name the DBGDTRTX/DBGDTRRX register correctly * we split it into AA64 and AA32 versions so we can get the AA32 encoding right * we implement DBGDTR_EL0 at its correct encoding Cc: [email protected] Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2986 Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Message-id: [email protected] (cherry picked from commit 655659a) Signed-off-by: Michael Tokarev <[email protected]>

Commit a226098 ("hvf: arm: Add support for GICv3") added GICv3 support by implementing emulation for a few system registers. ICC_RPR_EL1 was defined but not plugged in the sysreg handlers (for no good reason). Fix it. Fixes: a226098 ("hvf: arm: Add support for GICv3") Signed-off-by: Zenghui Yu <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]> (cherry picked from commit e6da704) Signed-off-by: Michael Tokarev <[email protected]>

Signed-off-by: Michael Tokarev <[email protected]>

v10.0.3 release

Ewan Hai and others added 30 commits April 24, 2025 10:25

naya451 and others added 24 commits July 15, 2025 23:06

Update version for 10.0.3 release

66d2164

Signed-off-by: Michael Tokarev <[email protected]>

Merge tag 'v10.0.3' into update_qemu_v10_0_3

9e5b6ba

v10.0.3 release

rmalmain changed the title ~~Update QEMU ti V10.0.3~~ Update QEMU to V10.0.3 Aug 12, 2025

rmalmain added 2 commits August 12, 2025 13:52

fix typing issue

0993469

update ci

124e7a4

rmalmain mentioned this pull request Aug 12, 2025

Make breakpoints work both in GDB and libafl qemu concurrently #117

Merged

rmalmain merged commit af27154 into main Aug 12, 2025
1 check passed

rmalmain deleted the update_qemu_v10_0_3 branch August 12, 2025 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update QEMU to V10.0.3 #116

Update QEMU to V10.0.3 #116

Uh oh!

rmalmain commented Aug 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

52 participants

Update QEMU to V10.0.3 #116

Update QEMU to V10.0.3 #116

Uh oh!

Conversation

rmalmain commented Aug 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

52 participants