Rebase tech/net/ath to baseline #323

miaoqing-quic · 2025-11-19T09:23:50Z

No description provided.

Now we use virtual addresses to fill CSR_MERRENTRY/CSR_TLBRENTRY, but hardware hope physical addresses. Now it works well because the high bits are ignored above PA_BITS (48 bits), but explicitly use physical addresses can avoid potential bugs. So fix it. Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>

1. Use phys_addr_t instead of u64, which can work for both 32/64 bits. 2. Check whether the input physical address is above TO_PHYS_MASK (and return NULL if yes) for the DMW version. Note: In theory early_ioremap() also need the TO_PHYS_MASK checking, but the UEFI BIOS pass some DMW virtual addresses. Cc: [email protected] Signed-off-by: Jiaxun Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

Now there 5 places which calculate max_pfn & max_low_pfn: 1. in fdt_setup() for FDT systems; 2. in memblock_init() for ACPI systems; 3. in init_numa_memory() for NUMA systems; 4. in arch_mem_init() to recalculate for "mem=" cmdline; 5. in paging_init() to recalculate for NUMA systems. Since memblock_init() is called both for ACPI and FDT systems, move the calculation out of the for_each_efi_memory_desc() loop can eliminate the first case. The last case is very questionable (may be derived from the MIPS/Loongson code) and breaks the "mem=" cmdline, so should be removed. And then the NUMA version of paging_init() can be also eliminated. After consolidation there are 3 places of calculation: 1. in memblock_init() for both ACPI and FDT systems; 2. in init_numa_memory() to recalculate for NUMA systems; 3. in arch_mem_init() to recalculate for the "mem=" cmdline. For all cases the calculation is: max_pfn = PFN_DOWN(memblock_end_of_DRAM()); max_low_pfn = min(PFN_DOWN(HIGHMEM_START), max_pfn); Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>

Now if the PTE/PMD is dirty with _PAGE_DIRTY but without _PAGE_MODIFIED, after {pte,pmd}_modify() we lose _PAGE_DIRTY, then {pte,pmd}_dirty() return false and lead to data loss. This can happen in certain scenarios such as HW PTW doesn't set _PAGE_MODIFIED automatically, so here we need _PAGE_MODIFIED to record the dirty status (_PAGE_DIRTY). The new modification involves checking whether the original PTE/PMD has the _PAGE_DIRTY flag. If it exists, the _PAGE_MODIFIED bit is also set, ensuring that the {pte,pmd}_dirty() interface can always return accurate information. Cc: [email protected] Co-developed-by: Liupu Wang <[email protected]> Signed-off-by: Liupu Wang <[email protected]> Signed-off-by: Tianyang Zhang <[email protected]>

Remove the unnecessary __GFP_HIGHMEM masking in pud_alloc_one(), which was introduced with commit 3827397 ("loongarch: convert various functions to use ptdescs"). GFP_KERNEL doesn't contain __GFP_HIGHMEM. Signed-off-by: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

(1) Use the existing CPUCFG6_PMNUM_SHIFT macro definition instead of the magic value 4 to get the PMU number. (2) Detect the value of PMU bits via CPUCFG instruction according to the ISA manual instead of hard-coded as 64, because the value may be different for various micro-architectures. Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

CSR.FWPC and CSR.MWPC are 32bit registers, so use csr_read32() rather than csr_read64() to read the values of FWPC/MWPC. Cc: [email protected] Fixes: edffa33 ("LoongArch: Add hardware breakpoints/watchpoints support") Signed-off-by: Huacai Chen <[email protected]>

The kexec_buf structure was previously declared without initialization. commit bf454ec ("kexec_file: allow to place kexec_buf randomly") added a field that is always read but not consistently populated by all architectures. This un-initialized field will contain garbage. This is also triggering a UBSAN warning when the uninitialized data is accessed: ------------[ cut here ]------------ UBSAN: invalid-load in ./include/linux/kexec.h:210:10 load of value 252 is not a valid value for type '_Bool' Zero-initializing kexec_buf at declaration ensures all fields are cleanly set, preventing future instances of uninitialized memory being used. Fixes: bf454ec ("kexec_file: allow to place kexec_buf randomly") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Youling Tang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

When specifying '-d' for kexec_file_load interface, loaded locations of kernel/initrd/cmdline etc can be printed out to help debug. Commit eb7622d ("kexec_file, riscv: print out debugging message if required") fixes the same issue on RISC-V. So, remove kexec_image_info() because the content has been printed out in generic code. And on Loongson-3A5000, the printed messages look like below: kexec_file: kernel: 00000000d9aad283 kernel_size: 0x2e77f30 kexec_file(EFI): No LoongArch PE image header. kexec_file: Loaded initrd at 0x80000000 bufsz=0x1637cd0 memsz=0x1638000 kexec_file(ELF): Loaded kernel at 0x9c20000 bufsz=0x27f1800 memsz=0x2950000 kexec_file: nr_segments = 2 kexec_file: segment[0]: buf=0x00000000cc3e6c33 bufsz=0x27f1800 mem=0x9c20000 memsz=0x2950000 kexec_file: segment[1]: buf=0x00000000bb75a541 bufsz=0x1637cd0 mem=0x80000000 memsz=0x1638000 kexec_file: kexec_file_load: type:0, start:0xb15d000 head:0x18db60002 flags:0x8 Signed-off-by: Qiang Ma <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

With secondary MMU page table, if there is a read page fault, the page's write attribute will not set even if it is writable from master MMU page table. This logic only works if dirty tracking is enabled, so page table should be set with _PAGE_WRITE if dirty tracking is disabled. It reduces extra page fault on secondary MMU page table if a VM finishes migration, when the master MMU page table is ready and the secondary MMU page is fresh. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

When timer is fired in oneshot mode, CSR.TVAL will stop with value -1 rather than 0. However when the register CSR.TVAL is restored, it will continue to count down rather than stop there. Now the method is to write 0 to CSR.TVAL, wait to count down for 1 cycle at least, which is 10ns with a timer freq 100MHz, and then retore timer interrupt status. Here add 2 cycles delay to assure that timer interrupt is injected. With this patch, timer selftest case passes to run always. Cc: [email protected] Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

On LoongArch system, guest PMU hardware is shared by guest and host but PMU interrupt is separated. PMU is pass-through to VM, and there is PMU context switch when exit to host and return to guest. There is optimiation to check whether PMU is enabled by guest. If not, it is not necessary to return to guest. However, if it is enabled, PMU context for guest need switch on. Now KVM_REQ_PMU notification is set on vCPU context switch, but it is missing if there is no vCPU context switch while PMU is used by guest VM, so fix it. Cc: <[email protected]> Fixes: f4e40ea ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

PMU hardware about VM is switched on VM exit to host rather than vCPU context sched off, PMU is checked and restored on return to VM. It is not necessary to check PMU on vCPU context sched on callback, since the request is made on the VM exit entry or VM PMU CSR access abort routine already. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

VM fails to boot with 256 vCPUs, the detailed command is qemu-system-loongarch64 -smp 256 and there is an error reported as follows: KVM_LOONGARCH_EXTIOI_INIT_NUM_CPU failed: Invalid argument There is typo issue in function kvm_eiointc_ctrl_access() when set max supported vCPUs. Cc: [email protected] Fixes: 47256c4 ("LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

Page cache folios from a file system that support large block size (LBS) can have minimal folio order greater than 0, thus a high order folio might not be able to be split down to order-0. Commit e220917 ("mm: split a folio in minimum folio order chunks") bumps the target order of split_huge_page*() to the minimum allowed order when splitting a LBS folio. This causes confusion for some split_huge_page*() callers like memory failure handling code, since they expect after-split folios all have order-0 when split succeeds but in reality get min_order_for_split() order folios and give warnings. Fix it by failing a split if the folio cannot be split to the target order. Rename try_folio_split() to try_folio_split_to_order() to reflect the added new_order parameter. Remove its unused list parameter. [The test poisons LBS folios, which cannot be split to order-0 folios, and also tries to poison all memory. The non split LBS folios take more memory than the test anticipated, leading to OOM. The patch fixed the kernel warning and the test needs some change to avoid OOM.] Link: https://lkml.kernel.org/r/[email protected] Fixes: e220917 ("mm: split a folio in minimum folio order chunks") Signed-off-by: Zi Yan <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Pankaj Raghav <[email protected]> Reviewed-by: Wei Yang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Barry Song <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dev Jain <[email protected]> Cc: Jane Chu <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mariano Pache <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Christian Brauner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Patch series "KHO: kfence + KHO memory corruption fix", v3. This series fixes a memory corruption bug in KHO that occurs when KFENCE is enabled. The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation. Google is using KHO and have had obscure crashes due to this memory corruption, with stacks all over the place. I would prefer this fix to be properly backported to stable so we can also automatically consume it once we switch to the upstream KHO. Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG) that adds checks to detect and fail any operation that attempts to place KHO metadata or preserved memory within the scratch area. This serves as a validation and diagnostic tool to confirm the problem without affecting production builds. Patch 2/3 Increases bitmap to PAGE_SIZE, so buddy allocator can be used. Patch 3/3 Provides the fix by modifying KHO to allocate its metadata directly from the buddy allocator instead of slab. This bypasses the KFENCE interception entirely. This patch (of 3): It is invalid for KHO metadata or preserved memory regions to be located within the KHO scratch area, as this area is overwritten when the next kernel is loaded, and used early in boot by the next kernel. This can lead to memory corruption. Add checks to kho_preserve_* and KHO's internal metadata allocators (xa_load_or_alloc, new_chunk) to verify that the physical address of the memory does not overlap with any defined scratch region. If an overlap is detected, the operation will fail and a WARN_ON is triggered. To avoid performance overhead in production kernels, these checks are enabled only when CONFIG_KEXEC_HANDOVER_DEBUG is selected. [[email protected]: fix KEXEC_HANDOVER_DEBUG Kconfig dependency] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: build fix] Link: https://lkml.kernel.org/r/CA+CK2bBnorfsTymKtv4rKvqGBHs=y=MjEMMRg_tE-RME6n-zUw@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: fc33e4b ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Pratyush Yadav <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Christian Brauner <[email protected]> Cc: David Matlack <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Samiullah Khawaja <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

KHO memory preservation metadata is preserved in 512 byte chunks which requires their allocation from slab allocator. Slabs are not safe to be used with KHO because of kfence, and because partial slabs may lead leaks to the next kernel. Change the size to be PAGE_SIZE. The kfence specifically may cause memory corruption, where it randomly provides slab objects that can be within the scratch area. The reason for that is that kfence allocates its objects prior to KHO scratch is marked as CMA region. While this change could potentially increase metadata overhead on systems with sparsely preserved memory, this is being mitigated by ongoing work to reduce sparseness during preservation via 1G guest pages. Furthermore, this change aligns with future work on a stateless KHO, which will also use page-sized bitmaps for its radix tree metadata. Link: https://lkml.kernel.org/r/[email protected] Fixes: fc33e4b ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: Pratyush Yadav <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Christian Brauner <[email protected]> Cc: David Matlack <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Samiullah Khawaja <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

KHO allocates metadata for its preserved memory map using the slab allocator via kzalloc(). This metadata is temporary and is used by the next kernel during early boot to find preserved memory. A problem arises when KFENCE is enabled. kzalloc() calls can be randomly intercepted by kfence_alloc(), which services the allocation from a dedicated KFENCE memory pool. This pool is allocated early in boot via memblock. When booting via KHO, the memblock allocator is restricted to a "scratch area", forcing the KFENCE pool to be allocated within it. This creates a conflict, as the scratch area is expected to be ephemeral and overwriteable by a subsequent kexec. If KHO metadata is placed in this KFENCE pool, it leads to memory corruption when the next kernel is loaded. To fix this, modify KHO to allocate its metadata directly from the buddy allocator instead of slab. Link: https://lkml.kernel.org/r/[email protected] Fixes: fc33e4b ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <[email protected]> Reviewed-by: Pratyush Yadav <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: David Matlack <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Samiullah Khawaja <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

The order check and fallback loop is updating the index value on every loop. This will cause the index to be wrongly aligned by a larger value while the loop shrinks the order. This may result in inserting and returning a folio of the wrong index and cause data corruption with some userspace workloads [1]. [[email protected]: introduce a temporary variable to improve code] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ [1] Fixes: e7a2ab7 ("mm: shmem: add mTHP support for anonymous shmem") Closes: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ Signed-off-by: Kairui Song <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Reviewed-by: Barry Song <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Cc: Dev Jain <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nico Pache <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

If no stack depot is allocated yet, due to masking out __GFP_RECLAIM flags kmsan called from kmalloc cannot allocate stack depot. kmsan fails to record origin and report issues. This may result in KMSAN failing to report issues. Reusing flags from kmalloc without modifying them should be safe for kmsan. For example, such chain of calls is possible: test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof -> slab_alloc_node -> slab_post_alloc_hook -> kmsan_slab_alloc -> kmsan_internal_poison_memory. Only when it is called in a context without flags present should __GFP_RECLAIM flags be masked. With this change all kmsan tests start working reliably. Eric reported: : Yes, KMSAN seems to be at least partially broken currently. Besides the : fact that the kmsan KUnit test is currently failing (which I reported at : https://lore.kernel.org/r/20250911175145.GA1376@sol), I've confirmed that : the poly1305 KUnit test causes a KMSAN warning with Aleksei's patch : applied but does not cause a warning without it. The warning did get : reached via syzbot somehow : (https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f623c7da0.1761026343.git.xiaopei01@kylinos.cn/), : so KMSAN must still work in some cases. But it didn't work for me. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/20251022030213.GA35717@sol Fixes: 97769a5 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation") Signed-off-by: Aleksei Nikiforov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Tested-by: Eric Biggers <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Marco Elver <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

…_item Currently, scan_get_next_rmap_item() walks every page address in a VMA to locate mergeable pages. This becomes highly inefficient when scanning large virtual memory areas that contain mostly unmapped regions, causing ksmd to use large amount of cpu without deduplicating much pages. This patch replaces the per-address lookup with a range walk using walk_page_range(). The range walker allows KSM to skip over entire unmapped holes in a VMA, avoiding unnecessary lookups. This problem was previously discussed in [1]. Consider the following test program which creates a 32 TiB mapping in the virtual address space but only populates a single page: #include <unistd.h> #include <stdio.h> #include <sys/mman.h> /* 32 TiB */ const size_t size = 32ul * 1024 * 1024 * 1024 * 1024; int main() { char *area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0); if (area == MAP_FAILED) { perror("mmap() failed\n"); return -1; } /* Populate a single page such that we get an anon_vma. */ *area = 0; /* Enable KSM. */ madvise(area, size, MADV_MERGEABLE); pause(); return 0; } $ ./ksm-sparse & $ echo 1 > /sys/kernel/mm/ksm/run Without this patch ksmd uses 100% of the cpu for a long time (more then 1 hour in my test machine) scanning all the 32 TiB virtual address space that contain only one mapped page. This makes ksmd essentially deadlocked not able to deduplicate anything of value. With this patch ksmd walks only the one mapped page and skips the rest of the 32 TiB virtual address space, making the scan fast using little cpu. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-mm/[email protected]/ [1] Fixes: 31dbd01 ("ksm: Kernel SamePage Merging") Signed-off-by: Pedro Demarchi Gomes <[email protected]> Co-developed-by: David Hildenbrand <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Reported-by: craftfever <[email protected]> Closes: https://lkml.kernel.org/r/[email protected] Suggested-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: xu xin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

…order folio split clears PG_has_hwpoisoned, but the flag should be preserved in after-split folios containing pages with PG_hwpoisoned flag if the folio is split to >0 order folios. Scan all pages in a to-be-split folio to determine which after-split folios need the flag. An alternatives is to change PG_has_hwpoisoned to PG_maybe_hwpoisoned to avoid the scan and set it on all after-split folios, but resulting false positive has undesirable negative impact. To remove false positive, caller of folio_test_has_hwpoisoned() and folio_contain_hwpoisoned_page() needs to do the scan. That might be causing a hassle for current and future callers and more costly than doing the scan in the split code. More details are discussed in [1]. This issue can be exposed via: 1. splitting a has_hwpoisoned folio to >0 order from debugfs interface; 2. truncating part of a has_hwpoisoned folio in truncate_inode_partial_folio(). And later accesses to a hwpoisoned page could be possible due to the missing has_hwpoisoned folio flag. This will lead to MCE errors. Link: https://lore.kernel.org/all/CAHbLzkoOZm0PXxE9qwtF4gKR=cpRXrSrJ9V9Pm2DJexs985q4g@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/[email protected] Fixes: c010d47 ("mm: thp: split huge page to any lower order pages") Signed-off-by: Zi Yan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Lance Yang <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Reviewed-by: Wei Yang <[email protected]> Cc: Pankaj Raghav <[email protected]> Cc: Barry Song <[email protected]> Cc: Dev Jain <[email protected]> Cc: Jane Chu <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Luis Chamberalin <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Nico Pache <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Pde is erased from subdir rbtree through rb_erase(), but not set the node to EMPTY, which may result in uaf access. We should use RB_CLEAR_NODE() set the erased node to EMPTY, then pde_subdir_next() will return NULL to avoid uaf access. We found an uaf issue while using stress-ng testing, need to run testcase getdent and tun in the same time. The steps of the issue is as follows: 1) use getdent to traverse dir /proc/pid/net/dev_snmp6/, and current pde is tun3; 2) in the [time windows] unregister netdevice tun3 and tun2, and erase them from rbtree. erase tun3 first, and then erase tun2. the pde(tun2) will be released to slab; 3) continue to getdent process, then pde_subdir_next() will return pde(tun2) which is released, it will case uaf access. CPU 0 | CPU 1 ------------------------------------------------------------------------- traverse dir /proc/pid/net/dev_snmp6/ | unregister_netdevice(tun->dev) //tun3 tun2 sys_getdents64() | iterate_dir() | proc_readdir() | proc_readdir_de() | snmp6_unregister_dev() pde_get(de); | proc_remove() read_unlock(&proc_subdir_lock); | remove_proc_subtree() | write_lock(&proc_subdir_lock); [time window] | rb_erase(&root->subdir_node, &parent->subdir); | write_unlock(&proc_subdir_lock); read_lock(&proc_subdir_lock); | next = pde_subdir_next(de); | pde_put(de); | de = next; //UAF | rbtree of dev_snmp6 | pde(tun3) / \ NULL pde(tun2) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: wangzijie <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Patch series "Fix SIGBUS semantics with large folios", v3. Accessing memory within a VMA, but beyond i_size rounded up to the next page size, is supposed to generate SIGBUS. Darrick reported[1] an xfstests regression in v6.18-rc1. generic/749 failed due to missing SIGBUS. This was caused by my recent changes that try to fault in the whole folio where possible: 19773df ("mm/fault: try to map the entire file folio in finish_fault()") 357b927 ("mm/filemap: map entire large folio faultaround") These changes did not consider i_size when setting up PTEs, leading to xfstest breakage. However, the problem has been present in the kernel for a long time - since huge tmpfs was introduced in 2016. The kernel happily maps PMD-sized folios as PMD without checking i_size. And huge=always tmpfs allocates PMD-size folios on any writes. I considered this corner case when I implemented a large tmpfs, and my conclusion was that no one in their right mind should rely on receiving a SIGBUS signal when accessing beyond i_size. I cannot imagine how it could be useful for the workload. But apparently filesystem folks care a lot about preserving strict SIGBUS semantics. Generic/749 was introduced last year with reference to POSIX, but no real workloads were mentioned. It also acknowledged the tmpfs deviation from the test case. POSIX indeed says[3]: References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. The patchset fixes the regression introduced by recent changes as well as more subtle SIGBUS breakage due to split failure on truncation. This patch (of 2): Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are supposed to generate SIGBUS. Recent changes attempted to fault in full folio where possible. They did not respect i_size, which led to populating PTEs beyond i_size and breaking SIGBUS semantics. Darrick reported generic/749 breakage because of this. However, the problem existed before the recent changes. With huge=always tmpfs, any write to a file leads to PMD-size allocation. Following the fault-in of the folio will install PMD mapping regardless of i_size. Fix filemap_map_pages() and finish_fault() to not install: - PTEs beyond i_size; - PMD mappings across i_size; Make an exception for shmem/tmpfs that for long time intentionally mapped with PMDs across i_size. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kiryl Shutsemau <[email protected]> Fixes: 6795801 ("xfs: Support large folios") Reported-by: "Darrick J. Wong" <[email protected]> Cc: Al Viro <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are supposed to generate SIGBUS. This behavior might not be respected on truncation. During truncation, the kernel splits a large folio in order to reclaim memory. As a side effect, it unmaps the folio and destroys PMD mappings of the folio. The folio will be refaulted as PTEs and SIGBUS semantics are preserved. However, if the split fails, PMD mappings are preserved and the user will not receive SIGBUS on any accesses within the PMD. Unmap the folio on split failure. It will lead to refault as PTEs and preserve SIGBUS semantics. Make an exception for shmem/tmpfs that for long time intentionally mapped with PMDs across i_size. Link: https://lkml.kernel.org/r/[email protected] Fixes: b9a8a41 ("truncate,shmem: Handle truncates that split large folios") Signed-off-by: Kiryl Shutsemau <[email protected]> Cc: Al Viro <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Christian Brauner <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

When emitting the order of the allocation for a hash table, alloc_large_system_hash() unconditionally subtracts PAGE_SHIFT from log base 2 of the allocation size. This is not correct if the allocation size is smaller than a page, and yields a negative value for the order as seen below: TCP established hash table entries: 32 (order: -4, 256 bytes, linear) TCP bind hash table entries: 32 (order: -2, 1024 bytes, linear) Use get_order() to compute the order when emitting the hash table information to correctly handle cases where the allocation size is smaller than a page: TCP established hash table entries: 32 (order: 0, 256 bytes, linear) TCP bind hash table entries: 32 (order: 0, 1024 bytes, linear) Link: https://lkml.kernel.org/r/[email protected] Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: Isaac J. Manjarres <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Using gcov on kernels compiled with GCC 15 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 15. Tested with GCC 14.3.0 and GCC 15.2.0. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Oberparleiter <[email protected]> Reported-by: Matthieu Baerts <[email protected]> Closes: linux-test-project/lcov#445 Tested-by: Matthieu Baerts <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Currently mremap folio pte batch ignores the writable bit during figuring out a set of similar ptes mapping the same folio. Suppose that the first pte of the batch is writable while the others are not - set_ptes will end up setting the writable bit on the other ptes, which is a violation of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable bit while determining the pte batch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dev Jain <[email protected]> Fixes: f822a9a ("mm: optimize mremap() by PTE batching") Reported-by: David Hildenbrand <[email protected]> Debugged-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Pedro Falcato <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Cc: Barry Song <[email protected]> Cc: Jann Horn <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> [6.17+] Signed-off-by: Andrew Morton <[email protected]>

…or slabobj_ext When alloc_slab_obj_exts() fails and then later succeeds in allocating a slab extension vector, it calls handle_failed_objexts_alloc() to mark all objects in the vector as empty. As a result all objects in this slab (slabA) will have their extensions set to CODETAG_EMPTY. Later on if this slabA is used to allocate a slabobj_ext vector for another slab (slabB), we end up with the slabB->obj_exts pointing to a slabobj_ext vector that itself has a non-NULL slabobj_ext equal to CODETAG_EMPTY. When slabB gets freed, free_slab_obj_exts() is called to free slabB->obj_exts vector. free_slab_obj_exts() calls mark_objexts_empty(slabB->obj_exts) which will generate a warning because it expects slabobj_ext vectors to have a NULL obj_ext, not CODETAG_EMPTY. Modify mark_objexts_empty() to skip the warning and setting the obj_ext value if it's already set to CODETAG_EMPTY. To quickly detect this WARN, I modified the code from WARN_ON(slab_exts[offs].ref.ct) to BUG_ON(slab_exts[offs].ref.ct == 1); We then obtained this message: [21630.898561] ------------[ cut here ]------------ [21630.898596] kernel BUG at mm/slub.c:2050! [21630.898611] Internal error: Oops - BUG: 00000000f2000800 [qualcomm-linux#1] SMP [21630.900372] Modules linked in: squashfs isofs vfio_iommu_type1 vhost_vsock vfio vhost_net vmw_vsock_virtio_transport_common vhost tap vhost_iotlb iommufd vsock binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu sr_mod cdrom drm_client_lib virtio_dma_buf drm_shmem_helper drm_kms_helper drm ghash_ce backlight virtio_net virtio_blk virtio_scsi net_failover virtio_console failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject] [21630.909177] CPU: 3 UID: 0 PID: 3787 Comm: kylin-process-m Kdump: loaded Tainted: G W 6.18.0-rc1+ qualcomm-linux#74 PREEMPT(voluntary) [21630.910495] Tainted: [W]=WARN [21630.910867] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [21630.911625] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [21630.912392] pc : __free_slab+0x228/0x250 [21630.912868] lr : __free_slab+0x18c/0x250[21630.913334] sp : ffff8000a02f73e0 [21630.913830] x29: ffff8000a02f73e0 x28: fffffdffc43fc800 x27: ffff0000c0011c40 [21630.914677] x26: ffff0000c000cac0 x25: ffff00010fe5e5f0 x24: ffff000102199b40 [21630.915469] x23: 0000000000000003 x22: 0000000000000003 x21: ffff0000c0011c40 [21630.916259] x20: fffffdffc4086600 x19: fffffdffc43fc800 x18: 0000000000000000 [21630.917048] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [21630.917837] x14: 0000000000000000 x13: 0000000000000000 x12: ffff70001405ee66 [21630.918640] x11: 1ffff0001405ee65 x10: ffff70001405ee65 x9 : ffff800080a295dc [21630.919442] x8 : ffff8000a02f7330 x7 : 0000000000000000 x6 : 0000000000003000 [21630.920232] x5 : 0000000024924925 x4 : 0000000000000001 x3 : 0000000000000007 [21630.921021] x2 : 0000000000001b40 x1 : 000000000000001f x0 : 0000000000000001 [21630.921810] Call trace: [21630.922130] __free_slab+0x228/0x250 (P) [21630.922669] free_slab+0x38/0x118 [21630.923079] free_to_partial_list+0x1d4/0x340 [21630.923591] __slab_free+0x24c/0x348 [21630.924024] ___cache_free+0xf0/0x110 [21630.924468] qlist_free_all+0x78/0x130 [21630.924922] kasan_quarantine_reduce+0x114/0x148 [21630.925525] __kasan_slab_alloc+0x7c/0xb0 [21630.926006] kmem_cache_alloc_noprof+0x164/0x5c8 [21630.926699] __alloc_object+0x44/0x1f8 [21630.927153] __create_object+0x34/0xc8 [21630.927604] kmemleak_alloc+0xb8/0xd8 [21630.928052] kmem_cache_alloc_noprof+0x368/0x5c8 [21630.928606] getname_flags.part.0+0xa4/0x610 [21630.929112] getname_flags+0x80/0xd8 [21630.929557] vfs_fstatat+0xc8/0xe0 [21630.929975] __do_sys_newfstatat+0xa0/0x100 [21630.930469] __arm64_sys_newfstatat+0x90/0xd8 [21630.931046] invoke_syscall+0xd4/0x258 [21630.931685] el0_svc_common.constprop.0+0xb4/0x240 [21630.932467] do_el0_svc+0x48/0x68 [21630.932972] el0_svc+0x40/0xe0 [21630.933472] el0t_64_sync_handler+0xa0/0xe8 [21630.934151] el0t_64_sync+0x1ac/0x1b0 [21630.934923] Code: aa1803e0 97ffef2b a9446bf9 17ffff9c (d4210000) [21630.936461] SMP: stopping secondary CPUs [21630.939550] Starting crashdump kernel... [21630.940108] Bye! Link: https://lkml.kernel.org/r/[email protected] Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Signed-off-by: Hao Ge <[email protected]> Reviewed-by: Suren Baghdasaryan <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: David Rientjes <[email protected]> Cc: gehao <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

maple_tree tracepoints contain pointers to function names. Such a pointer is saved when a tracepoint logs an event. There's no guarantee that it's still valid when the event is parsed later and the pointer is dereferenced. The kernel warns about these unsafe pointers. event 'ma_read' has unsafe pointer field 'fn' WARNING: kernel/trace/trace.c:3779 at ignore_event+0x1da/0x1e4 Mark the function names as tracepoint_string() to fix the events. One case that doesn't work without my patch would be trace-cmd record to save the binary ringbuffer and trace-cmd report to parse it in userspace. The address of __func__ can't be dereferenced from userspace but tracepoint_string will add an entry to /sys/kernel/tracing/printk_formats Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b ("Maple Tree: add new data structure") Signed-off-by: Martin Kaiser <[email protected]> Acked-by: Liam R. Howlett <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

…el/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. All changes are device-specific, and nothing stands out. - A regression fix for HD-audio HDMI probe - USB-audio hardening patches for issues spotted by fuzzers - ASoC fixes for TAS278x, SoundWire and Cirrus - Usual HD-audio and USB-audio quirks" * tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe() ALSA: hda/tas2781: Correct the wrong project ID ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_badd ASoC: SDCA: bug fix while parsing mipi-sdca-control-cn-list ALSA: usb-audio: Fix potential overflow of PCM transfer buffer ALSA: hda/tas2781: Add new quirk for HP new projects ASoC: tas2781: fix getting the wrong device number ASoC: codecs: va-macro: fix resource leak in probe error path ASoC: tas2783A: Fix issues in firmware parsing ASoC: sdw_utils: fix device reference leak in is_sdca_endpoint_present() ASoC: cs4271: Fix regulator leak on probe failure ALSA: hda/hdmi: Fix breakage at probing nvhdmi-mcp driver ASoC: da7213: Use component driver suspend/resume ALSA: usb-audio: add min_mute quirk for SteelSeries Arctis ASoC: doc: cs35l56: Update firmware filename description for B0 silicon

…inux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "One simple fix for a GPIO descriptor leak in the probe error handling for the fixed regulator" * tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: fixed: fix GPIO descriptor leak on register failure

…ernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few standard fixes here, plus one more interesting one from Hans which addresses an issue where a move in when we requested GPIOs on ACPI systems caused us to stop doing pinmuxing and leave things floating that we'd really rather not have floating" * tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Add TODO comment about ACPI GPIO setup spi: xilinx: increase number of retries before declaring stall spi: imx: keep dma request disabled before dma transfer setup spi: Try to get ACPI GPIO IRQ earlier

…kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Fix incorrect device handle check for Generic Initiator - Fix offset calculation for extended linear cache poison injection - Fix lockdep warning for hmem_register_resource() * tag 'cxl-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: acpi/hmat: Fix lockdep warning for hmem_register_resource() cxl: Adjust offset calculation for poison injection acpi,srat: Fix incorrect device handle check for Generic Initiator

…kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: - imx: Fix reference count leak in ->remove() - samsung: Rework legacy splash-screen handover workaround - samsung: Fix potential memleak during ->probe() - arm: Fix genpd leak on provider registration failure for scmi * tag 'pmdomain-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: imx: Fix reference count leak in imx_gpc_remove pmdomain: samsung: Rework legacy splash-screen handover workaround pmdomain: arm: scmi: Fix genpd leak on provider registration failure pmdomain: samsung: plug potential memleak during probe

…l/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - dw_mmc-rockchip: Fix internal phase calculation - pxamci: Simplify and fix ->probe() error handling - sdhci-of-dwcmshc: Fix strbin signal delay - wmt-sdmmc: Fix compile test default * tag 'mmc-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: dw_mmc-rockchip: Fix wrong internal phase calculate mmc: pxamci: Simplify pxamci_probe() error handling using devm APIs mmc: sdhci-of-dwcmshc: Change DLL_STRBIN_TAPNUM_DEFAULT to 0x4 mmc: wmt-sdmmc: fix compile test default

…m/kernel Pull drm fixes from Dave Airlie: "Weekly fixes, amdgpu and vmwgfx making up the most of it, along with panthor and i915/xe. Seems about right for this time of development, nothing major outstanding. client: - Fix description of module parameter panthor: - Flush writes before mapping buffers vmwgfx: - Improve command validation - Improve ref counting - Fix cursor-plane support amdgpu: - Disallow P2P DMA for GC 12 DCC surfaces - ctx error handling fix - UserQ fixes - VRR fix - ISP fix - JPEG 5.0.1 fix amdkfd: - Save area check fix - Fix GPU mappings for APU after prefetch i915: - Fix PSR's pipe to vblank conversion - Disable Panel Replay on MST links xe: - New HW workarounds affecting PTL and WCL platforms * tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel: drm/client: fix MODULE_PARM_DESC string for "active" drm/i915/dp_mst: Disable Panel Replay drm/amdkfd: Fix GPU mappings for APU after prefetch drm/amdkfd: relax checks for over allocation of save area drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1 drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO drm/amd/display: Allow VRR params change if unsynced with the stream drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process drm/amdgpu: jump to the correct label on failure drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg drm/xe/xe3: Extend wa_14023061436 drm/xe/xe3: Add WA_14024681466 for Xe3_LPG drm/i915/psr: fix pipe to vblank conversion drm/panthor: Flush shmem writes before mapping buffers CPU-uncached drm/vmwgfx: Restore Guest-Backed only cursor plane support drm/vmwgfx: Use kref in vmw_bo_dirty drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZE

…inux-nfs Pull NFS client fixes from Anna Schumaker: - Various fixes when using NFS with TLS - Localio direct-IO fixes - Fix error handling in nfs_atomic_open_v23() - Fix sysfs memory leak when nfs_client kobject add fails - Fix an incorrect parameter when calling nfs4_call_sync() - Fix a failing LTP test when using delegated timestamps * tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix LTP test failures when timestamps are delegated NFSv4: Fix an incorrect parameter when calling nfs4_call_sync() NFS: sysfs: fix leak when nfs_client kobject add fails NFSv2/v3: Fix error handling in nfs_atomic_open_v23() nfs/localio: do not issue misaligned DIO out-of-order nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion nfs/localio: backfill missing partial read support for misaligned DIO nfs/localio: add refcounting for each iocb IO associated with NFS pgio header nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support NFS: Check the TLS certificate fields in nfs_match_client() pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect() pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()

…ernel/git/ojeda/linux Pull Rust fix from Miguel Ojeda: - Fix a Rust 1.91.0 build issue due to 'bindings.o' not containing DWARF debug information anymore by teaching gendwarfksyms to skip object files without exports * tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: gendwarfksyms: Skip files with no exports

…t/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix interaction between livepatch and BPF fexit programs (Song Liu) With Steven and Masami acks. - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa) With Steven and Masami acks. - Fix out of bounds access in widen_imprecise_scalars() in the verifier (Eduard Zingerman) - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen) - Fix net_sched storage collision with BPF data_meta/data_end (Eric Dumazet) - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test widen_imprecise_scalars() with different stack depth bpf: account for current allocated stack depth in widen_imprecise_scalars() bpf: Add bpf_prog_run_data_pointers() selftests/bpf: Add mptcp test with sockmap mptcp: Fix proto fallback detection with BPF mptcp: Disallow MPTCP subflows from sockmap selftests/bpf: Add stacktrace ips test for raw_tp selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" bpf: add _impl suffix for bpf_stream_vprintk() kfunc bpf:add _impl suffix for bpf_task_work_schedule* kfuncs selftests/bpf: Add tests for livepatch + bpf trampoline ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct() ftrace: Fix BPF fexit with livepatch

…ernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Cache the ASPM L0s/L1 Supported bits early so quirks can override them if necessary (Bjorn Helgaas) - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn Helgaas) * tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports PCI/ASPM: Convert quirks to override advertised link states PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden

…nux/kernel/git/tip/tip Pull core fix from Ingo Molnar: "Fix a broken #ifndef in the <linux/entry-virt.h> header. It hasn't caused problems upstream yet because no arch overrides arch_xfer_to_guest_mode_handle_work() at this moment" * tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub

…ux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix an irqchip driver release bug in the riscv-intc irqchip driver" * tag 'irq-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops

…linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a memory leak in the posix timer creation logic" * tag 'timers-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Plug potential memory leak in do_timer_create()

…ux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Update the list of AMD microcode minimum Entrysign revisions - Add additional fixed AMD RDSEED microcode revisions - Update the language transliteration for Kiryl Shutsemau's name in the MAINTAINERS entry * tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev x86/CPU/AMD: Add additional fixed RDSEED microcode revisions MAINTAINERS: Update name spelling

…git/s390/linux Pull s390 fix from Heiko Carstens: - Fix a bug in the __ptep_rdp() inline assembly which may lead to missing TLB flushes * tag 's390-6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: Fix __ptep_rdp() inline assembly

In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support runtime allocation of gigantic hugetlb folios. In the meantime it evolved into a generic way for the architecture to state that it supports gigantic hugetlb folios. In commit fae7d83 ("mm: add __dump_folio()") we started using CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could have folios larger than what the buddy can handle. In the context of that commit, we started using MAX_FOLIO_ORDER to detect page corruptions when dumping tail pages of folios. Before that commit, we assumed that we cannot have folios larger than the highest buddy order, which was obviously wrong. In commit 7b4f21f ("mm/hugetlb: check for unreasonable folio sizes when registering hstate"), we used MAX_FOLIO_ORDER to detect inconsistencies, and in fact, we found some now. Powerpc allows for configs that can allocate gigantic folio during boot (not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can exceed PUD_ORDER. To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16 GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit (powerpc). Note that on some powerpc configurations, whether we actually have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER, but there is nothing really problematic about setting it unconditionally: we just try to keep the value small so we can better detect problems in __dump_folio() and inconsistencies around the expected largest folio in the system. Ideally, we'd have a better way to obtain the maximum hugetlb folio size and detect ourselves whether we really end up with gigantic folios. Let's defer bigger changes and fix the warnings first. While at it, handle gigantic DAX folios more clearly: DAX can only end up creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD. Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer. In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE. Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now also allow for runtime allocations of folios in some more powerpc configs. I don't think this is a problem, but if it is we could handle it through __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED. While __dump_page()/__dump_folio was also problematic (not handling dumping of tail pages of such gigantic folios correctly), it doesn't seem critical enough to mark it as a fix. Link: https://lkml.kernel.org/r/[email protected] Fixes: 7b4f21f ("mm/hugetlb: check for unreasonable folio sizes when registering hstate") Reported-by: Christophe Leroy <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Reported-by: Sourabh Jain <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: David Hildenbrand (Red Hat) <[email protected]> Cc: Ritesh Harjani (IBM) <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Donet Tom <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: "Liam R. Howlett" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

When crashkernel is configured with a high reservation, shrinking its value below the low crashkernel reservation causes two issues: 1. Invalid crashkernel resource objects 2. Kernel crash if crashkernel shrinking is done twice For example, with crashkernel=200M,high, the kernel reserves 200MB of high memory and some default low memory (say 256MB). The reservation appears as: cat /proc/iomem | grep -i crash af000000-beffffff : Crash kernel 433000000-43f7fffff : Crash kernel If crashkernel is then shrunk to 50MB (echo 52428800 > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: af000000-beffffff : Crash kernel Instead, it should show 50MB: af000000-b21fffff : Crash kernel Further shrinking crashkernel to 40MB causes a kernel crash with the following trace (x86): BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [qualcomm-linux#1] PREEMPT SMP NOPTI <snip...> Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __release_resource+0xd/0xb0 release_resource+0x26/0x40 __crash_shrink_memory+0xe5/0x110 crash_shrink_memory+0x12a/0x190 kexec_crash_size_store+0x41/0x80 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x294/0x460 ksys_write+0x6d/0xf0 <snip...> This happens because __crash_shrink_memory()/kernel/crash_core.c incorrectly updates the crashk_res resource object even when crashk_low_res should be updated. Fix this by ensuring the correct crashkernel resource object is updated when shrinking crashkernel memory. Link: https://lkml.kernel.org/r/[email protected] Fixes: 16c6006 ("kexec: enable kexec_crash_size to support two crash kernel regions") Signed-off-by: Sourabh Jain <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Zhen Lei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Switch to kernel.org email address as I will be leaving Red Hat. The old address will remain active until end of January 2026, so performing the change now should make sure that most mails will reach me. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: David Hildenbrand (Red Hat) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Both uniform and non uniform split check missed the check to prevent splitting anon folios in swapcache to non-zero order. Splitting anon folios in swapcache to non-zero order can cause data corruption since swapcache only support PMD order and order-0 entries. This can happen when one use split_huge_pages under debugfs to split anon folios in swapcache. In-tree callers do not perform such an illegal operation. Only debugfs interface could trigger it. I will put adding a test case on my TODO list. Fix the check. Link: https://lkml.kernel.org/r/[email protected] Fixes: 58729c0 ("mm/huge_memory: add buddy allocator like (non-uniform) folio_split()") Signed-off-by: Zi Yan <[email protected]> Reported-by: "David Hildenbrand (Red Hat)" <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Acked-by: David Hildenbrand (Red Hat) <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Barry Song <[email protected]> Cc: Dev Jain <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Nico Pache <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Wei Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

We must check whether KHO is enabled prior to issuing KHO commands, otherwise KHO internal data structures are not initialized. Link: https://lkml.kernel.org/r/[email protected] Fixes: b753522 ("kho: add test for kexec handover") Signed-off-by: Pasha Tatashin <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Reviewed-by: Pratyush Yadav <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Cc: Alexander Graf <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

… perf_test Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member warning due to potential unaligned pointer access: perf_test.c:239:38: warning: taking address of packed member 'write_index' of class or structure 'user_reg' may result in an unaligned pointer value [-Waddress-of-packed-member] 239 | ASSERT_NE(-1, write(self->data_fd, &reg.write_index, | ^~~~~~~~~~~~~~~ Since write(2) works with any alignment. Casting '&reg.write_index' explicitly to 'void *' to suppress this warning. Link: https://lkml.kernel.org/r/[email protected] Fixes: 42187bd ("selftests/user_events: Add perf self-test for empty arguments events") Signed-off-by: Ankit Khushwaha <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: "Masami Hiramatsu (Google)" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: sunliming <[email protected]> Cc: Wei Yang <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Since commit 78524b0 ("mm, swap: avoid redundant swap device pinning"), the common helper for allocating and preparing a folio in the swap cache layer no longer tries to get a swap device reference internally, because all callers of __read_swap_cache_async are already holding a swap entry reference. The repeated swap device pinning isn't needed on the same swap device. Caller of VMA readahead is also holding a reference to the target entry's swap device, but VMA readahead walks the page table, so it might encounter swap entries from other devices, and call __read_swap_cache_async on another device without holding a reference to it. So it is possible to cause a UAF when swapoff of device A raced with swapin on device B, and VMA readahead tries to read swap entries from device A. It's not easy to trigger, but in theory, it could cause real issues. Make VMA readahead try to get the device reference first if the swap device is a different one from the target entry. Link: https://lkml.kernel.org/r/[email protected] Fixes: 78524b0 ("mm, swap: avoid redundant swap device pinning") Suggested-by: Huang Ying <[email protected]> Signed-off-by: Kairui Song <[email protected]> Acked-by: Chris Li <[email protected]> Cc: Baoquan He <[email protected]> Cc: Barry Song <[email protected]> Cc: Kemeng Shi <[email protected]> Cc: Nhat Pham <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

The generation field of topology map is updated after initialized by zero. The updated value of generation field is always zero, and is against specification. This commit fixes the bug. Fixes: 7d138cb ("firewire: core: use spin lock specific to topology map") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>

…/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - In Versalnet, handle the reporting of non-standard hw errors whose information can come in more than one remote processor message. - Explicitly reenable ECC checking after a warm reset in Altera OCRAM as those registers are reset to default otherwise - Fix single-bit error injection in Altera EDAC to not inject errors directly in ECC RAM and thus lead to false double-bit errors due to same ECC RAM being in concurrent use * tag 'edac_urgent_for_v6.18_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/altera: Use INTTEST register for Ethernet and USB SBE injection EDAC/altera: Handle OCRAM ECC enable after warm reset EDAC/versalnet: Handle split messages for non-standard errors

…inux/kernel/git/ieee1394/linux1394 Pull firewire fixes from Takashi Sakamoto: "This includes some fixes for the topology map, newly introduced in v6.18 kernel" * tag 'firewire-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: fix to update generation field in topology map firewire: core: Initialize topology_map.lock

…rg/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "7 hotfixes. 5 are cc:stable, 4 are against mm/ All are singletons - please see the respective changelogs for details" * tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm, swap: fix potential UAF issue for VMA readahead selftests/user_events: fix type cast for write_index packed member in perf_test lib/test_kho: check if KHO is enabled mm/huge_memory: fix folio split check for anon folios in swapcache MAINTAINERS: update David Hildenbrand's email address crash: fix crashkernel resource shrink mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb

…el.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix writing bpf_prog (infos|btfs)_cnt to data file, to not generate invalid perf.data files in some corner cases. - Fix 'perf top' segfault by ensuring libbfd is initialized. This is an opt-in feature due to license incompatibilities. - Fix segfault in 'perf lock' due to missing kernel map. - Fix 'perf lock contention' test. - Don't fail fast path detection if binutils-devel isn't available. - Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason. * tag 'perf-tools-fixes-for-v6.18-2-2025-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf libbfd: Ensure libbfd is initialized prior to use perf test: Fix lock contention test perf lock: Fix segfault due to missing kernel map tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason perf build: Don't fail fast path feature detection when binutils-devel is not available perf header: Write bpf_prog (infos|btfs)_cnt to data file

shashim-quic · 2025-11-27T13:52:35Z

For rebase you dont need to raise a PR for topic branch you maintain. You can simply rebase in your local branch and push it to repo.

chenhuacai and others added 30 commits November 10, 2025 08:37

torvalds and others added 29 commits November 14, 2025 12:50

Linux 6.18-rc6

6a23ae0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rebase tech/net/ath to baseline #323

Rebase tech/net/ath to baseline #323

Uh oh!

miaoqing-quic commented Nov 19, 2025

Uh oh!

shashim-quic commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

106 participants

Rebase tech/net/ath to baseline #323

Are you sure you want to change the base?

Rebase tech/net/ath to baseline #323

Uh oh!

Conversation

miaoqing-quic commented Nov 19, 2025

Uh oh!

shashim-quic commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

106 participants