Labels: c/autoscaling/neonvm (Component: autoscaling: NeonVM), migrated_to_jira, t/bug (Issue Type: Bug)
Description
Environment
Production
Context
We recently upgraded from kernel version 6.6.64 to 6.12.26 in production (ref #1376).
Since then, a very small fraction of our VMs (~1 in a million) have hit a bug that looks roughly like this:
- We try to hotplug more memory with virtio-mem
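- There's an allocation failure while setting up the page tables for the kernel's direct mapping of the new memory (alloc_low_pages() called from phys_pud_init())
- That allocation failure isn't handled, and we end up dereferencing a NULL pointer (in phys_pmd_init())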
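The failure mode in the list above can be modeled in a few lines. This is a toy sketch, not kernel code: alloc_table(), map_range_unchecked() and map_range_checked() are hypothetical stand-ins, with Python's None playing the role of a NULL return. It contrasts using an allocation result without a check against the usual pattern of checking it and propagating -ENOMEM to the caller.

```python
import errno
from typing import Optional


class PageTable:
    """Hypothetical stand-in for one page-table page (not a kernel type)."""

    def __init__(self) -> None:
        self.entries = [0] * 512  # an x86-64 page-table page holds 512 entries


def alloc_table(memory_available: bool) -> Optional[PageTable]:
    """Hypothetical allocator: returns None on failure, like a NULL return."""
    return PageTable() if memory_available else None


def map_range_unchecked(memory_available: bool) -> int:
    # Mirrors the bug: the allocation result is used without a check.
    table = alloc_table(memory_available)
    return table.entries[0]  # raises AttributeError when table is None


def map_range_checked(memory_available: bool) -> int:
    # Usual fix pattern: check the result and report -ENOMEM upward.
    table = alloc_table(memory_available)
    if table is None:
        return -errno.ENOMEM
    table.entries[0] = 1
    return 0
```

With the checked variant, a failed allocation becomes an error return that the hotplug path could handle, instead of a crash in the worker. Whether an upstream fix takes this shape is not established here.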
See this Slack thread for more: https://neondb.slack.com/archives/C0807C9SSJ2/p1748363834798349
Impact
- The rate of occurrence is very small
- When VMs hit this, the kworker responsible for handling virtio-mem operations exits and is not restarted, i.e. the VM can no longer scale its memory via virtio-mem.
So the overall impact is pretty small: the VMs continue operating normally, just with memory scaling broken (notably in contrast to the kcompactd issue, where the VM eventually falls over).
Example stack trace(s)
Here's an example of what we saw for a particular VM:
Allocation failure
[ 1259.521867] virtio_mem virtio1: plugged size: 0x0
[ 1259.521939] virtio_mem virtio1: requested size: 0x40000000
[ 1259.534610] kworker/0:2: page allocation failure: order:0, mode:0x920(GFP_ATOMIC|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 1259.534738] CPU: 0 UID: 0 PID: 140 Comm: kworker/0:2 Not tainted 6.12.26 #1
[ 1259.534740] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 1259.534742] Workqueue: events_freezable virtio_mem_run_wq
[ 1259.534748] Call Trace:
[ 1259.534750] <TASK>
[ 1259.534751] dump_stack_lvl+0x5b/0x70
[ 1259.534755] dump_stack+0x10/0x20
[ 1259.534756] warn_alloc+0x103/0x180
[ 1259.534760] __alloc_pages_slowpath.constprop.0+0x738/0xf30
[ 1259.534763] __alloc_pages_noprof+0x1e9/0x340
[ 1259.534765] alloc_pages_mpol_noprof+0x47/0x100
[ 1259.534767] alloc_pages_noprof+0x4b/0x80
[ 1259.534768] get_free_pages_noprof+0xc/0x40
[ 1259.534770] alloc_low_pages+0xc2/0x150
[ 1259.534772] phys_pud_init+0x82/0x390
[ 1259.534775] phys_p4d_init+0x93/0x330
[ 1259.534777] __kernel_physical_mapping_init+0xa1/0x370
[ 1259.534778] kernel_physical_mapping_init+0xf/0x20
[ 1259.534780] init_memory_mapping+0x1fa/0x430
[ 1259.534781] arch_add_memory+0x2b/0x50
[ 1259.534783] add_memory_resource+0xe6/0x260
[ 1259.534785] add_memory_driver_managed+0x78/0xc0
[ 1259.534787] virtio_mem_add_memory+0x46/0xc0
[ 1259.534789] virtio_mem_sbm_plug_and_add_mb+0xa3/0x160
[ 1259.534791] virtio_mem_run_wq+0x1035/0x16c0
[ 1259.534792] process_one_work+0x17a/0x3c0
[ 1259.534795] worker_thread+0x2c5/0x3f0
[ 1259.534797] ? _raw_spin_unlock_irqrestore+0x9/0x30
[ 1259.534799] ? __pfx_worker_thread+0x10/0x10
[ 1259.534801] kthread+0xdc/0x110
[ 1259.534804] ? __pfx_kthread+0x10/0x10
[ 1259.534805] ret_from_fork+0x35/0x60
[ 1259.534810] ? __pfx_kthread+0x10/0x10
[ 1259.534811] ret_from_fork_asm+0x1a/0x30
[ 1259.534814] </TASK>
[ 1259.534814] Mem-Info:
[ 1259.536035] active_anon:23991 inactive_anon:85286 isolated_anon:0
[ 1259.536035] active_file:23055 inactive_file:79009 isolated_file:0
[ 1259.536035] unevictable:0 dirty:5821 writeback:0
[ 1259.536035] slab_reclaimable:4649 slab_unreclaimable:4807
[ 1259.536035] mapped:87717 shmem:74808 pagetables:3067
[ 1259.536035] sec_pagetables:0 bounce:0
[ 1259.536035] kernel_misc_reclaimable:0
[ 1259.536035] free:226 free_pcp:2199 free_cma:0
[ 1259.536323] Node 0 active_anon:95964kB inactive_anon:341144kB active_file:92220kB inactive_file:316036kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:350868kB dirty:23284kB writeback:0kB shmem:299232kB writeback_tmp:0kB kernel_stack:2828kB pagetables:12268kB sec_pagetables:0kB all_unreclaimable? no
[ 1259.536526] Node 0 DMA32 free:904kB boost:8676kB min:12532kB low:13496kB high:14460kB reserved_highatomic:2048KB active_anon:95964kB inactive_anon:341144kB active_file:92220kB inactive_file:316036kB unevictable:0kB writepending:23284kB present:1047984kB managed:936648kB mlocked:0kB bounce:0kB free_pcp:8796kB local_pcp:8796kB free_cma:0kB
[ 1259.536748] lowmem_reserve[]: 0 0 0
[ 1259.536788] Node 0 DMA32: 138*4kB (M) 13*8kB (H) 0*16kB 1*32kB (H) 1*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 880kB
[ 1259.536902] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1259.536981] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1259.537060] 176887 total pagecache pages
[ 1259.537101] 0 pages in swap cache
[ 1259.537141] Free swap = 16775916kB
[ 1259.537183] Total swap = 16777212kB
[ 1259.537224] 261996 pages RAM
[ 1259.537264] 0 pages HighMem/MovableOnly
[ 1259.537305] 27834 pages reserved
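One way to read the Mem-Info dump above: the DMA32 zone reports free:904kB against min:12532kB, and the buddy free-list line for that zone lists counts of free blocks per size. This sketch recomputes the buddy total from that line to show how it is built. The helper names are illustrative, and the only input is the line copied from the log (timestamp stripped).

```python
import re

# Buddy free-list line copied from the Mem-Info dump above (timestamp stripped).
DMA32_LINE = (
    "Node 0 DMA32: 138*4kB (M) 13*8kB (H) 0*16kB 1*32kB (H) 1*64kB (H) "
    "1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 880kB"
)


def buddy_free_kb(line: str) -> int:
    """Sum 'count*sizekB' terms; migratetype tags like (M) or (H) are ignored."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


def stated_total_kb(line: str) -> int:
    """Return the total the kernel printed after '='."""
    match = re.search(r"=\s*(\d+)kB", line)
    if match is None:
        raise ValueError("no '= NkB' total in line")
    return int(match.group(1))


print(buddy_free_kb(DMA32_LINE), stated_total_kb(DMA32_LINE))  # 880 880
```

Either way, the zone had well under 1 MiB free against a min watermark of about 12 MiB. That is consistent with even an order-0 GFP_ATOMIC request failing at this point.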
Null pointer dereference (immediately afterwards)
[ 1259.537348] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1259.537404] #PF: supervisor read access in kernel mode
[ 1259.537449] #PF: error_code(0x0000) - not-present page
[ 1259.537496] PGD 423b067 P4D 0
[ 1259.537538] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1259.537587] CPU: 0 UID: 0 PID: 140 Comm: kworker/0:2 Not tainted 6.12.26 #1
[ 1259.537647] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 1259.537734] Workqueue: events_freezable virtio_mem_run_wq
[ 1259.537784] RIP: 0010:phys_pmd_init+0xf0/0x3a0
[ 1259.537834] Code: 49 c1 e9 12 48 81 e7 00 00 e0 ff 48 8b 4d d0 4c 8d af 00 00 20 00 41 81 e1 f8 0f 00 00 4d 39 fe 4a 8d 1c 08 0f 83 76 01 00 00 <48> 8b 03 48 a9 9f ff ff ff 0f 85 48 ff ff ff f6 45 b8 04 0f 84 e0
[ 1259.537969] RSP: 0018:ff689bbc001bfa10 EFLAGS: 00010287
[ 1259.538007] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 8000000000000163
[ 1259.538064] RDX: 0000000108000000 RSI: 0000000000000000 RDI: 0000000100000000
[ 1259.538122] RBP: ff689bbc001bfa70 R08: 8000000000000163 R09: 0000000000000000
[ 1259.538179] R10: 000000000000000a R11: ffffffff870ce008 R12: 0000000000000000
[ 1259.538237] R13: 0000000100200000 R14: 0000000100000000 R15: 0000000108000000
[ 1259.538294] FS: 0000000000000000(0000) GS:ff22bd333e800000(0000) knlGS:0000000000000000
[ 1259.538366] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1259.538423] CR2: 0000000000000000 CR3: 0000000002594001 CR4: 0000000000371eb0
[ 1259.538483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1259.538551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1259.538609] Call Trace:
[ 1259.538628] <TASK>
[ 1259.538648] phys_pud_init+0xa0/0x390
[ 1259.538688] phys_p4d_init+0x93/0x330
[ 1259.538717] __kernel_physical_mapping_init+0xa1/0x370
[ 1259.538768] kernel_physical_mapping_init+0xf/0x20
[ 1259.538818] init_memory_mapping+0x1fa/0x430
[ 1259.538868] arch_add_memory+0x2b/0x50
[ 1259.538908] add_memory_resource+0xe6/0x260
[ 1259.538949] add_memory_driver_managed+0x78/0xc0
[ 1259.538999] virtio_mem_add_memory+0x46/0xc0
[ 1259.539038] virtio_mem_sbm_plug_and_add_mb+0xa3/0x160
[ 1259.539088] virtio_mem_run_wq+0x1035/0x16c0
[ 1259.539138] process_one_work+0x17a/0x3c0
[ 1259.539166] worker_thread+0x2c5/0x3f0
[ 1259.539196] ? _raw_spin_unlock_irqrestore+0x9/0x30
[ 1259.539245] ? __pfx_worker_thread+0x10/0x10
[ 1259.539295] kthread+0xdc/0x110
[ 1259.539336] ? __pfx_kthread+0x10/0x10
[ 1259.539377] ret_from_fork+0x35/0x60
[ 1259.539418] ? __pfx_kthread+0x10/0x10
[ 1259.539459] ret_from_fork_asm+0x1a/0x30
[ 1259.539500] </TASK>
[ 1259.539519] Modules linked in:
[ 1259.539549] CR2: 0000000000000000
[ 1259.539578] ---[ end trace 0000000000000000 ]---
[ 1259.539627] RIP: 0010:phys_pmd_init+0xf0/0x3a0
[ 1259.539678] Code: 49 c1 e9 12 48 81 e7 00 00 e0 ff 48 8b 4d d0 4c 8d af 00 00 20 00 41 81 e1 f8 0f 00 00 4d 39 fe 4a 8d 1c 08 0f 83 76 01 00 00 <48> 8b 03 48 a9 9f ff ff ff 0f 85 48 ff ff ff f6 45 b8 04 0f 84 e0
[ 1259.539813] RSP: 0018:ff689bbc001bfa10 EFLAGS: 00010287
[ 1259.539851] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 8000000000000163
[ 1259.539909] RDX: 0000000108000000 RSI: 0000000000000000 RDI: 0000000100000000
[ 1259.539967] RBP: ff689bbc001bfa70 R08: 8000000000000163 R09: 0000000000000000
[ 1259.540025] R10: 000000000000000a R11: ffffffff870ce008 R12: 0000000000000000
[ 1259.540082] R13: 0000000100200000 R14: 0000000100000000 R15: 0000000108000000
[ 1259.540140] FS: 0000000000000000(0000) GS:ff22bd333e800000(0000) knlGS:0000000000000000
[ 1259.540198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1259.540257] CR2: 0000000000000000 CR3: 0000000002594001 CR4: 0000000000371eb0
[ 1259.540316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1259.540384] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1259.540442] note: kworker/0:2[140] exited with irqs disabled
This was on a kernel from autoscaling release v0.47.0.
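The two "#PF:" lines in the oops come from the x86 page-fault error code, and CR2 holds the faulting address (0 here). The code bytes marked <48> 8b 03 at the RIP decode to mov rax, [rbx], and RBX is 0 in the register dump. This is consistent with a NULL page-table pointer being read in phys_pmd_init(). The sketch below decodes the error-code bits into roughly the wording the kernel prints (modeled on show_fault_oops() in arch/x86/mm/fault.c). It is an illustration, not the kernel's implementation, and describe_pf() is a hypothetical name.

```python
# x86 page-fault error-code bits (Intel SDM; X86_PF_* in the kernel).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read, 1: write
PF_USER = 1 << 2   # 0: supervisor-mode access, 1: user-mode access
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault was an instruction fetch
PF_PK = 1 << 5     # protection-key violation


def describe_pf(error_code: int, user_mode: bool) -> tuple[str, str]:
    """Return (access description, cause) for a page-fault error code."""
    who = "user" if error_code & PF_USER else "supervisor"
    if error_code & PF_INSTR:
        access = "instruction fetch"
    elif error_code & PF_WRITE:
        access = "write access"
    else:
        access = "read access"
    # "in kernel/user mode" comes from the saved registers, not the error code.
    mode = "user" if user_mode else "kernel"
    if not error_code & PF_PROT:
        cause = "not-present page"
    elif error_code & PF_RSVD:
        cause = "reserved bit violation"
    elif error_code & PF_PK:
        cause = "protection keys violation"
    else:
        cause = "permissions violation"
    return f"{who} {access} in {mode} mode", cause


# The oops above: error_code(0x0000) in a kworker, i.e. kernel mode.
print(describe_pf(0x0000, user_mode=False))
```

For error code 0x0000 this yields "supervisor read access in kernel mode" and "not-present page", matching the log: a plain read through a pointer that maps to nothing.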