
Commit e9abe31

rananta468 authored and oupton committed
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
When a large VM, specifically one that holds a significant number of PTEs, gets abruptly destroyed, the following warning is seen during the page-table walk:

  sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
  CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
  Tainted: [O]=OOT_MODULE
  Call trace:
   show_stack+0x20/0x38 (C)
   dump_stack_lvl+0x3c/0xb8
   dump_stack+0x18/0x30
   resched_latency_warn+0x7c/0x88
   sched_tick+0x1c4/0x268
   update_process_times+0xa8/0xd8
   tick_nohz_handler+0xc8/0x168
   __hrtimer_run_queues+0x11c/0x338
   hrtimer_interrupt+0x104/0x308
   arch_timer_handler_phys+0x40/0x58
   handle_percpu_devid_irq+0x8c/0x1b0
   generic_handle_domain_irq+0x48/0x78
   gic_handle_irq+0x1b8/0x408
   call_on_irq_stack+0x24/0x30
   do_interrupt_handler+0x54/0x78
   el1_interrupt+0x44/0x88
   el1h_64_irq_handler+0x18/0x28
   el1h_64_irq+0x84/0x88
   stage2_free_walker+0x30/0xa0 (P)
   __kvm_pgtable_walk+0x11c/0x258
   __kvm_pgtable_walk+0x180/0x258
   __kvm_pgtable_walk+0x180/0x258
   __kvm_pgtable_walk+0x180/0x258
   kvm_pgtable_walk+0xc4/0x140
   kvm_pgtable_stage2_destroy+0x5c/0xf0
   kvm_free_stage2_pgd+0x6c/0xe8
   kvm_uninit_stage2_mmu+0x24/0x48
   kvm_arch_flush_shadow_all+0x80/0xa0
   kvm_mmu_notifier_release+0x38/0x78
   __mmu_notifier_release+0x15c/0x250
   exit_mmap+0x68/0x400
   __mmput+0x38/0x1c8
   mmput+0x30/0x68
   exit_mm+0xd4/0x198
   do_exit+0x1a4/0xb00
   do_group_exit+0x8c/0x120
   get_signal+0x6d4/0x778
   do_signal+0x90/0x718
   do_notify_resume+0x70/0x170
   el0_svc+0x74/0xd8
   el0t_64_sync_handler+0x60/0xc8
   el0t_64_sync+0x1b0/0x1b8

The warning is seen mostly on host kernels that are configured not to force-preempt, such as those with CONFIG_PREEMPT_NONE=y. To avoid this, instead of walking the entire page-table in one go, split the walk into smaller ranges and call cond_resched() between each range. Since this path is executed during VM destruction, after the page-table structure is unlinked from the KVM MMU, relying on cond_resched_rwlock_write() isn't necessary.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
Suggested-by: Oliver Upton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Oliver Upton <[email protected]>
1 parent 0e89ca1 commit e9abe31

File tree

1 file changed: +25 −1 lines


arch/arm64/kvm/mmu.c

Lines changed: 25 additions & 1 deletion
@@ -904,11 +904,35 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 	return 0;
 }
 
+/*
+ * Assume that @pgt is valid and unlinked from the KVM MMU to free the
+ * page-table without taking the kvm_mmu_lock and without performing any
+ * TLB invalidations.
+ *
+ * Also, the range of addresses can be large enough to cause need_resched
+ * warnings, for instance on CONFIG_PREEMPT_NONE kernels. Hence, invoke
+ * cond_resched() periodically to prevent hogging the CPU for a long time
+ * and schedule something else, if required.
+ */
+static void stage2_destroy_range(struct kvm_pgtable *pgt, phys_addr_t addr,
+				 phys_addr_t end)
+{
+	u64 next;
+
+	do {
+		next = stage2_range_addr_end(addr, end);
+		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr,
+							     next - addr);
+		if (next != end)
+			cond_resched();
+	} while (addr = next, addr != end);
+}
+
 static void kvm_stage2_destroy(struct kvm_pgtable *pgt)
 {
 	unsigned int ia_bits = VTCR_EL2_IPA(pgt->mmu->vtcr);
 
-	KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, 0, BIT(ia_bits));
+	stage2_destroy_range(pgt, 0, BIT(ia_bits));
 	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
 }
