Commit aa8d1f4

yanzhao56 authored and bonzini committed
KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
VMs. Make sure KVM behaves as if the quirk is always disabled for
non-KVM_X86_DEFAULT_VM VMs.

The KVM_X86_QUIRK_SLOT_ZAP_ALL quirk offers two behavior options:
- when enabled:  invalidate/zap all SPTEs ("zap-all"),
- when disabled: precisely zap only the leaf SPTEs within the range of the
  moving/deleting memory slot ("zap-slot-leafs-only").

"zap-all" is today's KVM behavior, kept to work around a bug [1] where
changing the zapping behavior of memslot move/deletion would cause VM
instability for VMs with an Nvidia GPU assigned; "zap-slot-leafs-only"
allows for more precise zapping of SPTEs within the memory slot range,
improving performance in certain scenarios [2] and meeting the functional
requirements for TDX.

Previous attempts to select "zap-slot-leafs-only" include a per-VM
capability approach [3] (not preferred because the root cause of the bug
remained unidentified) and a per-memslot flag approach [4]. Sean and Paolo
ultimately recommended implementing this quirk, explaining that it's the
least bad option [5].

By default, the quirk is enabled on KVM_X86_DEFAULT_VM VMs to use
"zap-all". Users have the option to disable the quirk to select
"zap-slot-leafs-only" for specific KVM_X86_DEFAULT_VM VMs that are
unaffected by this bug.

For non-KVM_X86_DEFAULT_VM VMs, "zap-slot-leafs-only" is always selected
without user opt-in, regardless of whether the user opts for "zap-all".
This is because it is assumed, until proven otherwise, that
non-KVM_X86_DEFAULT_VM VMs will not be exposed to the bug [1], and most
importantly, because TDX must always have "zap-slot-leafs-only" selected.
In TDX's case a memslot's GPA range can be a mixture of "private" and
"shared" memory. Shared is roughly analogous to how EPT is handled for
normal VMs, but private GPAs need lots of special treatment:

1) "zap-all" would require zapping the private root page or non-leaf
   entries, or at least leaf entries beyond the scope of the memslot being
   deleted. However, TDX demands that the root page of the private page
   table remain unchanged, that leaf entries be zapped before non-leaf
   entries, and that any dropped private guest pages be re-accepted by the
   guest.

2) if "zap-all" zaps only shared page tables, it would result in private
   pages still being mapped when the memslot is gone. This may affect even
   other processes if the gmem fd is later hole punched, causing the pages
   to be freed on the host while still mapped in the TD, because there is
   no pgoff-to-gfn information left to zap the private page table after
   the memslot is gone.

So, simply go "zap-slot-leafs-only", as if the quirk were always disabled,
for non-KVM_X86_DEFAULT_VM VMs, to avoid manual opt-in for every VM
type [6] or complicating the quirk-disabling interface (the current
interface is limited: there is no way to query quirks, or to force them to
be disabled).

Add a new function kvm_mmu_zap_memslot_leafs() to implement
"zap-slot-leafs-only". This function does not call kvm_unmap_gfn_range(),
bypassing the special handling of APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
because:

1) The APIC_ACCESS_PAGE_PRIVATE_MEMSLOT cannot be created by users, nor
   can it be moved. It is only deleted by KVM when APICv is permanently
   inhibited.
2) kvm_vcpu_reload_apic_access_page() effectively does nothing when
   APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is deleted.
3) Avoiding a KVM_REQ_APIC_PAGE_RELOAD request to all vCPUs saves costly
   IPIs.

Suggested-by: Kai Huang <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Link: https://patchwork.kernel.org/project/kvm/patch/[email protected] [1]
Link: https://patchwork.kernel.org/project/kvm/patch/[email protected]/#25054908 [2]
Link: https://lore.kernel.org/kvm/[email protected]/T/#mabc0119583dacf621025e9d873c85f4fbaa66d5c [3]
Link: https://lore.kernel.org/all/[email protected] [4]
Link: https://lore.kernel.org/all/[email protected] [5]
Link: https://lore.kernel.org/all/[email protected] [6]
Co-developed-by: Rick Edgecombe <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Yan Zhao <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
1 parent 66155de commit aa8d1f4

File tree

4 files changed: +52 −2 lines changed


Documentation/virt/kvm/api.rst

Lines changed: 8 additions & 0 deletions
@@ -8082,6 +8082,14 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
                                     guest CPUID on writes to MISC_ENABLE if
                                     KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
                                     disabled.
+
+KVM_X86_QUIRK_SLOT_ZAP_ALL          By default, KVM invalidates all SPTEs in
+                                    fast way for memslot deletion when VM type
+                                    is KVM_X86_DEFAULT_VM.
+                                    When this quirk is disabled or when VM type
+                                    is other than KVM_X86_DEFAULT_VM, KVM zaps
+                                    only leaf SPTEs that are within the range of
+                                    the memslot being deleted.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID

arch/x86/include/asm/kvm_host.h

Lines changed: 2 additions & 1 deletion
@@ -2345,7 +2345,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
 	KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT |	\
 	KVM_X86_QUIRK_FIX_HYPERCALL_INSN |	\
-	KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)
+	KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS |	\
+	KVM_X86_QUIRK_SLOT_ZAP_ALL)
 
 /*
  * KVM previously used a u32 field in kvm_run to indicate the hypercall was

arch/x86/include/uapi/asm/kvm.h

Lines changed: 1 addition & 0 deletions
@@ -439,6 +439,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
 #define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
+#define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1

arch/x86/kvm/mmu/mmu.c

Lines changed: 41 additions & 1 deletion
@@ -6997,10 +6997,50 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 	kvm_mmu_zap_all(kvm);
 }
 
+/*
+ * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
+ *
+ * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+ * case scenario we'll have unused shadow pages lying around until they
+ * are recycled due to age or when the VM is destroyed.
+ */
+static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_gfn_range range = {
+		.slot = slot,
+		.start = slot->base_gfn,
+		.end = slot->base_gfn + slot->npages,
+		.may_block = true,
+	};
+	bool flush = false;
+
+	write_lock(&kvm->mmu_lock);
+
+	if (kvm_memslots_have_rmaps(kvm))
+		flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
+
+	if (tdp_mmu_enabled)
+		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
+
+	if (flush)
+		kvm_flush_remote_tlbs_memslot(kvm, slot);
+
+	write_unlock(&kvm->mmu_lock);
+}
+
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
+
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
-	kvm_mmu_zap_all_fast(kvm);
+	if (kvm_memslot_flush_zap_all(kvm))
+		kvm_mmu_zap_all_fast(kvm);
+	else
+		kvm_mmu_zap_memslot_leafs(kvm, slot);
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
