Commit 5c5ddf7

Merge tag 'kvm-x86-mtrrs-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MTRR virtualization removal Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD hack, and instead always honor guest PAT on CPUs that support self-snoop.
2 parents 34b69ed + 377b2f3 commit 5c5ddf7

10 files changed: +105 -702 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 5 additions & 1 deletion
@@ -8025,7 +8025,11 @@ The valid bits in cap.args[0] are:
                                     When this quirk is disabled, the reset value
                                     is 0x10000 (APIC_LVT_MASKED).
 
- KVM_X86_QUIRK_CD_NW_CLEARED        By default, KVM clears CR0.CD and CR0.NW.
+ KVM_X86_QUIRK_CD_NW_CLEARED        By default, KVM clears CR0.CD and CR0.NW on
+                                    AMD CPUs to workaround buggy guest firmware
+                                    that runs in perpetuity with CR0.CD, i.e.
+                                    with caches in "no fill" mode.
+
                                     When this quirk is disabled, KVM does not
                                     change the value of CR0.CD and CR0.NW.
 

Documentation/virt/kvm/x86/errata.rst

Lines changed: 18 additions & 0 deletions
@@ -48,3 +48,21 @@ have the same physical APIC ID, KVM will deliver events targeting that APIC ID
 only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
 not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
 matching the target APIC ID receive the interrupt).
+
+MTRRs
+-----
+KVM does not virtualize guest MTRR memory types. KVM emulates accesses to MTRR
+MSRs, i.e. {RD,WR}MSR in the guest will behave as expected, but KVM does not
+honor guest MTRRs when determining the effective memory type, and instead
+treats all of guest memory as having Writeback (WB) MTRRs.
+
+CR0.CD
+------
+KVM does not virtualize CR0.CD on Intel CPUs. Similar to MTRR MSRs, KVM
+emulates CR0.CD accesses so that loads and stores from/to CR0 behave as
+expected, but setting CR0.CD=1 has no impact on the cacheability of guest
+memory.
+
+Note, this erratum does not affect AMD CPUs, which fully virtualize CR0.CD in
+hardware, i.e. put the CPU caches into "no fill" mode when CR0.CD=1, even when
+running in the guest.
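
Reviewer note: the MTRR erratum above can be read as "the MTRR half of the memory-type lookup is a constant". The following is a minimal userspace sketch of that reading, not KVM code. The names model_guest_mtrr_type() and model_effective_type() are hypothetical, and the sketch assumes a configuration in which the guest PAT is honored, so that the PAT type is combined with a Writeback MTRR type as in the architectural MTRR/PAT combination rules (where UC- on top of WB resolves to UC).

```c
#include <stdint.h>

/* Architectural memory type encodings shared by MTRRs and the PAT. */
enum memtype {
	MT_UC       = 0,	/* Uncacheable */
	MT_WC       = 1,	/* Write Combining */
	MT_WT       = 4,	/* Write Through */
	MT_WP       = 5,	/* Write Protected */
	MT_WB       = 6,	/* Write Back */
	MT_UC_MINUS = 7,	/* UC-, only valid as a PAT type */
};

/*
 * Hypothetical model of the erratum: whatever the guest wrote to its MTRR
 * MSRs, every guest physical address is treated as having a WB MTRR type.
 */
static enum memtype model_guest_mtrr_type(uint64_t gpa)
{
	(void)gpa;	/* guest MTRR contents are deliberately not consulted */
	return MT_WB;
}

/*
 * Combine the (constant) MTRR type with the guest PAT type. With a WB MTRR
 * type the PAT type decides the result, except that UC- resolves to UC.
 */
static enum memtype model_effective_type(uint64_t gpa, enum memtype pat)
{
	enum memtype mtrr = model_guest_mtrr_type(gpa);

	if (mtrr == MT_WB)
		return pat == MT_UC_MINUS ? MT_UC : pat;

	return MT_UC;	/* unreachable in this model; conservative default */
}
```

Under these assumptions, a guest that marks a range UC in its MTRRs but leaves the PAT type at WB still ends up with WB accesses, which is the behavior the erratum describes.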

arch/x86/include/asm/kvm_host.h

Lines changed: 4 additions & 11 deletions
@@ -160,7 +160,6 @@
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 256
-#define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
 #define ASYNC_PF_PER_VCPU 64
@@ -605,18 +604,12 @@ enum {
 	KVM_DEBUGREG_WONT_EXIT = 2,
 };
 
-struct kvm_mtrr_range {
-	u64 base;
-	u64 mask;
-	struct list_head node;
-};
-
 struct kvm_mtrr {
-	struct kvm_mtrr_range var_ranges[KVM_NR_VAR_MTRR];
-	mtrr_type fixed_ranges[KVM_NR_FIXED_MTRR_REGION];
+	u64 var[KVM_NR_VAR_MTRR * 2];
+	u64 fixed_64k;
+	u64 fixed_16k[2];
+	u64 fixed_4k[8];
 	u64 deftype;
-
-	struct list_head head;
 };
 
 /* Hyper-V SynIC timer */
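
Reviewer note: the new struct kvm_mtrr stores raw MSR values instead of decoded ranges, so an MSR access reduces to finding the right u64 slot. The sketch below shows one way such a lookup could work in plain userspace C. It is an illustration only: struct mtrr_state, mtrr_slot() and the MSR_MTRR_* macro names are hypothetical, while the MSR index values are the architectural ones (0x200 for the first variable-range base, 0x250, 0x258 and 0x268 for the fixed-range groups, 0x2FF for the default type).

```c
#include <stddef.h>
#include <stdint.h>

#define NR_VAR_MTRR 8	/* mirrors KVM_NR_VAR_MTRR from the diff */

/* Userspace mirror of the new flat layout; the struct name is hypothetical. */
struct mtrr_state {
	uint64_t var[NR_VAR_MTRR * 2];	/* base/mask pairs, interleaved */
	uint64_t fixed_64k;
	uint64_t fixed_16k[2];
	uint64_t fixed_4k[8];
	uint64_t deftype;
};

/* Architectural MSR indices; the macro names are made up for this sketch. */
#define MSR_MTRR_PHYSBASE0	0x200u
#define MSR_MTRR_FIX64K		0x250u
#define MSR_MTRR_FIX16K_FIRST	0x258u
#define MSR_MTRR_FIX4K_FIRST	0x268u
#define MSR_MTRR_DEFTYPE	0x2ffu

/* Return the storage slot backing an MTRR MSR, or NULL if it is not one. */
static uint64_t *mtrr_slot(struct mtrr_state *s, uint32_t msr)
{
	if (msr >= MSR_MTRR_PHYSBASE0 &&
	    msr < MSR_MTRR_PHYSBASE0 + 2 * NR_VAR_MTRR)
		return &s->var[msr - MSR_MTRR_PHYSBASE0];
	if (msr == MSR_MTRR_FIX64K)
		return &s->fixed_64k;
	if (msr >= MSR_MTRR_FIX16K_FIRST && msr < MSR_MTRR_FIX16K_FIRST + 2)
		return &s->fixed_16k[msr - MSR_MTRR_FIX16K_FIRST];
	if (msr >= MSR_MTRR_FIX4K_FIRST && msr < MSR_MTRR_FIX4K_FIRST + 8)
		return &s->fixed_4k[msr - MSR_MTRR_FIX4K_FIRST];
	if (msr == MSR_MTRR_DEFTYPE)
		return &s->deftype;
	return NULL;
}
```

A read or write handler built on this would validate the value and then load or store through the returned pointer. The list_head bookkeeping removed by the diff has no counterpart here because nothing in the sketch walks the ranges to compute a memory type.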

arch/x86/kvm/mmu.h

Lines changed: 1 addition & 6 deletions
@@ -221,12 +221,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma);
-
-static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
-{
-	return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
-}
+bool kvm_mmu_may_ignore_guest_pat(void);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 

arch/x86/kvm/mmu/mmu.c

Lines changed: 10 additions & 25 deletions
@@ -4671,38 +4671,23 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma)
+bool kvm_mmu_may_ignore_guest_pat(void)
 {
 	/*
-	 * If host MTRRs are ignored (shadow_memtype_mask is non-zero), and the
-	 * VM has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is
-	 * to honor the memtype from the guest's MTRRs so that guest accesses
-	 * to memory that is DMA'd aren't cached against the guest's wishes.
-	 *
-	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
-	 * e.g. KVM will force UC memtype for host MMIO.
+	 * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does
+	 * not support self-snoop (or is affected by an erratum), and the VM
+	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
+	 * honor the memtype from the guest's PAT so that guest accesses to
+	 * memory that is DMA'd aren't cached against the guest's wishes. As a
+	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
+	 * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE
+	 * bits in response to non-coherent device (un)registration.
 	 */
-	return vm_has_noncoherent_dma && shadow_memtype_mask;
+	return !static_cpu_has(X86_FEATURE_SELFSNOOP) && shadow_memtype_mask;
 }
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	/*
-	 * If the guest's MTRRs may be used to compute the "real" memtype,
-	 * restrict the mapping level to ensure KVM uses a consistent memtype
-	 * across the entire mapping.
-	 */
-	if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
-		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
-			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
-			gfn_t base = gfn_round_for_level(fault->gfn,
-							 fault->max_level);
-
-			if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
-				break;
-		}
-	}
-
 #ifdef CONFIG_X86_64
 	if (tdp_mmu_enabled)
 		return kvm_tdp_mmu_page_fault(vcpu, fault);
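
Reviewer note: the new predicate no longer depends on per-VM state, only on a CPU feature and on whether EPT is in use. The sketch below restates it as a pure function so the cases in the comment can be checked by hand. Both function names are hypothetical. The second helper is only one reading of the comment and the merge description (with EPT, guest PAT is ignored only when the CPU lacks self-snoop and the VM has no non-coherent DMA); the code that makes that per-VM decision is not part of the hunks shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pure-function restatement of kvm_mmu_may_ignore_guest_pat(): the CPU
 * feature check and shadow_memtype_mask are passed in as plain arguments.
 */
static bool model_may_ignore_guest_pat(bool cpu_has_selfsnoop,
				       uint64_t memtype_mask)
{
	return !cpu_has_selfsnoop && memtype_mask != 0;
}

/*
 * Assumed per-VM outcome when EPT is enabled, inferred from the comment:
 * guest PAT is ignored only if the CPU lacks self-snoop and the VM has no
 * non-coherent DMA. This is an interpretation, not code from the commit.
 */
static bool model_ept_ignores_guest_pat(bool cpu_has_selfsnoop,
					bool vm_has_noncoherent_dma)
{
	return !cpu_has_selfsnoop && !vm_has_noncoherent_dma;
}
```

Read this way, the removed loop in kvm_tdp_page_fault() has nothing left to do: with guest MTRRs never consulted, there is no per-range MTRR consistency to enforce before choosing a mapping level.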
