Commit 5dcc1e7

Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.11

 - Add a global struct to consolidate tracking of host values, e.g. EFER, and
   move "shadow_phys_bits" into the structure as "maxphyaddr".

 - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.

 - Print the name of the APICv/AVIC inhibits in the relevant tracepoint.

 - Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.

 - Misc cleanups
2 parents 86014c1 + 82222ee commit 5dcc1e7

31 files changed, 503 additions & 209 deletions

Documentation/virt/kvm/api.rst

Lines changed: 57 additions & 21 deletions
@@ -6483,9 +6483,12 @@ More architecture-specific flags detailing state of the VCPU that may
 affect the device's behavior. Current defined flags::

 /* x86, set if the VCPU is in system management mode */
-#define KVM_RUN_X86_SMM (1 << 0)
+#define KVM_RUN_X86_SMM (1 << 0)
 /* x86, set if bus lock detected in VM */
-#define KVM_RUN_BUS_LOCK (1 << 1)
+#define KVM_RUN_X86_BUS_LOCK (1 << 1)
+/* x86, set if the VCPU is executing a nested (L2) guest */
+#define KVM_RUN_X86_GUEST_MODE (1 << 2)
+
 /* arm64, set for KVM_EXIT_DEBUG */
 #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0)

@@ -7831,29 +7834,31 @@ Valid bits in args[0] are::
 #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1)

-Enabling this capability on a VM provides userspace with a way to select
-a policy to handle the bus locks detected in guest. Userspace can obtain
-the supported modes from the result of KVM_CHECK_EXTENSION and define it
-through the KVM_ENABLE_CAP.
+Enabling this capability on a VM provides userspace with a way to select a
+policy to handle the bus locks detected in guest. Userspace can obtain the
+supported modes from the result of KVM_CHECK_EXTENSION and define it through
+the KVM_ENABLE_CAP. The supported modes are mutually-exclusive.

-KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported
-currently and mutually exclusive with each other. More bits can be added in
-the future.
+This capability allows userspace to force VM exits on bus locks detected in the
+guest, irrespective whether or not the host has enabled split-lock detection
+(which triggers an #AC exception that KVM intercepts). This capability is
+intended to mitigate attacks where a malicious/buggy guest can exploit bus
+locks to degrade the performance of the whole system.

-With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits
-so that no additional actions are needed. This is the default mode.
+If KVM_BUS_LOCK_DETECTION_OFF is set, KVM doesn't force guest bus locks to VM
+exit, although the host kernel's split-lock #AC detection still applies, if
+enabled.

-With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected
-in VM. KVM just exits to userspace when handling them. Userspace can enforce
-its own throttling or other policy based mitigations.
+If KVM_BUS_LOCK_DETECTION_EXIT is set, KVM enables a CPU feature that ensures
+bus locks in the guest trigger a VM exit, and KVM exits to userspace for all
+such VM exits, e.g. to allow userspace to throttle the offending guest and/or
+apply some other policy-based mitigation. When exiting to userspace, KVM sets
+KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
+to KVM_EXIT_X86_BUS_LOCK.

-This capability is aimed to address the thread that VM can exploit bus locks to
-degree the performance of the whole system. Once the userspace enable this
-capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the
-KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning
-the bus lock vm exit can be preempted by a higher priority VM exit, the exit
-notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons.
-KVM_RUN_BUS_LOCK flag is used to distinguish between them.
+Note! Detected bus locks may be coincident with other exits to userspace, i.e.
+KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
+userspace wants to take action on all detected bus locks.

 7.23 KVM_CAP_PPC_DAWR1
 ----------------------
@@ -8137,6 +8142,37 @@ error/annotated fault.

 See KVM_EXIT_MEMORY_FAULT for more information.

+7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
+-----------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
+:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
+frequency or if any vCPUs have been created, -ENXIO if a virtual
+local APIC has not been created using KVM_CREATE_IRQCHIP.
+
+This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel
+virtual APIC when emulating APIC timers. KVM's default value can be retrieved
+by KVM_CHECK_EXTENSION.
+
+Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
+core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
+
+7.36 KVM_CAP_X86_GUEST_MODE
+------------------------------
+
+:Architectures: x86
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that KVM_RUN will update the
+KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
+vCPU was executing nested guest code when it exited.
+
+KVM exits with the register state of either the L1 or L2 guest
+depending on which executed at the time of an exit. Userspace must
+take care to differentiate between these cases.
+
 8. Other capabilities.
 ======================
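A minimal sketch of how a VMM might act on the api.rst text above: decode kvm_run.flags after every KVM_RUN, whatever the exit reason, to catch coincident bus locks and to learn whether the register state belongs to L1 or L2. The flag values are copied from this diff; exit_summary and summarize_run_flags() are hypothetical names, and the 16-bit width of the flags field is an assumption about the UAPI struct. Real code would take the definitions from <linux/kvm.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the diff; real code should use <linux/kvm.h>. */
#define KVM_RUN_X86_SMM        (1 << 0)
#define KVM_RUN_X86_BUS_LOCK   (1 << 1)
#define KVM_RUN_X86_GUEST_MODE (1 << 2)

/* Hypothetical per-exit summary, not part of the KVM UAPI. */
struct exit_summary {
	bool bus_lock_detected; /* throttle or account, per VMM policy */
	bool regs_are_l2;       /* register state belongs to the nested guest */
};

/* Decode the flags independently of exit_reason, as the "Note!" advises. */
static struct exit_summary summarize_run_flags(uint16_t flags)
{
	struct exit_summary s = {
		.bus_lock_detected = (flags & KVM_RUN_X86_BUS_LOCK) != 0,
		.regs_are_l2 = (flags & KVM_RUN_X86_GUEST_MODE) != 0,
	};
	return s;
}
```

The regs_are_l2 field is only meaningful when KVM_CHECK_EXTENSION reports KVM_CAP_X86_GUEST_MODE; on kernels without the capability the bit is simply never set.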

arch/x86/include/asm/kvm_host.h

Lines changed: 21 additions & 3 deletions
@@ -1208,7 +1208,7 @@ enum kvm_apicv_inhibit {
 	 * APIC acceleration is disabled by a module parameter
 	 * and/or not supported in hardware.
 	 */
-	APICV_INHIBIT_REASON_DISABLE,
+	APICV_INHIBIT_REASON_DISABLED,

 	/*
 	 * APIC acceleration is inhibited because AutoEOI feature is
@@ -1278,8 +1278,27 @@ enum kvm_apicv_inhibit {
 	 * mapping between logical ID and vCPU.
 	 */
 	APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
+
+	NR_APICV_INHIBIT_REASONS,
 };

+#define __APICV_INHIBIT_REASON(reason) \
+	{ BIT(APICV_INHIBIT_REASON_##reason), #reason }
+
+#define APICV_INHIBIT_REASONS \
+	__APICV_INHIBIT_REASON(DISABLED), \
+	__APICV_INHIBIT_REASON(HYPERV), \
+	__APICV_INHIBIT_REASON(ABSENT), \
+	__APICV_INHIBIT_REASON(BLOCKIRQ), \
+	__APICV_INHIBIT_REASON(PHYSICAL_ID_ALIASED), \
+	__APICV_INHIBIT_REASON(APIC_ID_MODIFIED), \
+	__APICV_INHIBIT_REASON(APIC_BASE_MODIFIED), \
+	__APICV_INHIBIT_REASON(NESTED), \
+	__APICV_INHIBIT_REASON(IRQWIN), \
+	__APICV_INHIBIT_REASON(PIT_REINJ), \
+	__APICV_INHIBIT_REASON(SEV), \
+	__APICV_INHIBIT_REASON(LOGICAL_ID_ALIASED)
+
 struct kvm_arch {
 	unsigned long n_used_mmu_pages;
 	unsigned long n_requested_mmu_pages;
@@ -1365,6 +1384,7 @@ struct kvm_arch {

 	u32 default_tsc_khz;
 	bool user_set_tsc;
+	u64 apic_bus_cycle_ns;

 	seqcount_raw_spinlock_t pvclock_sc;
 	bool use_master_clock;
@@ -1709,7 +1729,6 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-	bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
 	const unsigned long required_apicv_inhibits;
 	bool allow_apicv_in_x2apic_without_x2apic_virtualization;
 	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
@@ -1855,7 +1874,6 @@ struct kvm_arch_async_pf {
 };

 extern u32 __read_mostly kvm_nr_uret_msrs;
-extern u64 __read_mostly host_efer;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
 extern struct kvm_x86_ops kvm_x86_ops;
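The APICV_INHIBIT_REASONS list added to this header expands to { mask, name } pairs. The following userspace sketch reproduces that pattern to show how such a table can turn an inhibit bitmask into readable names. It assumes a shortened three-entry enum, and REASON_ENTRY, flag_name and format_inhibits() are hypothetical names. The kernel tracepoint that consumes the real table is not part of this excerpt and may format its output differently.

```c
#include <stddef.h>
#include <string.h>

#define BIT(n) (1UL << (n))

/* Shortened stand-in for enum kvm_apicv_inhibit (assumption: 3 of the 12 reasons). */
enum inhibit_reason { REASON_DISABLED, REASON_HYPERV, REASON_ABSENT, NR_REASONS };

struct flag_name {
	unsigned long mask;
	const char *name;
};

/* Same shape as __APICV_INHIBIT_REASON: the reason's bit, then its stringified name. */
#define REASON_ENTRY(reason) { BIT(REASON_##reason), #reason }

static const struct flag_name reason_names[] = {
	REASON_ENTRY(DISABLED),
	REASON_ENTRY(HYPERV),
	REASON_ENTRY(ABSENT),
};

/* One possible use of a trailing NR_* enumerator: keep table and enum in sync. */
_Static_assert(sizeof(reason_names) / sizeof(reason_names[0]) == NR_REASONS,
	       "every inhibit reason needs a name");

/* Write the names of all set bits into buf, separated by '|'; len must be >= 1. */
static const char *format_inhibits(unsigned long inhibits, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(reason_names) / sizeof(reason_names[0]); i++) {
		if (!(inhibits & reason_names[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, reason_names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Stringifying the enumerator suffix with #reason keeps the printed name and the enum constant from drifting apart, which is presumably why the header defines the list next to the enum.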

arch/x86/include/uapi/asm/kvm.h

Lines changed: 1 addition & 0 deletions
@@ -106,6 +106,7 @@ struct kvm_ioapic_state {

 #define KVM_RUN_X86_SMM (1 << 0)
 #define KVM_RUN_X86_BUS_LOCK (1 << 1)
+#define KVM_RUN_X86_GUEST_MODE (1 << 2)

 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {

arch/x86/kvm/cpuid.c

Lines changed: 12 additions & 0 deletions
@@ -335,6 +335,18 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
 #endif
 }

+static bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *entry;
+
+	entry = kvm_find_cpuid_entry(vcpu, 0);
+	if (!entry)
+		return false;
+
+	return is_guest_vendor_amd(entry->ebx, entry->ecx, entry->edx) ||
+	       is_guest_vendor_hygon(entry->ebx, entry->ecx, entry->edx);
+}
+
 static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;

arch/x86/kvm/cpuid.h

Lines changed: 0 additions & 18 deletions
@@ -102,24 +102,6 @@ static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu,
 	*reg &= ~__feature_bit(x86_feature);
 }

-static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = kvm_find_cpuid_entry(vcpu, 0);
-	return best &&
-	       (is_guest_vendor_amd(best->ebx, best->ecx, best->edx) ||
-		is_guest_vendor_hygon(best->ebx, best->ecx, best->edx));
-}
-
-static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = kvm_find_cpuid_entry(vcpu, 0);
-	return best && is_guest_vendor_intel(best->ebx, best->ecx, best->edx);
-}
-
 static inline bool guest_cpuid_is_amd_compatible(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.is_amd_compatible;

arch/x86/kvm/emulate.c

Lines changed: 21 additions & 50 deletions
@@ -2354,50 +2354,6 @@ setup_syscalls_segments(struct desc_struct *cs, struct desc_struct *ss)
 	ss->avl = 0;
 }

-static bool vendor_intel(struct x86_emulate_ctxt *ctxt)
-{
-	u32 eax, ebx, ecx, edx;
-
-	eax = ecx = 0;
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, true);
-	return is_guest_vendor_intel(ebx, ecx, edx);
-}
-
-static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt)
-{
-	const struct x86_emulate_ops *ops = ctxt->ops;
-	u32 eax, ebx, ecx, edx;
-
-	/*
-	 * syscall should always be enabled in longmode - so only become
-	 * vendor specific (cpuid) if other modes are active...
-	 */
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		return true;
-
-	eax = 0x00000000;
-	ecx = 0x00000000;
-	ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, true);
-	/*
-	 * remark: Intel CPUs only support "syscall" in 64bit longmode. Also a
-	 * 64bit guest with a 32bit compat-app running will #UD !! While this
-	 * behaviour can be fixed (by emulating) into AMD response - CPUs of
-	 * AMD can't behave like Intel.
-	 */
-	if (is_guest_vendor_intel(ebx, ecx, edx))
-		return false;
-
-	if (is_guest_vendor_amd(ebx, ecx, edx) ||
-	    is_guest_vendor_hygon(ebx, ecx, edx))
-		return true;
-
-	/*
-	 * default: (not Intel, not AMD, not Hygon), apply Intel's
-	 * stricter rules...
-	 */
-	return false;
-}
-
 static int em_syscall(struct x86_emulate_ctxt *ctxt)
 {
 	const struct x86_emulate_ops *ops = ctxt->ops;
@@ -2411,7 +2367,15 @@ static int em_syscall(struct x86_emulate_ctxt *ctxt)
 	    ctxt->mode == X86EMUL_MODE_VM86)
 		return emulate_ud(ctxt);

-	if (!(em_syscall_is_enabled(ctxt)))
+	/*
+	 * Intel compatible CPUs only support SYSCALL in 64-bit mode, whereas
+	 * AMD allows SYSCALL in any flavor of protected mode. Note, it's
+	 * infeasible to emulate Intel behavior when running on AMD hardware,
+	 * as SYSCALL won't fault in the "wrong" mode, i.e. there is no #UD
+	 * for KVM to trap-and-emulate, unlike emulating AMD on Intel.
+	 */
+	if (ctxt->mode != X86EMUL_MODE_PROT64 &&
+	    ctxt->ops->guest_cpuid_is_intel_compatible(ctxt))
 		return emulate_ud(ctxt);

 	ops->get_msr(ctxt, MSR_EFER, &efer);
@@ -2471,11 +2435,11 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt)
 		return emulate_gp(ctxt, 0);

 	/*
-	 * Not recognized on AMD in compat mode (but is recognized in legacy
-	 * mode).
+	 * Intel's architecture allows SYSENTER in compatibility mode, but AMD
+	 * does not. Note, AMD does allow SYSENTER in legacy protected mode.
 	 */
-	if ((ctxt->mode != X86EMUL_MODE_PROT64) && (efer & EFER_LMA)
-	    && !vendor_intel(ctxt))
+	if ((ctxt->mode != X86EMUL_MODE_PROT64) && (efer & EFER_LMA) &&
+	    !ctxt->ops->guest_cpuid_is_intel_compatible(ctxt))
 		return emulate_ud(ctxt);

 	/* sysenter/sysexit have not been tested in 64bit mode. */
@@ -2647,7 +2611,14 @@ static void string_registers_quirk(struct x86_emulate_ctxt *ctxt)
 	 * manner when ECX is zero due to REP-string optimizations.
 	 */
 #ifdef CONFIG_X86_64
-	if (ctxt->ad_bytes != 4 || !vendor_intel(ctxt))
+	u32 eax, ebx, ecx, edx;
+
+	if (ctxt->ad_bytes != 4)
+		return;
+
+	eax = ecx = 0;
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, true);
+	if (!is_guest_vendor_intel(ebx, ecx, edx))
 		return;

 	*reg_write(ctxt, VCPU_REGS_RCX) = 0;
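The vendor-dependent #UD checks that this file now applies to SYSCALL and SYSENTER can be restated as two small predicates. This is a sketch only: emul_mode and both function names are hypothetical stand-ins, and it models just the conditions visible in the hunks above. The emulator's other checks, such as real/VM86 mode and the EFER and MSR tests, are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator's X86EMUL_MODE_* values. */
enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT16, MODE_PROT32, MODE_PROT64 };

/*
 * SYSCALL: Intel-compatible guests get #UD outside 64-bit mode, while
 * AMD-compatible guests may use SYSCALL in any protected mode.
 */
static bool syscall_vendor_ud(enum emul_mode mode, bool intel_compatible)
{
	return mode != MODE_PROT64 && intel_compatible;
}

/*
 * SYSENTER: in compatibility mode (long mode active, but not 64-bit code),
 * only Intel-compatible guests may execute it; everyone else gets #UD.
 */
static bool sysenter_vendor_ud(enum emul_mode mode, bool efer_lma,
			       bool intel_compatible)
{
	return mode != MODE_PROT64 && efer_lma && !intel_compatible;
}
```

Note the asymmetry: the SYSCALL check fires for Intel-compatible guests, the SYSENTER check for guests that are not Intel-compatible, so a guest with an unrecognized vendor string is handled by whatever guest_cpuid_is_intel_compatible() reports for it.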

arch/x86/kvm/hyperv.c

Lines changed: 2 additions & 1 deletion
@@ -1737,7 +1737,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
 		data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
 		break;
 	case HV_X64_MSR_APIC_FREQUENCY:
-		data = APIC_BUS_FREQUENCY;
+		data = div64_u64(1000000000ULL,
+				 vcpu->kvm->arch.apic_bus_cycle_ns);
 		break;
 	default:
 		kvm_pr_unimpl_rdmsr(vcpu, msr);

arch/x86/kvm/kvm_emulate.h

Lines changed: 1 addition & 0 deletions
@@ -223,6 +223,7 @@ struct x86_emulate_ops {
 	bool (*guest_has_movbe)(struct x86_emulate_ctxt *ctxt);
 	bool (*guest_has_fxsr)(struct x86_emulate_ctxt *ctxt);
 	bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);
+	bool (*guest_cpuid_is_intel_compatible)(struct x86_emulate_ctxt *ctxt);

 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);

arch/x86/kvm/lapic.c

Lines changed: 4 additions & 2 deletions
@@ -1557,7 +1557,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 		remaining = 0;

 	ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
-	return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
+	return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+			      apic->divide_count));
 }

 static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1973,7 +1974,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)

 static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
 {
-	return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
+	return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+	       (u64)apic->divide_count;
 }

 static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)

arch/x86/kvm/lapic.h

Lines changed: 1 addition & 2 deletions
@@ -16,8 +16,7 @@
 #define APIC_DEST_NOSHORT 0x0
 #define APIC_DEST_MASK 0x800

-#define APIC_BUS_CYCLE_NS 1
-#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
+#define APIC_BUS_CYCLE_NS_DEFAULT 1

 #define APIC_BROADCAST 0xFF
 #define X2APIC_BROADCAST 0xFFFFFFFFul
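The hyperv.c and lapic.c hunks above replace a compile-time constant with a per-VM cycle length, and the arithmetic they perform can be checked in isolation. The sketch below restates it as plain userspace C. The function names are illustrative, the per-VM field is passed as an ordinary parameter, and the 40 ns figure in the usage note is only an example value. Callers are assumed to pass a non-zero cycle length and divide count.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define APIC_BUS_CYCLE_NS_DEFAULT 1

/* Frequency reported for HV_X64_MSR_APIC_FREQUENCY: 1e9 / cycle length. */
static uint64_t apic_bus_frequency_hz(uint64_t cycle_ns)
{
	return NSEC_PER_SEC / cycle_ns;
}

/* Initial-count register to nanoseconds: count * cycle length * divider. */
static uint64_t tmict_to_ns(uint32_t tmict, uint64_t cycle_ns, uint32_t divide_count)
{
	return (uint64_t)tmict * cycle_ns * (uint64_t)divide_count;
}

/* Remaining nanoseconds back to a current-count value (inverse of the above). */
static uint32_t remaining_ns_to_tmcct(uint64_t ns, uint64_t cycle_ns, uint32_t divide_count)
{
	return (uint32_t)(ns / (cycle_ns * divide_count));
}
```

With the default of 1 ns the reported bus frequency is 1 GHz, as before the change. A cycle length of 40 ns would give 25 MHz, and an initial count of 1000 with a divider of 16 would then expire after 640000 ns.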
