Commit 12abeb8

Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM x86 CET virtualization support for 6.18

Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).

CET comprises two distinct features, Shadow Stacks (SHSTK) and Indirect Branch Tracking (IBT), that can be utilized by software to help provide Control-flow integrity (CFI). SHSTK defends against backward-edge attacks (a.k.a. Return-oriented programming (ROP)), while IBT defends against forward-edge attacks (a.k.a. CALL/JMP-oriented programming (COP/JOP)).

Attackers commonly use ROP and COP/JOP methodologies to redirect the control-flow to unauthorized targets in order to execute small snippets of code, a.k.a. gadgets, of the attacker's choice. By chaining together several gadgets, an attacker can perform arbitrary operations and circumvent the system's defenses.

SHSTK defends against backward-edge attacks, which execute gadgets by modifying the stack to branch to the attacker's target via RET, by providing a second stack that is used exclusively to track control transfer operations. The shadow stack is separate from the data/normal stack, and can be enabled independently in user and kernel mode.

When SHSTK is enabled, CALL instructions push the return address on both the data and shadow stack. RET then pops the return address from both stacks and compares the addresses. If the return addresses from the two stacks do not match, the CPU generates a Control Protection (#CP) exception.

IBT defends against forward-edge attacks, which branch to gadgets by executing indirect CALL and JMP instructions with attacker controlled register or memory state, by requiring the target of indirect branches to start with a special marker instruction, ENDBRANCH. If an indirect branch is executed and the next instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH behaves as a NOP if IBT is disabled or unsupported.

From a virtualization perspective, CET presents several problems. While SHSTK and IBT have two layers of enabling, a global control in the form of a CR4 bit, and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS. Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable option due to complexity, and outright disallowing use of XSTATE to context switch SHSTK/IBT state would render the features unusable to most guests.

To limit the overall complexity without sacrificing performance or usability, simply ignore the potential virtualization hole, but ensure that all paths in KVM treat SHSTK/IBT as usable by the guest if the feature is supported in hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow userspace to advertise one of SHSTK or IBT if both are supported in hardware, even though doing so would allow a misbehaving guest to use the unadvertised feature.

Fully emulating SHSTK and IBT would also require significant complexity, e.g. to track and update branch state for IBT, and shadow stack state for SHSTK. Given that emulating large swaths of the guest code stream isn't necessary on modern CPUs, punt on emulating instructions that meaningfully impact or consume SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation of such instructions so that KVM's emulator can't be abused to circumvent CET.

Disable support for SHSTK and IBT if KVM is configured such that emulation of arbitrary guest instructions may be required, specifically if Unrestricted Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR.

Lastly, disable SHSTK support if shadow paging is enabled, as the protections for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so that they can't be directly modified by software), i.e. would require non-trivial support in the Shadow MMU.

Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT in SVM requires KVM modifications.
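
The CALL/RET behavior described in the commit message can be illustrated with a toy model. The following is a minimal sketch, not KVM or hardware code: `struct toy_cpu`, `toy_call()` and `toy_ret()` are hypothetical names invented for this example, and a `false` return from `toy_ret()` merely stands in for a #CP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_DEPTH 16

/* Toy model only: real shadow stacks live in specially protected memory. */
struct toy_cpu {
	uint64_t data_stack[TOY_STACK_DEPTH];
	uint64_t shadow_stack[TOY_STACK_DEPTH];
	size_t sp;	/* number of entries on the data stack */
	size_t ssp;	/* number of entries on the shadow stack */
};

/* CALL: push the return address on both the data and the shadow stack. */
static bool toy_call(struct toy_cpu *cpu, uint64_t return_addr)
{
	if (cpu->sp >= TOY_STACK_DEPTH || cpu->ssp >= TOY_STACK_DEPTH)
		return false;

	cpu->data_stack[cpu->sp++] = return_addr;
	cpu->shadow_stack[cpu->ssp++] = return_addr;
	return true;
}

/*
 * RET: pop both stacks and compare the two return addresses.  Returns true
 * and stores the branch target on a match; returns false to model a #CP
 * when the addresses differ (or when either stack is empty).
 */
static bool toy_ret(struct toy_cpu *cpu, uint64_t *target)
{
	uint64_t data_addr, shadow_addr;

	if (cpu->sp == 0 || cpu->ssp == 0)
		return false;

	data_addr = cpu->data_stack[--cpu->sp];
	shadow_addr = cpu->shadow_stack[--cpu->ssp];
	if (data_addr != shadow_addr)
		return false;	/* would be a #CP on real hardware */

	*target = data_addr;
	return true;
}
```

In this model, overwriting `data_stack[sp - 1]` between `toy_call()` and `toy_ret()` (the ROP scenario) makes `toy_ret()` fail, because the copy on the shadow stack is untouched.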
2 parents d05ca6b + d292035 commit 12abeb8

28 files changed, +1563 -95 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 13 additions & 1 deletion
@@ -2908,6 +2908,16 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
 
   0x9030 0000 0002 <reg:16>
 
+x86 MSR registers have the following id bit patterns::
+  0x2030 0002 <msr number:32>
+
+Following are the KVM-defined registers for x86:
+
+======================= ========= =============================================
+Encoding                Register  Description
+======================= ========= =============================================
+0x2030 0003 0000 0000   SSP       Shadow Stack Pointer
+======================= ========= =============================================
 
 4.69 KVM_GET_ONE_REG
 --------------------
@@ -3588,7 +3598,7 @@ VCPU matching underlying host.
 ---------------------
 
 :Capability: basic
-:Architectures: arm64, mips, riscv
+:Architectures: arm64, mips, riscv, x86 (if KVM_CAP_ONE_REG)
 :Type: vcpu ioctl
 :Parameters: struct kvm_reg_list (in/out)
 :Returns: 0 on success; -1 on error
@@ -3631,6 +3641,8 @@ Note that s390 does not support KVM_GET_REG_LIST for historical reasons
 
 - KVM_REG_S390_GBEA
 
+Note, for x86, all MSRs enumerated by KVM_GET_MSR_INDEX_LIST are supported as
+type KVM_X86_REG_TYPE_MSR, but are NOT enumerated via KVM_GET_REG_LIST.
 
 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
 -----------------------------------------

arch/x86/include/asm/kvm_host.h

Lines changed: 4 additions & 2 deletions
@@ -142,7 +142,7 @@
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
 			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
-			  | X86_CR4_LAM_SUP))
+			  | X86_CR4_LAM_SUP | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
@@ -267,6 +267,7 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_MASK BIT(3)
 #define PFERR_FETCH_MASK BIT(4)
 #define PFERR_PK_MASK BIT(5)
+#define PFERR_SS_MASK BIT(6)
 #define PFERR_SGX_MASK BIT(15)
 #define PFERR_GUEST_RMP_MASK BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
@@ -815,7 +816,6 @@ struct kvm_vcpu_arch {
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
-	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
 	u64 perf_capabilities;
@@ -876,6 +876,8 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 ia32_xss;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;

arch/x86/include/asm/vmx.h

Lines changed: 9 additions & 0 deletions
@@ -106,6 +106,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP 0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_LOAD_CET_STATE 0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
 
@@ -119,6 +120,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_CET_STATE 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
 
@@ -132,6 +134,7 @@
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
 #define VMX_BASIC_INOUT BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
@@ -369,6 +372,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
 	GUEST_SYSENTER_ESP = 0x00006824,
 	GUEST_SYSENTER_EIP = 0x00006826,
+	GUEST_S_CET = 0x00006828,
+	GUEST_SSP = 0x0000682a,
+	GUEST_INTR_SSP_TABLE = 0x0000682c,
 	HOST_CR0 = 0x00006c00,
 	HOST_CR3 = 0x00006c02,
 	HOST_CR4 = 0x00006c04,
@@ -381,6 +387,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP = 0x00006c12,
 	HOST_RSP = 0x00006c14,
 	HOST_RIP = 0x00006c16,
+	HOST_S_CET = 0x00006c18,
+	HOST_SSP = 0x00006c1a,
+	HOST_INTR_SSP_TABLE = 0x00006c1c
 };
 
 /*

arch/x86/include/uapi/asm/kvm.h

Lines changed: 34 additions & 0 deletions
@@ -35,6 +35,11 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
+
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
@@ -411,6 +416,35 @@ struct kvm_xcrs {
 	__u64 padding[16];
 };
 
+#define KVM_X86_REG_TYPE_MSR 2
+#define KVM_X86_REG_TYPE_KVM 3
+
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+	reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
+({ \
+	__u64 type_size = (__u64)type << 32; \
+	\
+	type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
+		     type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
+		     0; \
+	type_size; \
+})
+
+#define KVM_X86_REG_ID(type, index) \
+	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index) \
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
+
 #define KVM_SYNC_X86_REGS (1UL << 0)
 #define KVM_SYNC_X86_SREGS (1UL << 1)
 #define KVM_SYNC_X86_EVENTS (1UL << 2)
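
To see how the register ID macros in this hunk produce the encodings documented in api.rst, the computation can be replayed in plain C. This is a sketch under stated assumptions: the `TOY_*` constants are local mirrors of `KVM_REG_X86`, `KVM_REG_SIZE_U64` and the new type values, defined here so the example builds without kernel headers, and `toy_x86_reg_id()` is a hypothetical function that follows the same logic as `KVM_X86_REG_ID()`.

```c
#include <stdint.h>

/* Local mirrors of the uAPI constants (assumed values, see lead-in). */
#define TOY_KVM_REG_X86		0x2000000000000000ULL
#define TOY_KVM_REG_SIZE_U64	0x0030000000000000ULL
#define TOY_X86_REG_TYPE_MSR	2ULL
#define TOY_X86_REG_TYPE_KVM	3ULL
#define TOY_REG_GUEST_SSP	0ULL

/*
 * Build a 64-bit register ID: architecture in the top bits, register type
 * in bits 32 and up, register index in the low 32 bits.  The size field is
 * only set for types/registers with a known size, mirroring how
 * KVM_X86_REG_TYPE_SIZE() yields a size of 0 for unknown registers.
 */
static uint64_t toy_x86_reg_id(uint64_t type, uint64_t index)
{
	uint64_t id = TOY_KVM_REG_X86 | (type << 32) | index;

	if (type == TOY_X86_REG_TYPE_MSR)
		id |= TOY_KVM_REG_SIZE_U64;
	else if (type == TOY_X86_REG_TYPE_KVM && index == TOY_REG_GUEST_SSP)
		id |= TOY_KVM_REG_SIZE_U64;

	return id;
}
```

With these values, `toy_x86_reg_id(TOY_X86_REG_TYPE_KVM, TOY_REG_GUEST_SSP)` evaluates to 0x2030000300000000, matching the documented SSP encoding `0x2030 0003 0000 0000`, and an MSR such as index 0x6a0 (IA32_U_CET in the Intel SDM) evaluates to 0x20300002000006a0, matching the `0x2030 0002 <msr number:32>` pattern.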

arch/x86/kvm/cpuid.c

Lines changed: 34 additions & 1 deletion
@@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
 						       struct kvm_cpuid_entry2 *entry,
 						       unsigned int x86_feature,
@@ -305,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+						 vcpu->arch.ia32_xss, true);
 }
 
 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
@@ -424,6 +436,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	}
 
 	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+	vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
 
 	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
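
The arithmetic in cpuid_get_supported_xss() can be tried in isolation. The sketch below is an illustration only: `toy_guest_supported_xss()` is a hypothetical stand-in that takes the CPUID.0xD.1 ECX/EDX values and KVM's supported mask as plain arguments, and the two `TOY_XFEATURE_MASK_*` bit positions (11 and 12) are this example's assumption for the CET user and supervisor XSS components.

```c
#include <stdint.h>

/* Assumed XSS bit positions for the CET state components (see lead-in). */
#define TOY_XFEATURE_MASK_CET_USER	(1ULL << 11)
#define TOY_XFEATURE_MASK_CET_KERNEL	(1ULL << 12)

/*
 * Combine the low (ECX) and high (EDX) halves of the guest's advertised XSS
 * bits into one 64-bit mask, then drop anything KVM itself doesn't support.
 */
static uint64_t toy_guest_supported_xss(uint32_t ecx, uint32_t edx,
					uint64_t kvm_supported_xss)
{
	uint64_t guest_xss = (uint64_t)ecx | ((uint64_t)edx << 32);

	return guest_xss & kvm_supported_xss;
}
```

For example, if the guest CPUID advertises both CET components but the supported mask contains only `TOY_XFEATURE_MASK_CET_USER`, the result keeps only the user component; a missing CPUID entry corresponds to passing zeros and yields 0, as in the kernel function.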

@@ -933,6 +946,7 @@ void kvm_set_cpu_caps(void)
 		VENDOR_F(WAITPKG),
 		F(SGX_LC),
 		F(BUS_LOCK_DETECT),
+		X86_64_F(SHSTK),
 	);
 
 	/*
@@ -942,6 +956,14 @@ void kvm_set_cpu_caps(void)
 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
 
+	/*
+	 * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack
+	 * accesses require "magic" Writable=0,Dirty=1 protection, which KVM
+	 * doesn't know how to emulate or map.
+	 */
+	if (!tdp_enabled)
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		F(AVX512_4VNNIW),
 		F(AVX512_4FMAPS),
@@ -959,8 +981,19 @@ void kvm_set_cpu_caps(void)
 		F(AMX_INT8),
 		F(AMX_BF16),
 		F(FLUSH_L1D),
+		F(IBT),
 	);
 
+	/*
+	 * Disable support for IBT and SHSTK if KVM is configured to emulate
+	 * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
+	 * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
+	 */
+	if (allow_smaller_maxphyaddr) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
+
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBPB) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBRS))
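
The two capability checks added to kvm_set_cpu_caps() in this file can be summarized as a small pure function. This is a hedged sketch rather than KVM code: `struct toy_cet_caps` and `toy_filter_cet_caps()` are hypothetical names, and the sketch covers only the `tdp_enabled` and `allow_smaller_maxphyaddr` conditions visible in this diff, not the Unrestricted Guest or AMD IBT restrictions mentioned in the commit message.

```c
#include <stdbool.h>

/* Which CET features are advertised to guests (toy representation). */
struct toy_cet_caps {
	bool shstk;
	bool ibt;
};

/*
 * Start from what hardware supports and clear features that the
 * configuration can't virtualize safely:
 *  - no TDP (shadow paging): the Shadow MMU can't map shadow stack pages,
 *    so SHSTK is dropped;
 *  - allow_smaller_maxphyaddr: the emulator may be needed and doesn't
 *    support CET, so both SHSTK and IBT are dropped.
 */
static struct toy_cet_caps toy_filter_cet_caps(struct toy_cet_caps hw,
					       bool tdp_enabled,
					       bool allow_smaller_maxphyaddr)
{
	struct toy_cet_caps caps = hw;

	if (!tdp_enabled)
		caps.shstk = false;

	if (allow_smaller_maxphyaddr) {
		caps.shstk = false;
		caps.ibt = false;
	}

	return caps;
}
```

Note the asymmetry: with shadow paging alone, IBT stays available while SHSTK is cleared, which is consistent with the commit message's point that only the shadow stack relies on novel page protections.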
