
Commit 7f01cab

sean-jc authored and bonzini committed
KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE
For TD guests, the current way to emulate MMIO doesn't work any more, as KVM
is not able to access the private memory of a TD guest and do the emulation.
Instead, a TD guest expects to receive #VE when it accesses the MMIO range and
can then explicitly make a hypercall to KVM to get the expected information.

To achieve this, the TDX module always enables "EPT-violation #VE" in the VMCS
control. Accordingly, for the MMIO SPTE for the shared GPA:

1. KVM needs to set the "suppress #VE" bit for the non-present SPTE so that an
   EPT violation happens when the TD accesses the MMIO range.
2. On EPT violation, KVM sets the MMIO SPTE to clear the "suppress #VE" bit so
   the TD guest receives #VE instead of an EPT misconfiguration, unlike the
   VMX case.

For a shared GPA that is not populated yet, an EPT violation needs to be
triggered when the TD guest accesses it; the non-present SPTE value for a
shared GPA must therefore also set the "suppress #VE" bit.

Add the "suppress #VE" bit (bit 63) to SHADOW_NONPRESENT_VALUE and
REMOVED_SPTE. Unconditionally set the bit for both AMD and Intel because:
1) AMD hardware doesn't use this bit when the present bit is off;
2) for a normal VMX guest, KVM never enables "EPT-violation #VE" in the VMCS
   control and the "suppress #VE" bit is ignored by hardware.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Binbin Wu <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Message-Id: <a99cb866897c7083430dce7f24c63b17d7121134.1705965635.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <[email protected]>
1 parent d8fa203 commit 7f01cab
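
To make the intended bit layout concrete, here is a minimal userspace sketch of the arithmetic the message describes. Only the "suppress #VE" bit position (bit 63) comes from the commit; the RWX mask value and the scaffolding are illustrative assumptions.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n)		(1ULL << (n))
#define SUPPRESS_VE_BIT		BIT_ULL(63)	/* ignored by VMX unless EPT_VIOLATION_VE=1 */
#define EPT_RWX_MASK		0x7ULL		/* read/write/execute permissions */
#define SHADOW_NONPRESENT_VALUE	SUPPRESS_VE_BIT

int main(void)
{
	uint64_t spte = SHADOW_NONPRESENT_VALUE;

	/* No RWX bits: any TD access to this GPA faults... */
	assert(!(spte & EPT_RWX_MASK));
	/* ...and "suppress #VE" turns the fault into an EPT violation
	 * that exits to KVM rather than a #VE injected into the guest. */
	assert(spte & SUPPRESS_VE_BIT);

	/* For an MMIO SPTE, KVM instead clears bit 63 so the TD guest
	 * receives #VE and can hypercall to KVM. */
	spte &= ~SUPPRESS_VE_BIT;
	assert(!(spte & SUPPRESS_VE_BIT));

	printf("non-present SPTE value: %#llx\n",
	       (unsigned long long)SHADOW_NONPRESENT_VALUE);
	return 0;
}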

File tree

3 files changed (+28 -14 lines)


arch/x86/kvm/mmu/paging_tmpl.h

Lines changed: 6 additions & 6 deletions

@@ -933,13 +933,13 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
 		return 0;
 
 	/*
-	 * Drop the SPTE if the new protections would result in a RWX=0
-	 * SPTE or if the gfn is changing.  The RWX=0 case only affects
-	 * EPT with execute-only support, i.e. EPT without an effective
-	 * "present" bit, as all other paging modes will create a
-	 * read-only SPTE if pte_access is zero.
+	 * Drop the SPTE if the new protections result in no effective
+	 * "present" bit or if the gfn is changing.  The former case
+	 * only affects EPT with execute-only support with pte_access==0;
+	 * all other paging modes will create a read-only SPTE if
+	 * pte_access is zero.
 	 */
-	if ((!pte_access && !shadow_present_mask) ||
+	if ((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE ||
 	    gfn != kvm_mmu_page_get_gfn(sp, i)) {
 		drop_spte(vcpu->kvm, &sp->spt[i]);
 		return 1;
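
The rewritten check reads oddly at first. The standalone sketch below shows why comparing (pte_access | shadow_present_mask) against SHADOW_NONPRESENT_VALUE expresses the old "no effective present bit" test once the present mask carries the suppress-#VE bit. The exec-only mask composition and the ACC_* value are assumptions for illustration, not taken from this diff.

#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define SHADOW_NONPRESENT_VALUE	BIT_ULL(63)
#define PT_PRESENT_MASK		0x1ULL	/* legacy/NPT present bit */
#define ACC_EXEC_MASK		0x1ULL	/* illustrative ACC_* flag */

static int drops_spte(uint64_t pte_access, uint64_t shadow_present_mask)
{
	return (pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE;
}

int main(void)
{
	/* Assumed: EPT with exec-only keeps only "suppress #VE" in the
	 * present mask, i.e. no RWX bit is unconditionally set. */
	uint64_t ept_execonly = SHADOW_NONPRESENT_VALUE;
	/* Other paging modes have a real present bit (and bit 63 is NX
	 * there, so it is never part of the present mask). */
	uint64_t legacy = PT_PRESENT_MASK;

	/* pte_access == 0 on exec-only EPT: nothing marks the SPTE
	 * present, so sync_spte() must drop it. */
	assert(drops_spte(0, ept_execonly));
	/* Any access bit keeps it alive... */
	assert(!drops_spte(ACC_EXEC_MASK, ept_execonly));
	/* ...and a real present bit means it is never effectively absent. */
	assert(!drops_spte(0, legacy));
	return 0;
}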

arch/x86/kvm/mmu/spte.c

Lines changed: 7 additions & 7 deletions

@@ -144,19 +144,19 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	u64 spte = SPTE_MMU_PRESENT_MASK;
 	bool wrprot = false;
 
-	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
+	/*
+	 * For the EPT case, shadow_present_mask has no RWX bits set if
+	 * exec-only page table entries are supported.  In that case,
+	 * ACC_USER_MASK and shadow_user_mask are used to represent
+	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
+	 */
+	WARN_ON_ONCE((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE);
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY;
 
-	/*
-	 * For the EPT case, shadow_present_mask is 0 if hardware
-	 * supports exec-only page table entries.  In that case,
-	 * ACC_USER_MASK and shadow_user_mask are used to represent
-	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
-	 */
 	spte |= shadow_present_mask;
 	if (!prefetch)
 		spte |= spte_shadow_accessed_mask(spte);
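
The moved comment only holds if shadow_present_mask for EPT now always includes the suppress-#VE bit. That hunk lives elsewhere in the series (around kvm_mmu_set_ept_masks()) and is not shown here, so the composition below is a hedged sketch, not part of this diff.

#include <stdint.h>

#define BIT_ULL(n)			(1ULL << (n))
#define VMX_EPT_READABLE_MASK		0x1ULL
#define VMX_EPT_SUPPRESS_VE_BIT		BIT_ULL(63)

/* Hypothetical helper mirroring what kvm_mmu_set_ept_masks() presumably
 * computes after this series: with exec-only support no RWX bit is always
 * set, so "suppress #VE" alone plays the role of the present mask. */
static uint64_t ept_present_mask(int has_exec_only)
{
	return (has_exec_only ? 0ULL : VMX_EPT_READABLE_MASK) |
	       VMX_EPT_SUPPRESS_VE_BIT;
}

int main(void)
{
	/* Exec-only: the mask reduces to the "suppress #VE" bit alone. */
	return ept_present_mask(1) == VMX_EPT_SUPPRESS_VE_BIT ? 0 : 1;
}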

arch/x86/kvm/mmu/spte.h

Lines changed: 15 additions & 1 deletion

@@ -149,7 +149,21 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
 
 #define MMIO_SPTE_GEN_MASK		GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
 
+/*
+ * Non-present SPTE value needs to set bit 63 for TDX, in order to suppress
+ * #VE and get EPT violations on non-present PTEs.  We can use the
+ * same value also without TDX for both VMX and SVM:
+ *
+ * For SVM NPT, for non-present spte (bit 0 = 0), other bits are ignored.
+ * For VMX EPT, bit 63 is ignored if #VE is disabled.  (EPT_VIOLATION_VE=0)
+ *              bit 63 is #VE suppress if #VE is enabled.  (EPT_VIOLATION_VE=1)
+ */
+#ifdef CONFIG_X86_64
+#define SHADOW_NONPRESENT_VALUE	BIT_ULL(63)
+static_assert(!(SHADOW_NONPRESENT_VALUE & SPTE_MMU_PRESENT_MASK));
+#else
 #define SHADOW_NONPRESENT_VALUE	0ULL
+#endif
 
 extern u64 __read_mostly shadow_host_writable_mask;
 extern u64 __read_mostly shadow_mmu_writable_mask;

@@ -192,7 +206,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  *
  * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
  * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
- * vulnerability.  Use only low bits to avoid 64-bit immediates.
+ * vulnerability.
 *
 * Only used by the TDP MMU.
 */
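
The commit message also adds bit 63 to REMOVED_SPTE, but that hunk is not shown above. Assuming the removed-SPTE marker keeps its existing low "semi-arbitrary" pattern (which is why the "use only low bits" sentence was dropped) and simply inherits the non-present value, a sketch of the resulting invariants:

#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define SHADOW_NONPRESENT_VALUE	BIT_ULL(63)
#define SPTE_MMU_PRESENT_MASK	BIT_ULL(11)
/* Assumed shape: existing low marker pattern ORed with the new value. */
#define REMOVED_SPTE		(SHADOW_NONPRESENT_VALUE | 0x5a0ULL)

int main(void)
{
	/* Never looks MMU-present to software... */
	assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
	/* ...and keeps suppressing #VE while the zap is in flight, so a
	 * racing TD access takes an EPT violation, not a spurious #VE. */
	assert(REMOVED_SPTE & SHADOW_NONPRESENT_VALUE);
	return 0;
}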
