Commit 5bd74f6

sean-jc authored and bonzini committed
KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots
Allow mapping KVM's internal memslots used for EPT without unrestricted guest into L2, i.e. allow mapping the hidden TSS and the identity mapped page tables into L2. Unlike the APIC access page, there is no correctness issue with letting L2 access the "hidden" memory. Allowing these memslots to be mapped into L2 fixes a largely theoretical bug where KVM could incorrectly emulate subsequent _L1_ accesses as MMIO, and also ensures consistent KVM behavior for L2.

If KVM is using TDP, but L1 is using shadow paging for L2, then routing through kvm_handle_noslot_fault() will incorrectly cache the gfn as MMIO, and create an MMIO SPTE. Creating an MMIO SPTE is ok, but only because kvm_mmu_page_role.guest_mode ensures KVM uses different roots for L1 vs. L2. But vcpu->arch.mmio_gfn will remain valid, and could cause KVM to incorrectly treat an L1 access to the hidden TSS or identity mapped page tables as MMIO.

Furthermore, forcing L2 accesses to be treated as "no slot" faults doesn't actually prevent exposing KVM's internal memslots to L2, it simply forces KVM to emulate the access. In most cases, that will trigger MMIO, amusingly due to filling vcpu->arch.mmio_gfn, but also because vcpu_is_mmio_gpa() unconditionally treats APIC accesses as MMIO, i.e. APIC accesses are ok. But the hidden TSS and identity mapped page tables could go either way (MMIO or access the private memslot's backing memory).

Alternatively, the inconsistent emulator behavior could be addressed by forcing MMIO emulation for L2 access to all internal memslots, not just to the APIC. But that's arguably less correct than letting L2 access the hidden TSS and identity mapped page tables, not to mention that it's *extremely* unlikely anyone cares what KVM does in this case. From L1's perspective there is R/W memory at those memslots, the memory just happens to be initialized with non-zero data. Making the memory disappear when it is accessed by L2 is far more magical and arbitrary than the memory existing in the first place.

The APIC access page is special because KVM _must_ emulate the access to do the right thing (emulate an APIC access instead of reading/writing the APIC access page). And despite what commit 3a2936d ("kvm: mmu: Don't expose private memslots to L2") said, it's not just necessary when L1 is accelerating L2's virtual APIC, it's just as important (likely *more* important for correctness) when L1 is passing through its own APIC to L2.

Fixes: 3a2936d ("kvm: mmu: Don't expose private memslots to L2")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
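
The stale-cache problem described in the message can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct toy_vcpu` and every `toy_*` helper are hypothetical stand-ins invented for illustration, not KVM code, and the lookup is deliberately reduced to the single property the message relies on, namely that the cached MMIO gfn is not tagged as belonging to L1 or L2.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical stand-in for the per-vCPU MMIO cache (vcpu->arch.mmio_gfn). */
struct toy_vcpu {
	bool     in_guest_mode;   /* true while the vCPU is running L2 */
	bool     mmio_gfn_valid;
	uint64_t mmio_gfn;
};

/* Models the "no slot" path: the faulting gfn is remembered as MMIO. */
static void toy_cache_mmio_gfn(struct toy_vcpu *vcpu, uint64_t gpa)
{
	vcpu->mmio_gfn = gpa >> TOY_PAGE_SHIFT;
	vcpu->mmio_gfn_valid = true;
}

/* Models the later lookup: it compares gfns only, with no L1 vs. L2 tag. */
static bool toy_is_cached_mmio(const struct toy_vcpu *vcpu, uint64_t gpa)
{
	return vcpu->mmio_gfn_valid &&
	       vcpu->mmio_gfn == (gpa >> TOY_PAGE_SHIFT);
}

/*
 * Models a fault on a non-APIC internal slot (hidden TSS, identity map).
 * force_noslot_for_l2 == true mimics the old behavior for L2 faults;
 * false mimics the new behavior, where the page is simply mapped and
 * the MMIO cache is left untouched.
 */
static void toy_fault_on_internal_slot(struct toy_vcpu *vcpu, uint64_t gpa,
				       bool force_noslot_for_l2)
{
	if (vcpu->in_guest_mode && force_noslot_for_l2)
		toy_cache_mmio_gfn(vcpu, gpa);
}
```

In this model, an L2 fault taken with `force_noslot_for_l2` set leaves a cache entry behind, so a later lookup for the same address made after clearing `in_guest_mode` (i.e. on behalf of L1) still reports MMIO. With the flag clear, the cache is never filled and the L1 lookup misses.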
1 parent 44f42ef commit 5bd74f6

1 file changed, +13 -4 lines changed

arch/x86/kvm/mmu/mmu.c

Lines changed: 13 additions & 4 deletions
@@ -4297,8 +4297,18 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
 		return RET_PF_RETRY;
 
-	if (!kvm_is_visible_memslot(slot)) {
-		/* Don't expose private memslots to L2. */
+	if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
+		/*
+		 * Don't map L1's APIC access page into L2, KVM doesn't support
+		 * using APICv/AVIC to accelerate L2 accesses to L1's APIC,
+		 * i.e. the access needs to be emulated. Emulating access to
+		 * L1's APIC is also correct if L1 is accelerating L2's own
+		 * virtual APIC, but for some reason L1 also maps _L1's_ APIC
+		 * into L2. Note, vcpu_is_mmio_gpa() always treats access to
+		 * the APIC as MMIO. Allow an MMIO SPTE to be created, as KVM
+		 * uses different roots for L1 vs. L2, i.e. there is no danger
+		 * of breaking APICv/AVIC for L1.
+		 */
 		if (is_guest_mode(vcpu)) {
 			fault->slot = NULL;
 			fault->pfn = KVM_PFN_NOSLOT;
@@ -4311,8 +4321,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		 * MMIO SPTE. That way the cache doesn't need to be purged
 		 * when the AVIC is re-enabled.
 		 */
-		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
-		    !kvm_apicv_activated(vcpu->kvm))
+		if (!kvm_apicv_activated(vcpu->kvm))
 			return RET_PF_EMULATE;
 	}
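
The behavioral change in the hunks above can be summarized as a small decision function. The following is a rough standalone sketch, not the kernel code: the `toy_*` enums and functions are hypothetical names introduced here, the "visible" check is reduced to "is a userspace slot", and the final fall-through (map the page if a slot exists, otherwise treat it as no-slot) is an assumption about code that lies outside the diff.

```c
#include <stdbool.h>

/* Hypothetical classification of the memslot backing the faulting gfn. */
enum toy_slot {
	TOY_SLOT_NONE,             /* no memslot at all */
	TOY_SLOT_USER,             /* ordinary userspace-created slot */
	TOY_SLOT_APIC_ACCESS_PAGE, /* KVM-internal APIC access page */
	TOY_SLOT_OTHER_INTERNAL,   /* hidden TSS, identity mapped page tables */
};

/* Hypothetical outcomes of the fault-in step. */
enum toy_result { TOY_NOSLOT, TOY_EMULATE, TOY_MAP };

/* Assumed fall-through once the special cases are handled. */
static enum toy_result toy_default(enum toy_slot slot)
{
	return slot == TOY_SLOT_NONE ? TOY_NOSLOT : TOY_MAP;
}

/* Before the patch: every non-visible slot is hidden from L2. */
static enum toy_result toy_faultin_old(enum toy_slot slot, bool guest_mode,
				       bool apicv_activated)
{
	bool visible = (slot == TOY_SLOT_USER);

	if (!visible) {
		if (guest_mode)
			return TOY_NOSLOT;
		if (slot == TOY_SLOT_APIC_ACCESS_PAGE && !apicv_activated)
			return TOY_EMULATE;
	}
	return toy_default(slot);
}

/* After the patch: only the APIC access page gets special treatment. */
static enum toy_result toy_faultin_new(enum toy_slot slot, bool guest_mode,
				       bool apicv_activated)
{
	if (slot == TOY_SLOT_APIC_ACCESS_PAGE) {
		if (guest_mode)
			return TOY_NOSLOT;
		if (!apicv_activated)
			return TOY_EMULATE;
	}
	return toy_default(slot);
}
```

Under these assumptions the two functions differ in exactly one case: an L2 fault on `TOY_SLOT_OTHER_INTERNAL` yields `TOY_NOSLOT` in the old version and `TOY_MAP` in the new one. APIC access page handling is unchanged for both L1 and L2.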
