
Commit 226d9b8

KVM: x86/mmu: Fix a largely theoretical race in kvm_mmu_track_write()
Add full memory barriers in kvm_mmu_track_write() and account_shadowed()
to plug a (very, very theoretical) race where kvm_mmu_track_write() could
miss a 0->1 transition of indirect_shadow_pages and fail to zap relevant,
*stale* SPTEs.

Without the barriers, because modern x86 CPUs allow (per the SDM):

  Reads may be reordered with older writes to different locations but not
  with older writes to the same location.

it's possible that the following could happen (in terms of values being
visible/resolved):

  CPU0                             CPU1
  read memory[gfn] (=Y)
                                   memory[gfn] Y=>X
                                   read indirect_shadow_pages (=0)
  indirect_shadow_pages 0=>1

or conversely:

  CPU0                             CPU1
  indirect_shadow_pages 0=>1
                                   read indirect_shadow_pages (=0)
  read memory[gfn] (=Y)
                                   memory[gfn] Y=>X

E.g. in the below scenario, CPU0 could fail to zap SPTEs, and CPU1 could
fail to retry the faulting instruction, resulting in KVM entering the
guest with a stale SPTE (map PTE=X instead of PTE=Y).

  PTE = X;

  CPU0:
    emulator_write_phys()
      PTE = Y
      kvm_page_track_write()
        kvm_mmu_track_write()
          // memory barrier missing here
          if (indirect_shadow_pages)
            zap();

  CPU1:
    FNAME(page_fault)
      FNAME(walk_addr)
        FNAME(walk_addr_generic)
          gw->pte = PTE; // X

      FNAME(fetch)
        kvm_mmu_get_child_sp
          kvm_mmu_get_shadow_page
            __kvm_mmu_get_shadow_page
              kvm_mmu_alloc_shadow_page
                account_shadowed
                  indirect_shadow_pages++
                  // memory barrier missing here
        if (FNAME(gpte_changed)) // if (PTE == X)
          return RET_PF_RETRY;

In practice, this bug likely cannot be observed, as both the 0=>1
transition and reordering of this scope are extremely rare occurrences.

Note, if the cost of the barrier (which is simply a locked ADD, see commit
450cbdd ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")) is
problematic, KVM could avoid the barrier by bailing earlier if
kvm_memslots_have_rmaps() is false. But the odds of the barrier being
problematic are extremely low, *and* the odds of the extra check being
meaningfully faster overall are also low.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
1 parent aca4855 commit 226d9b8
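
To make the ordering constraint concrete, below is a minimal userspace sketch (an editor's illustration, not part of the commit) of the store-buffering pattern the changelog describes, written with C11 atomics. The names guest_pte and indirect_shadow_pages merely mirror the commit's variables, and the two atomic_thread_fence() calls stand in for the smp_mb() added to kvm_mmu_track_write() and account_shadowed(). With both fences present, the outcome r0 == 0 && r1 == 0 is forbidden, which is exactly the guarantee the patch relies on; drop either fence and that outcome becomes possible on x86.

/*
 * Editor's sketch: store-buffering litmus test mirroring the race described
 * above.  Thread A models the emulator write path, thread B models the
 * shadow page fault path.  All names are illustrative, not KVM code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int guest_pte;              /* stands in for memory[gfn]      */
static atomic_int indirect_shadow_pages;  /* stands in for the KVM counter  */
static int r0, r1;

static void *emulator_side(void *arg)     /* kvm_mmu_track_write() analogue */
{
	(void)arg;
	/* emulator_write_phys(): PTE = Y */
	atomic_store_explicit(&guest_pte, 1, memory_order_relaxed);
	/* models the smp_mb() added by this commit */
	atomic_thread_fence(memory_order_seq_cst);
	/* if (indirect_shadow_pages) zap(); */
	r0 = atomic_load_explicit(&indirect_shadow_pages, memory_order_relaxed);
	return NULL;
}

static void *page_fault_side(void *arg)   /* account_shadowed() analogue    */
{
	(void)arg;
	/* account_shadowed(): indirect_shadow_pages++ */
	atomic_store_explicit(&indirect_shadow_pages, 1, memory_order_relaxed);
	/* models the smp_mb() added by this commit */
	atomic_thread_fence(memory_order_seq_cst);
	/* FNAME(gpte_changed): re-read the guest PTE */
	r1 = atomic_load_explicit(&guest_pte, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, emulator_side, NULL);
	pthread_create(&b, NULL, page_fault_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/*
	 * With both fences, at least one thread must observe the other's
	 * store, so r0 == 0 && r1 == 0 cannot happen.  Without them it can,
	 * which is the "both sides miss" scenario in the changelog.
	 */
	printf("r0=%d r1=%d\n", r0, r1);
	return 0;
}

Build with cc -std=c11 -pthread. Removing either fence and running the two threads in a loop can eventually produce r0=0 r1=0 on real x86 hardware, matching the SDM-permitted reordering quoted in the changelog.
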

1 file changed: +17, -3 lines


arch/x86/kvm/mmu/mmu.c

Lines changed: 17 additions & 3 deletions
@@ -831,6 +831,15 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	gfn_t gfn;
 
 	kvm->arch.indirect_shadow_pages++;
+	/*
+	 * Ensure indirect_shadow_pages is elevated prior to re-reading guest
+	 * child PTEs in FNAME(gpte_changed), i.e. guarantee either in-flight
+	 * emulated writes are visible before re-reading guest PTEs, or that
+	 * an emulated write will see the elevated count and acquire mmu_lock
+	 * to update SPTEs. Pairs with the smp_mb() in kvm_mmu_track_write().
+	 */
+	smp_mb();
+
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
@@ -5802,10 +5811,15 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	bool flush = false;
 
 	/*
-	 * If we don't have indirect shadow pages, it means no page is
-	 * write-protected, so we can exit simply.
+	 * When emulating guest writes, ensure the written value is visible to
+	 * any task that is handling page faults before checking whether or not
+	 * KVM is shadowing a guest PTE. This ensures either KVM will create
+	 * the correct SPTE in the page fault handler, or this task will see
+	 * a non-zero indirect_shadow_pages. Pairs with the smp_mb() in
+	 * account_shadowed().
 	 */
-	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+	smp_mb();
+	if (!vcpu->kvm->arch.indirect_shadow_pages)
 		return;
 
 	write_lock(&vcpu->kvm->mmu_lock);
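
For completeness, here is a hedged sketch of the alternative the changelog mentions but deliberately does not take: bailing out of kvm_mmu_track_write() before the barrier when the VM has no rmap-backed shadow MMU state at all. It assumes the existing kvm_memslots_have_rmaps() helper and is only an illustration of the trade-off discussed above, not a proposed or applied change.

	/*
	 * Hypothetical early bail, NOT applied by this commit: skip the
	 * barrier when the VM has never created rmap-backed shadow pages,
	 * at the cost of an extra check on every emulated write.
	 */
	if (!kvm_memslots_have_rmaps(vcpu->kvm))
		return;

	smp_mb();
	if (!vcpu->kvm->arch.indirect_shadow_pages)
		return;
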
