
Commit 13ec930

dmatlack authored and oupton committed
KVM: arm64: Retry fault if vma_lookup() results become invalid
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift.

Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries).

This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race.

Fixes: 94f8e64 ("KVM: ARM: Handle guest faults in KVM")
Cc: [email protected]
Reported-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Oliver Upton <[email protected]>
1 parent f6da81f commit 13ec930
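
For orientation before the diff: a condensed sketch of the ordering user_mem_abort() follows after this change. It borrows the real helper names (vma_lookup(), __gfn_to_pfn_memslot(), mmu_invalidate_retry()), but the parameter list, error handling, and mapping logic are heavily trimmed, so read it as an illustration of the race-detection pattern rather than the actual function.

/*
 * Condensed illustration only -- not the real user_mem_abort().
 * Most of the real fault handling (pfn release, hugepage sizing,
 * permission handling) is omitted.
 */
static int user_mem_abort_sketch(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                                 struct kvm_memory_slot *memslot,
                                 unsigned long hva, bool write_fault)
{
        struct kvm *kvm = vcpu->kvm;
        struct vm_area_struct *vma;
        unsigned long mmu_seq;
        bool writable = false;
        kvm_pfn_t pfn;
        gfn_t gfn;

        /* The memcache top-up now happens up here, before mmap_lock (see diff). */

        mmap_read_lock(current->mm);
        vma = vma_lookup(current->mm, hva);     /* vma_shift etc. derived from this */
        if (unlikely(!vma)) {
                mmap_read_unlock(current->mm);
                return -EFAULT;
        }

        gfn = fault_ipa >> PAGE_SHIFT;

        /*
         * Snapshot the sequence count *before* dropping mmap_lock: any
         * invalidation that could make the vma_lookup() results stale
         * will bump the count and be caught by the retry check below.
         */
        mmu_seq = kvm->mmu_invalidate_seq;
        mmap_read_unlock(current->mm);

        pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
                                   write_fault, &writable, NULL);

        read_lock(&kvm->mmu_lock);
        if (mmu_invalidate_retry(kvm, mmu_seq)) {
                /* Stale vma_lookup()/pfn results: bail and let the fault retry. */
                read_unlock(&kvm->mmu_lock);
                return 0;
        }
        /* ... install the stage-2 mapping under kvm->mmu_lock ... */
        read_unlock(&kvm->mmu_lock);
        return 0;
}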

1 file changed: +21 -27 lines changed


arch/arm64/kvm/mmu.c

Lines changed: 21 additions & 27 deletions
@@ -1217,6 +1217,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                return -EFAULT;
        }
 
+       /*
+        * Permission faults just need to update the existing leaf entry,
+        * and so normally don't require allocations from the memcache. The
+        * only exception to this is when dirty logging is enabled at runtime
+        * and a write fault needs to collapse a block entry into a table.
+        */
+       if (fault_status != ESR_ELx_FSC_PERM ||
+           (logging_active && write_fault)) {
+               ret = kvm_mmu_topup_memory_cache(memcache,
+                                                kvm_mmu_cache_min_pages(kvm));
+               if (ret)
+                       return ret;
+       }
+
        /*
         * Let's check if we will get back a huge page backed by hugetlbfs, or
         * get block mapping for device MMIO region.
@@ -1269,37 +1283,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        fault_ipa &= ~(vma_pagesize - 1);
 
        gfn = fault_ipa >> PAGE_SHIFT;
-       mmap_read_unlock(current->mm);
-
-       /*
-        * Permission faults just need to update the existing leaf entry,
-        * and so normally don't require allocations from the memcache. The
-        * only exception to this is when dirty logging is enabled at runtime
-        * and a write fault needs to collapse a block entry into a table.
-        */
-       if (fault_status != ESR_ELx_FSC_PERM ||
-           (logging_active && write_fault)) {
-               ret = kvm_mmu_topup_memory_cache(memcache,
-                                                kvm_mmu_cache_min_pages(kvm));
-               if (ret)
-                       return ret;
-       }
 
-       mmu_seq = vcpu->kvm->mmu_invalidate_seq;
        /*
-        * Ensure the read of mmu_invalidate_seq happens before we call
-        * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
-        * the page we just got a reference to gets unmapped before we have a
-        * chance to grab the mmu_lock, which ensure that if the page gets
-        * unmapped afterwards, the call to kvm_unmap_gfn will take it away
-        * from us again properly. This smp_rmb() interacts with the smp_wmb()
-        * in kvm_mmu_notifier_invalidate_<page|range_end>.
+        * Read mmu_invalidate_seq so that KVM can detect if the results of
+        * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+        * acquiring kvm->mmu_lock.
         *
-        * Besides, __gfn_to_pfn_memslot() instead of gfn_to_pfn_prot() is
-        * used to avoid unnecessary overhead introduced to locate the memory
-        * slot because it's always fixed even @gfn is adjusted for huge pages.
+        * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
+        * with the smp_wmb() in kvm_mmu_invalidate_end().
         */
-       smp_rmb();
+       mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+       mmap_read_unlock(current->mm);
 
        pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
                                   write_fault, &writable, NULL);
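
For context on how the snapshot is consumed: later in the fault path, under kvm->mmu_lock, KVM compares it against the live count via the generic mmu_invalidate_retry() helper and retries the fault if anything changed in between. Roughly paraphrased (and trimmed) from include/linux/kvm_host.h:

/*
 * Roughly paraphrased from include/linux/kvm_host.h; comments trimmed.
 * Returns nonzero if the fault must be retried because an invalidation
 * is in progress or has completed since @mmu_seq was snapshotted.
 */
static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
{
        if (unlikely(kvm->mmu_invalidate_in_progress))
                return 1;
        /*
         * Order the read of mmu_invalidate_in_progress against the read
         * of mmu_invalidate_seq; pairs with the smp_wmb() on the
         * invalidation side.
         */
        smp_rmb();
        if (kvm->mmu_invalidate_seq != mmu_seq)
                return 1;
        return 0;
}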
