
Commit 92783a9

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the PMCR_EL0 reset value after the PMU rework

   - Correctly handle S2 fault triggered by a S1 page table walk by not
     always classifying it as a write, as this breaks on R/O memslots

   - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
     fault from a S1 PTW on a R/O memslot

   - Put the Apple M2 on the naughty list for not being able to
     correctly implement the vgic SEIS feature, just like the M1 before
     it

   - Reviewer updates: Alex is stepping down, replaced by Zenghui

  x86:

   - Fix various rare locking issues in Xen emulation and teach lockdep
     to detect them

   - Documentation improvements

   - Do not return host topology information from KVM_GET_SUPPORTED_CPUID"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
  KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
  Documentation: kvm: fix SRCU locking order docs
  KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  KVM: nSVM: clarify recalc_intercepts() wrt CR8
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: PMU: Fix PMCR_EL0 reset value
2 parents f5fe24e + 310bc39 commit 92783a9

17 files changed: +175, -115 lines

Documentation/virt/kvm/api.rst

Lines changed: 22 additions & 0 deletions

@@ -1354,6 +1354,14 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).

+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.

 4.36 KVM_SET_TSS_ADDR
 ---------------------

@@ -8310,6 +8318,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
 It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
 has enabled in-kernel emulation of the local APIC.

+CPU topology
+~~~~~~~~~~~~
+
+Several CPUID values include topology information for the host CPU:
+0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems. Different
+versions of KVM return different values for this information and userspace
+should not rely on it. Currently they return all zeroes.
+
+If userspace wishes to set up a guest topology, it should be careful that
+the values of these three leaves differ for each CPU. In particular,
+the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
+for 0x8000001e; the latter also encodes the core id and node id in bits
+7:0 of EBX and ECX respectively.
+
 Obsolete ioctls and capabilities
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
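The api.rst text above documents observable uAPI behaviour. As a quick illustration of the topology point, a userspace sketch of querying the affected leaves might look like the following; KVM_GET_SUPPORTED_CPUID and struct kvm_cpuid2 are the real uAPI (the ioctl is issued on the /dev/kvm fd), but the helper itself and its buffer sizing are assumptions, not part of this merge, and error handling is elided.

/*
 * Minimal sketch (not from this commit): dump what KVM reports for the
 * topology leaves mentioned above. After this change they are all zeroes.
 */
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

static void dump_topology_leaves(int kvm_fd)
{
        struct kvm_cpuid2 *cpuid;
        int i, nent = 128;      /* assumed big enough for this sketch */

        cpuid = calloc(1, sizeof(*cpuid) + nent * sizeof(cpuid->entries[0]));
        cpuid->nent = nent;
        if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
                return;         /* real code would retry on E2BIG */

        for (i = 0; i < cpuid->nent; i++) {
                struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

                if (e->function != 0x0b && e->function != 0x1f &&
                    e->function != 0x8000001e)
                        continue;
                printf("leaf %#x.%u: %x %x %x %x\n", e->function, e->index,
                       e->eax, e->ebx, e->ecx, e->edx);
        }
        free(cpuid);
}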

Documentation/virt/kvm/locking.rst

Lines changed: 13 additions & 12 deletions

@@ -24,21 +24,22 @@ The acquisition orders for mutexes are as follows:

 For SRCU:

-- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
-  the kvm->slots_lock critical section, therefore kvm->slots_lock
-  cannot be taken inside a kvm->srcu read-side critical section.
-  Instead, kvm->slots_arch_lock is released before the call
-  to ``synchronize_srcu()`` and _can_ be taken inside a
-  kvm->srcu read-side critical section.
-
-- kvm->lock is taken inside kvm->srcu, therefore
-  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
-  a kvm->lock critical section. If you cannot delay the
-  call until after kvm->lock is released, use ``call_srcu``.
+- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
+  for kvm->lock, vcpu->mutex and kvm->slots_lock. These locks _cannot_
+  be taken inside a kvm->srcu read-side critical section; that is, the
+  following is broken::
+
+      srcu_read_lock(&kvm->srcu);
+      mutex_lock(&kvm->slots_lock);
+
+- kvm->slots_arch_lock instead is released before the call to
+  ``synchronize_srcu()``. It _can_ therefore be taken inside a
+  kvm->srcu read-side critical section, for example while processing
+  a vmexit.

 On x86:

-- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
+- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lock

 - kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock and
   kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and
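To make the rewritten SRCU rules concrete, here is a minimal sketch of the allowed and forbidden orderings; kvm->srcu, kvm->slots_lock and kvm->slots_arch_lock are the real struct kvm fields, while the function itself is purely illustrative and not part of the patch.

#include <linux/kvm_host.h>

/* Illustrative only: the orderings the rewritten document describes. */
static void srcu_ordering_sketch(struct kvm *kvm)
{
        int idx;

        /* Fine: the SRCU read side nests inside kvm->slots_lock. */
        mutex_lock(&kvm->slots_lock);
        idx = srcu_read_lock(&kvm->srcu);
        srcu_read_unlock(&kvm->srcu, idx);
        mutex_unlock(&kvm->slots_lock);

        /*
         * Also fine: kvm->slots_arch_lock inside the read side, because
         * it is always dropped before synchronize_srcu(&kvm->srcu) runs.
         */
        idx = srcu_read_lock(&kvm->srcu);
        mutex_lock(&kvm->slots_arch_lock);
        mutex_unlock(&kvm->slots_arch_lock);
        srcu_read_unlock(&kvm->srcu, idx);

        /*
         * Broken: taking kvm->lock, vcpu->mutex or kvm->slots_lock
         * inside the read side can deadlock against a writer that
         * calls synchronize_srcu(&kvm->srcu) while holding them.
         */
}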

MAINTAINERS

Lines changed: 1 addition & 1 deletion

@@ -11356,9 +11356,9 @@ F: virt/kvm/*
 KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
 M: Marc Zyngier <[email protected]>
 R: James Morse <[email protected]>
-R: Alexandru Elisei <[email protected]>
 R: Suzuki K Poulose <[email protected]>
 R: Oliver Upton <[email protected]>
+R: Zenghui Yu <[email protected]>
 L: [email protected] (moderated for non-subscribers)
 L: [email protected]
 L: [email protected] (deprecated, moderated for non-subscribers)

arch/arm64/include/asm/cputype.h

Lines changed: 4 additions & 0 deletions

@@ -124,6 +124,8 @@
 #define APPLE_CPU_PART_M1_FIRESTORM_PRO 0x025
 #define APPLE_CPU_PART_M1_ICESTORM_MAX 0x028
 #define APPLE_CPU_PART_M1_FIRESTORM_MAX 0x029
+#define APPLE_CPU_PART_M2_BLIZZARD 0x032
+#define APPLE_CPU_PART_M2_AVALANCHE 0x033

 #define AMPERE_CPU_PART_AMPERE1 0xAC3

@@ -177,6 +179,8 @@
 #define MIDR_APPLE_M1_FIRESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_PRO)
 #define MIDR_APPLE_M1_ICESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM_MAX)
 #define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_MAX)
+#define MIDR_APPLE_M2_BLIZZARD MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD)
+#define MIDR_APPLE_M2_AVALANCHE MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE)
 #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)

 /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
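For context on how such part numbers get consumed, the sketch below shows MIDR matching with the existing midr_range helpers from <asm/cputype.h> (MIDR_ALL_VERSIONS, is_midr_in_range_list(), read_cpuid_id()); the list and the helper function are illustrative only, not a quote of the vgic change in this merge.

/* Sketch: matching the new M2 MIDRs against the running CPU. */
static const struct midr_range broken_seis_sketch[] = {
        MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD),
        MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE),
        {},     /* empty entry terminates the list */
};

static bool this_cpu_has_broken_seis(void)
{
        return is_midr_in_range_list(read_cpuid_id(), broken_seis_sketch);
}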

arch/arm64/include/asm/esr.h

Lines changed: 9 additions & 0 deletions

@@ -114,6 +114,15 @@
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT (0x04)
 #define ESR_ELx_FSC_PERM (0x0C)
+#define ESR_ELx_FSC_SEA_TTW0 (0x14)
+#define ESR_ELx_FSC_SEA_TTW1 (0x15)
+#define ESR_ELx_FSC_SEA_TTW2 (0x16)
+#define ESR_ELx_FSC_SEA_TTW3 (0x17)
+#define ESR_ELx_FSC_SECC (0x18)
+#define ESR_ELx_FSC_SECC_TTW0 (0x1c)
+#define ESR_ELx_FSC_SECC_TTW1 (0x1d)
+#define ESR_ELx_FSC_SECC_TTW2 (0x1e)
+#define ESR_ELx_FSC_SECC_TTW3 (0x1f)

 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT (24)
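These codes all live in ESR_ELx[5:0], so a handler masks the ESR with the existing ESR_ELx_FSC mask (value 0x3f in esr.h) before comparing. The helper below is a hypothetical sketch, in the spirit of kvm_vcpu_abt_issea() shown later in this diff; the GNU case-range syntax works because each group of codes is contiguous.

/* Sketch: classify an external abort taken on a stage-1 table walk. */
static bool sea_on_table_walk(u64 esr)
{
        /* The fault status code occupies ESR_ELx[5:0]; mask it out first. */
        switch (esr & ESR_ELx_FSC) {
        case ESR_ELx_FSC_SEA_TTW0 ... ESR_ELx_FSC_SEA_TTW3:
        case ESR_ELx_FSC_SECC_TTW0 ... ESR_ELx_FSC_SECC_TTW3:
                /* External abort or parity/ECC error on a table walk. */
                return true;
        default:
                return false;
        }
}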

arch/arm64/include/asm/kvm_arm.h

Lines changed: 0 additions & 15 deletions

@@ -319,21 +319,6 @@
          BIT(18) | \
          GENMASK(16, 15))

-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT ESR_ELx_FSC_FAULT
-#define FSC_ACCESS ESR_ELx_FSC_ACCESS
-#define FSC_PERM ESR_ELx_FSC_PERM
-#define FSC_SEA ESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0 (0x14)
-#define FSC_SEA_TTW1 (0x15)
-#define FSC_SEA_TTW2 (0x16)
-#define FSC_SEA_TTW3 (0x17)
-#define FSC_SECC (0x18)
-#define FSC_SECC_TTW0 (0x1c)
-#define FSC_SECC_TTW1 (0x1d)
-#define FSC_SECC_TTW2 (0x1e)
-#define FSC_SECC_TTW3 (0x1f)
-
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
 /*

arch/arm64/include/asm/kvm_emulate.h

Lines changed: 30 additions & 12 deletions

@@ -349,16 +349,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
         switch (kvm_vcpu_trap_get_fault(vcpu)) {
-        case FSC_SEA:
-        case FSC_SEA_TTW0:
-        case FSC_SEA_TTW1:
-        case FSC_SEA_TTW2:
-        case FSC_SEA_TTW3:
-        case FSC_SECC:
-        case FSC_SECC_TTW0:
-        case FSC_SECC_TTW1:
-        case FSC_SECC_TTW2:
-        case FSC_SECC_TTW3:
+        case ESR_ELx_FSC_EXTABT:
+        case ESR_ELx_FSC_SEA_TTW0:
+        case ESR_ELx_FSC_SEA_TTW1:
+        case ESR_ELx_FSC_SEA_TTW2:
+        case ESR_ELx_FSC_SEA_TTW3:
+        case ESR_ELx_FSC_SECC:
+        case ESR_ELx_FSC_SECC_TTW0:
+        case ESR_ELx_FSC_SECC_TTW1:
+        case ESR_ELx_FSC_SECC_TTW2:
+        case ESR_ELx_FSC_SECC_TTW3:
                 return true;
         default:
                 return false;

@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)

 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-        if (kvm_vcpu_abt_iss1tw(vcpu))
-                return true;
+        if (kvm_vcpu_abt_iss1tw(vcpu)) {
+                /*
+                 * Only a permission fault on a S1PTW should be
+                 * considered as a write. Otherwise, page tables baked
+                 * in a read-only memslot will result in an exception
+                 * being delivered in the guest.
+                 *
+                 * The drawback is that we end-up faulting twice if the
+                 * guest is using any of HW AF/DB: a translation fault
+                 * to map the page containing the PT (read only at
+                 * first), then a permission fault to allow the flags
+                 * to be set.
+                 */
+                switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+                case ESR_ELx_FSC_PERM:
+                        return true;
+                default:
+                        return false;
+                }
+        }

         if (kvm_vcpu_trap_is_iabt(vcpu))
                 return false;

arch/arm64/kvm/hyp/include/hyp/fault.h

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
          */
         if (!(esr & ESR_ELx_S1PTW) &&
             (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-             (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+             (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
                 if (!__translate_far_to_hpfar(far, &hpfar))
                         return false;
         } else {

arch/arm64/kvm/hyp/include/hyp/switch.h

Lines changed: 1 addition & 1 deletion

@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
         if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
                 bool valid;

-                valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+                valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT &&
                         kvm_vcpu_dabt_isvalid(vcpu) &&
                         !kvm_vcpu_abt_issea(vcpu) &&
                         !kvm_vcpu_abt_iss1tw(vcpu);

arch/arm64/kvm/mmu.c

Lines changed: 12 additions & 9 deletions

@@ -1212,7 +1212,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
         exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
         VM_BUG_ON(write_fault && exec_fault);

-        if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
+        if (fault_status == ESR_ELx_FSC_PERM && !write_fault && !exec_fault) {
                 kvm_err("Unexpected L2 read permission error\n");
                 return -EFAULT;
         }

@@ -1277,7 +1277,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
          * only exception to this is when dirty logging is enabled at runtime
          * and a write fault needs to collapse a block entry into a table.
          */
-        if (fault_status != FSC_PERM || (logging_active && write_fault)) {
+        if (fault_status != ESR_ELx_FSC_PERM ||
+            (logging_active && write_fault)) {
                 ret = kvm_mmu_topup_memory_cache(memcache,
                                                  kvm_mmu_cache_min_pages(kvm));
                 if (ret)

@@ -1342,15 +1343,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
          * backed by a THP and thus use block mapping if possible.
          */
         if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
-                if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
+                if (fault_status == ESR_ELx_FSC_PERM &&
+                    fault_granule > PAGE_SIZE)
                         vma_pagesize = fault_granule;
                 else
                         vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
                                                                    hva, &pfn,
                                                                    &fault_ipa);
         }

-        if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
+        if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
                 /* Check the VMM hasn't introduced a new disallowed VMA */
                 if (kvm_vma_mte_allowed(vma)) {
                         sanitise_mte_tags(kvm, pfn, vma_pagesize);

@@ -1376,7 +1378,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
          * permissions only if vma_pagesize equals fault_granule. Otherwise,
          * kvm_pgtable_stage2_map() should be called to change block size.
          */
-        if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
+        if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
                 ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
         else
                 ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,

@@ -1441,7 +1443,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
         fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
         is_iabt = kvm_vcpu_trap_is_iabt(vcpu);

-        if (fault_status == FSC_FAULT) {
+        if (fault_status == ESR_ELx_FSC_FAULT) {
                 /* Beyond sanitised PARange (which is the IPA limit) */
                 if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
                         kvm_inject_size_fault(vcpu);

@@ -1476,8 +1478,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
                         kvm_vcpu_get_hfar(vcpu), fault_ipa);

         /* Check the stage-2 fault is trans. fault or write fault */
-        if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-            fault_status != FSC_ACCESS) {
+        if (fault_status != ESR_ELx_FSC_FAULT &&
+            fault_status != ESR_ELx_FSC_PERM &&
+            fault_status != ESR_ELx_FSC_ACCESS) {
                 kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
                         kvm_vcpu_trap_get_class(vcpu),
                         (unsigned long)kvm_vcpu_trap_get_fault(vcpu),

@@ -1539,7 +1542,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
         /* Userspace should not be able to register out-of-bounds IPAs */
         VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));

-        if (fault_status == FSC_ACCESS) {
+        if (fault_status == ESR_ELx_FSC_ACCESS) {
                 handle_access_fault(vcpu, fault_ipa);
                 ret = 1;
                 goto out_unlock;
