Commit eb56189

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  not-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the upcoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no good
  merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, resulting in interesting
  conflicts
2 parents 1e79a9e + 753d734 commit eb56189


91 files changed (+6077, -1770 lines)

Documentation/virt/kvm/api.rst

Lines changed: 31 additions & 10 deletions

@@ -7418,8 +7418,9 @@ hibernation of the host; however the VMM needs to manually save/restore the
 tags as appropriate if the VM is migrated.
 
 When this capability is enabled all memory in memslots must be mapped as
-not-shareable (no MAP_SHARED), attempts to create a memslot with a
-MAP_SHARED mmap will result in an -EINVAL return.
+``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
+attempts to create a memslot with an invalid mmap will result in an
+-EINVAL return.
 
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
@@ -7954,7 +7955,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
@@ -8036,13 +8037,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal. The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8051,6 +8045,33 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+After enabling the dirty rings, the userspace needs to detect the
+capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the
+ring structures can be backed by per-slot bitmaps. With this capability
+advertised, it means the architecture can dirty guest pages without
+vcpu/ring context, so that some of the dirty information will still be
+maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
+can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
+hasn't been enabled, or any memslot has been existing.
+
+Note that the bitmap here is only a backup of the ring structure. The
+use of the ring and bitmap combination is only beneficial if there is
+only a very small amount of memory that is dirtied out of vcpu/ring
+context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
+be considered.
+
+To collect dirty bits in the backup bitmap, userspace can use the same
+KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
+the generation of the dirty bits is done in a single pass. Collecting
+the dirty bitmap should be the very last thing that the VMM does before
+considering the state as complete. VMM needs to ensure that the dirty
+state is final and avoid missing dirty pages from another ioctl ordered
+after the bitmap collection.
+
+NOTE: One example of using the backup bitmap is saving arm64 vgic/its
+tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
+KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 

Documentation/virt/kvm/arm/pvtime.rst

Lines changed: 8 additions & 6 deletions

@@ -23,21 +23,23 @@ the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
 ARCH_FEATURES mechanism before calling it.
 
 PV_TIME_FEATURES
-    ============= ======== ==========
+
+    ============= ======== =================================================
     Function ID:  (uint32) 0xC5000020
     PV_call_id:   (uint32) The function to query for support.
                            Currently only PV_TIME_ST is supported.
     Return value: (int64)  NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
                            PV-time feature is supported by the hypervisor.
-    ============= ======== ==========
+    ============= ======== =================================================
 
 PV_TIME_ST
-    ============= ======== ==========
+
+    ============= ======== ==============================================
     Function ID:  (uint32) 0xC5000021
     Return value: (int64)  IPA of the stolen time data structure for this
                            VCPU. On failure:
                            NOT_SUPPORTED (-1)
-    ============= ======== ==========
+    ============= ======== ==============================================
 
 The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
 with inner and outer write back caching attributes, in the inner shareable
@@ -76,5 +78,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of
 these structures and not used for other purposes, this enables the guest to map
 the region using 64k pages and avoids conflicting attributes with other memory.
 
-For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
-section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
+For the user space interface see
+:ref:`Documentation/virt/kvm/devices/vcpu.rst <kvm_arm_vcpu_pvtime_ctrl>`.

Documentation/virt/kvm/devices/arm-vgic-its.rst

Lines changed: 4 additions & 1 deletion

@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
 
   KVM_DEV_ARM_ITS_SAVE_TABLES
     save the ITS table data into guest RAM, at the location provisioned
-    by the guest in corresponding registers/table entries.
+    by the guest in corresponding registers/table entries. Should userspace
+    require a form of dirty tracking to identify which pages are modified
+    by the saving process, it should use a bitmap even if using another
+    mechanism to track the memory dirtied by the vCPUs.
 
     The layout of the tables in guest memory defines an ABI. The entries
     are laid out in little endian format as described in the last paragraph.

Documentation/virt/kvm/devices/vcpu.rst

Lines changed: 2 additions & 0 deletions

@@ -171,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt
 numbers on at least one VCPU after creating all VCPUs and before running any
 VCPUs.
 
+.. _kvm_arm_vcpu_pvtime_ctrl:
+
 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
 ==================================
 

arch/arm64/Kconfig

Lines changed: 1 addition & 0 deletions

@@ -1965,6 +1965,7 @@ config ARM64_MTE
 	depends on ARM64_PAN
 	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_USES_PG_ARCH_X
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
 	  architectural support for run-time, always-on detection of

arch/arm64/include/asm/kvm_arm.h

Lines changed: 6 additions & 2 deletions

@@ -135,7 +135,7 @@
  * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_setup_stage2().
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
@@ -340,9 +340,13 @@
  * We have
  *	PAR	[PA_Shift - 1	: 12] = PA	[PA_Shift - 1 : 12]
  *	HPFAR	[PA_Shift - 9	: 4]  = FIPA	[PA_Shift - 1 : 12]
+ *
+ * Always assume 52 bit PA since at this point, we don't know how many PA bits
+ * the page table has been set up for. This should be safe since unused address
+ * bits in PAR are res0.
  */
 #define PAR_TO_HPFAR(par)		\
-	(((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
+	(((par) & GENMASK_ULL(52 - 1, 12)) >> 8)
 
 #define ECN(x) { ESR_ELx_EC_##x, #x }

arch/arm64/include/asm/kvm_asm.h

Lines changed: 5 additions & 2 deletions

@@ -76,6 +76,9 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
@@ -106,7 +109,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu)					\
 	({								\
 		unsigned long base, off;				\
-		base = kvm_arm_hyp_percpu_base[cpu];			\
+		base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];	\
 		off = (unsigned long)&CHOOSE_NVHE_SYM(sym) -		\
 		      (unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start);	\
 		base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL; \
@@ -211,7 +214,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init		CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector	CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);

arch/arm64/include/asm/kvm_host.h

Lines changed: 74 additions & 2 deletions

@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
@@ -115,6 +172,13 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+typedef unsigned int pkvm_handle_t;
+
+struct kvm_protected_vm {
+	pkvm_handle_t handle;
+	struct kvm_hyp_memcache teardown_mc;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -163,9 +227,19 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+	struct {
+		u8 imp:4;
+		u8 unimp:4;
+	} dfr0_pmuver;
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	/*
+	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
+	 * the associated pKVM instance in the hypervisor.
+	 */
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
@@ -915,8 +989,6 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
-
 static inline bool kvm_vm_is_protected(struct kvm *kvm)
 {
 	return false;

arch/arm64/include/asm/kvm_hyp.h

Lines changed: 3 additions & 0 deletions

@@ -123,4 +123,7 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+
 #endif /* __ARM64_KVM_HYP_H__ */

arch/arm64/include/asm/kvm_mmu.h

Lines changed: 1 addition & 1 deletion

@@ -166,7 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
