
Commit 86bdf3e

Gavin Shan authored and Marc Zyngier committed
KVM: Support dirty ring in conjunction with bitmap
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with ring-based dirty page tracking, which
always requires a running VCPU context.

Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before
migrating the VM to the target. Use an additional capability to
advertise this behavior.

The newly added capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't
be enabled before KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way,
the newly added capability is treated as an extension of
KVM_CAP_DIRTY_LOG_RING_ACQ_REL.

Suggested-by: Marc Zyngier <[email protected]>
Suggested-by: Peter Xu <[email protected]>
Co-developed-by: Oliver Upton <[email protected]>
Signed-off-by: Oliver Upton <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Peter Xu <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent e8a1856 commit 86bdf3e

File tree

8 files changed: +112 −17 lines


Documentation/virt/kvm/api.rst

Lines changed: 27 additions & 7 deletions

@@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal. The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8018,6 +8011,33 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+After enabling the dirty rings, the userspace needs to detect the
+capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the
+ring structures can be backed by per-slot bitmaps. With this capability
+advertised, it means the architecture can dirty guest pages without
+vcpu/ring context, so that some of the dirty information will still be
+maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
+can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
+hasn't been enabled, or any memslot has been existing.
+
+Note that the bitmap here is only a backup of the ring structure. The
+use of the ring and bitmap combination is only beneficial if there is
+only a very small amount of memory that is dirtied out of vcpu/ring
+context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
+be considered.
+
+To collect dirty bits in the backup bitmap, userspace can use the same
+KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
+the generation of the dirty bits is done in a single pass. Collecting
+the dirty bitmap should be the very last thing that the VMM does before
+considering the state as complete. VMM needs to ensure that the dirty
+state is final and avoid missing dirty pages from another ioctl ordered
+after the bitmap collection.
+
+NOTE: One example of using the backup bitmap is saving arm64 vgic/its
+tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
+KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------

Documentation/virt/kvm/devices/arm-vgic-its.rst

Lines changed: 4 additions & 1 deletion

@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
 
   KVM_DEV_ARM_ITS_SAVE_TABLES
     save the ITS table data into guest RAM, at the location provisioned
-    by the guest in corresponding registers/table entries.
+    by the guest in corresponding registers/table entries. Should userspace
+    require a form of dirty tracking to identify which pages are modified
+    by the saving process, it should use a bitmap even if using another
+    mechanism to track the memory dirtied by the vCPUs.
 
 The layout of the tables in guest memory defines an ABI. The entries
 are laid out in little endian format as described in the last paragraph.

include/linux/kvm_dirty_ring.h

Lines changed: 7 additions & 0 deletions

@@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return 0;
 }
 
+static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
@@ -67,6 +72,8 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
+bool kvm_use_dirty_bitmap(struct kvm *kvm);
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);

include/linux/kvm_host.h

Lines changed: 1 addition & 0 deletions

@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;

include/uapi/linux/kvm.h

Lines changed: 1 addition & 0 deletions

@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING

virt/kvm/Kconfig

Lines changed: 6 additions & 0 deletions

@@ -33,6 +33,12 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
 	bool
 	select HAVE_KVM_DIRTY_RING
 
+# Allow enabling both the dirty bitmap and dirty ring. Only architectures
+# that need to dirty memory outside of a vCPU context should select this.
+config NEED_KVM_DIRTY_RING_WITH_BITMAP
+	bool
+	depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
 	bool
 	select EVENTFD

virt/kvm/dirty_ring.c

Lines changed: 14 additions & 0 deletions

@@ -21,6 +21,20 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
 }
 
+bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	lockdep_assert_held(&kvm->slots_lock);
+
+	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
+}
+
+#ifndef CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 {
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);

virt/kvm/kvm_main.c

Lines changed: 52 additions & 9 deletions

@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 		new->dirty_bitmap = NULL;
 	else if (old && old->dirty_bitmap)
 		new->dirty_bitmap = old->dirty_bitmap;
-	else if (!kvm->dirty_ring_size) {
+	else if (kvm_use_dirty_bitmap(kvm)) {
 		r = kvm_alloc_dirty_bitmap(new);
 		if (r)
 			return r;
@@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -3305,15 +3305,18 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
+		return;
+
+	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
 		return;
 #endif
 
 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+#ifdef CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
@@ -4558,6 +4564,20 @@ int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	return -EINVAL;
 }
 
+static bool kvm_are_all_memslots_empty(struct kvm *kvm)
+{
+	int i;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
+			return false;
+	}
+
+	return true;
+}
+
 static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 					   struct kvm_enable_cap *cap)
 {
@@ -4588,6 +4608,29 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 			return -EINVAL;
 
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
+		int r = -EINVAL;
+
+		if (!IS_ENABLED(CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP) ||
+		    !kvm->dirty_ring_size || cap->flags)
+			return r;
+
+		mutex_lock(&kvm->slots_lock);
+
+		/*
+		 * For simplicity, allow enabling ring+bitmap if and only if
+		 * there are no memslots, e.g. to ensure all memslots allocate
+		 * a bitmap after the capability is enabled.
+		 */
+		if (kvm_are_all_memslots_empty(kvm)) {
+			kvm->dirty_ring_with_bitmap = true;
+			r = 0;
+		}
+
+		mutex_unlock(&kvm->slots_lock);
+
+		return r;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
