
Commit 7f9039c

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:

 "Generic:

   - Clean up locking of all vCPUs for a VM by using the *_nest_lock()
     family of functions, and move duplicated code to virt/kvm/.
     kernel/ patches acked by Peter Zijlstra

   - Add MGLRU support to the access tracking perf test

 ARM fixes:

   - Make the irqbypass hooks resilient to changes in the GSI<->MSI
     routing, avoiding stale vLPI mappings being left behind. The fix
     is to resolve the VGIC IRQ using the host IRQ (which is stable)
     and nuking the vLPI mapping upon a routing change

   - Close another VGIC race where vCPU creation races with VGIC
     creation, leading to in-flight vCPUs entering the kernel w/o
     private IRQs allocated

   - Fix a build issue triggered by the recently added workaround for
     Ampere's AC04_CPU_23 erratum

   - Correctly sign-extend the VA when emulating a TLBI instruction
     potentially targeting a VNCR mapping

   - Avoid dereferencing a NULL pointer in the VGIC debug code, which
     can happen if the device doesn't have any mapping yet

 s390:

   - Fix interaction between some filesystems and Secure Execution

   - Some cleanups and refactorings, preparing for an upcoming big
     series

 x86:

   - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
     to fix a race between AP destroy and VMRUN

   - Decrypt and dump the VMSA in dump_vmcb() if debugging is enabled
     for the VM

   - Refine and harden handling of spurious faults

   - Add support for ALLOWED_SEV_FEATURES

   - Add #VMGEXIT to the set of handlers special cased for
     CONFIG_RETPOLINE=y

   - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
     features that utilize those bits

   - Don't account temporary allocations in sev_send_update_data()

   - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
     Threshold

   - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
     IBPB, between SVM and VMX

   - Advertise support to userspace for WRMSRNS and PREFETCHI

   - Rescan I/O APIC routes after handling EOI that needed to be
     intercepted due to the old/previous routing, but not the
     new/current routing

   - Add a module param to control and enumerate support for device
     posted interrupts

   - Fix a potential overflow with nested virt on Intel systems
     running 32-bit kernels

   - Flush shadow VMCSes on emergency reboot

   - Add support for SNP to the various SEV selftests

   - Add a selftest to verify fastops instructions via forced
     emulation

   - Refine and optimize KVM's software processing of the posted
     interrupt bitmap, and share the harvesting code between KVM and
     the kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  rtmutex_api: provide correct extern functions
  KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
  KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
  KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
  KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
  KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
  KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
  KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
  arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
  KVM: s390: Simplify and move pv code
  KVM: s390: Refactor and split some gmap helpers
  KVM: s390: Remove unneeded srcu lock
  s390: Remove unneeded includes
  s390/uv: Improve splitting of large folios that cannot be split while dirty
  s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
  s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
  rust: add helper for mutex_trylock
  RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
  KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
  x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
  ...
2 parents df7b9b4 + 61374cc commit 7f9039c
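
The vCPU-locking cleanup called out above replaces per-architecture open-coded loops (the arm64 copy is removed from arch/arm64/kvm/arm.c below) with common helpers in virt/kvm/. A minimal sketch of the pattern, assuming only the generic kvm_for_each_vcpu() iterator and vcpu->mutex; the real kvm_trylock_all_vcpus() additionally relies on the *_nest_lock() variants so that lockdep treats all vCPU mutexes as a single class nested under kvm->lock:

/*
 * Illustrative sketch only, not the exact virt/kvm/ implementation.
 * Returns 0 with every vcpu->mutex held, or -EBUSY after rolling back.
 */
static int trylock_all_vcpus_sketch(struct kvm *kvm)
{
        struct kvm_vcpu *vcpu;
        unsigned long i, j;

        lockdep_assert_held(&kvm->lock);

        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (mutex_trylock(&vcpu->mutex))
                        continue;

                /* Contended: drop the mutexes acquired so far. */
                kvm_for_each_vcpu(j, vcpu, kvm) {
                        if (j == i)
                                break;
                        mutex_unlock(&vcpu->mutex);
                }
                return -EBUSY;
        }

        return 0;
}

Note the return convention: 0 on success. Call sites that previously tested lock_all_vcpus() for true now test kvm_trylock_all_vcpus() for non-zero, which is why the conditions in the arm64 diffs below are inverted.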


100 files changed: 2884 additions & 1243 deletions

Documentation/virt/kvm/api.rst

Lines changed: 5 additions & 0 deletions
@@ -8001,6 +8001,11 @@ apply some other policy-based mitigation. When exiting to userspace, KVM sets
 KVM_RUN_X86_BUS_LOCK in vcpu->run->flags, and conditionally sets the exit_reason
 to KVM_EXIT_X86_BUS_LOCK.
 
+Due to differences in the underlying hardware implementation, the vCPU's RIP at
+the time of exit diverges between Intel and AMD. On Intel hosts, RIP points at
+the next instruction, i.e. the exit is trap-like. On AMD hosts, RIP points at
+the offending instruction, i.e. the exit is fault-like.
+
 Note! Detected bus locks may be coincident with other exits to userspace, i.e.
 KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
 userspace wants to take action on all detected bus locks.
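
Because a bus-lock exit can piggyback on any other exit reason, a VMM should test the flag before dispatching on exit_reason. A minimal sketch of a userspace run loop, assuming vcpu_fd is a KVM vCPU fd and run is its mmap()ed struct kvm_run; handle_bus_lock() is a hypothetical policy hook (e.g. throttling the vCPU):

#include <linux/kvm.h>
#include <sys/ioctl.h>

static void handle_bus_lock(struct kvm_run *run);      /* hypothetical */

static void vcpu_run_loop(int vcpu_fd, struct kvm_run *run)
{
        for (;;) {
                if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                        break;

                /*
                 * Check the flag regardless of exit_reason: a detected
                 * bus lock may be coincident with any other exit.
                 */
                if (run->flags & KVM_RUN_X86_BUS_LOCK)
                        handle_bus_lock(run);

                if (run->exit_reason == KVM_EXIT_X86_BUS_LOCK)
                        continue;       /* the flag was the whole story */

                /* ... dispatch the remaining exit reasons ... */
        }
}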

MAINTAINERS

Lines changed: 2 additions & 0 deletions
@@ -13227,12 +13227,14 @@ S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
 F: Documentation/virt/kvm/s390*
 F: arch/s390/include/asm/gmap.h
+F: arch/s390/include/asm/gmap_helpers.h
 F: arch/s390/include/asm/kvm*
 F: arch/s390/include/uapi/asm/kvm*
 F: arch/s390/include/uapi/asm/uvdevice.h
 F: arch/s390/kernel/uv.c
 F: arch/s390/kvm/
 F: arch/s390/mm/gmap.c
+F: arch/s390/mm/gmap_helpers.c
 F: drivers/s390/char/uvdevice.c
 F: tools/testing/selftests/drivers/s390x/uvdevice/
 F: tools/testing/selftests/kvm/*/s390/

arch/arm64/include/asm/kvm_host.h

Lines changed: 0 additions & 3 deletions
@@ -1320,9 +1320,6 @@ int __init populate_sysreg_config(const struct sys_reg_desc *sr,
                                   unsigned int idx);
 int __init populate_nv_trap_config(void);
 
-bool lock_all_vcpus(struct kvm *kvm);
-void unlock_all_vcpus(struct kvm *kvm);
-
 void kvm_calculate_traps(struct kvm_vcpu *vcpu);
 
 /* MMIO helpers */

arch/arm64/include/asm/sysreg.h

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 #include <linux/bits.h>
 #include <linux/stringify.h>
 #include <linux/kasan-tags.h>
+#include <linux/kconfig.h>
 
 #include <asm/gpr-num.h>
 

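For context on the vdso build issue mentioned in the merge message: the AC04_CPU_23 erratum workaround made this header depend on a macro that lives in linux/kconfig.h, and the vDSO is built with a trimmed include environment that does not pull that header in transitively, so sysreg.h now includes it explicitly. A hedged illustration of the failure mode; the config symbol below is an assumption, not taken from the diff:

#include <linux/kconfig.h>      /* provides IS_ENABLED() and friends */

/*
 * Hypothetical example: without the explicit include above, expanding
 * IS_ENABLED() in a vDSO compilation unit fails, because nothing else
 * in that reduced include chain defines it.
 */
#if IS_ENABLED(CONFIG_AMPERE_ERRATUM_AC04_CPU_23)      /* assumed symbol */
#define AC04_CPU_23_WORKAROUND  1
#else
#define AC04_CPU_23_WORKAROUND  0
#endif
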
arch/arm64/kvm/arch_timer.c

Lines changed: 2 additions & 2 deletions
@@ -1766,7 +1766,7 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
 
         mutex_lock(&kvm->lock);
 
-        if (lock_all_vcpus(kvm)) {
+        if (!kvm_trylock_all_vcpus(kvm)) {
                 set_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &kvm->arch.flags);
 
                 /*
@@ -1778,7 +1778,7 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
                 kvm->arch.timer_data.voffset = offset->counter_offset;
                 kvm->arch.timer_data.poffset = offset->counter_offset;
 
-                unlock_all_vcpus(kvm);
+                kvm_unlock_all_vcpus(kvm);
         } else {
                 ret = -EBUSY;
         }

arch/arm64/kvm/arm.c

Lines changed: 24 additions & 45 deletions
@@ -1924,49 +1924,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
         }
 }
 
-/* unlocks vcpus from @vcpu_lock_idx and smaller */
-static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
-{
-        struct kvm_vcpu *tmp_vcpu;
-
-        for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-                tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
-                mutex_unlock(&tmp_vcpu->mutex);
-        }
-}
-
-void unlock_all_vcpus(struct kvm *kvm)
-{
-        lockdep_assert_held(&kvm->lock);
-
-        unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-/* Returns true if all vcpus were locked, false otherwise */
-bool lock_all_vcpus(struct kvm *kvm)
-{
-        struct kvm_vcpu *tmp_vcpu;
-        unsigned long c;
-
-        lockdep_assert_held(&kvm->lock);
-
-        /*
-         * Any time a vcpu is in an ioctl (including running), the
-         * core KVM code tries to grab the vcpu->mutex.
-         *
-         * By grabbing the vcpu->mutex of all VCPUs we ensure that no
-         * other VCPUs can fiddle with the state while we access it.
-         */
-        kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
-                if (!mutex_trylock(&tmp_vcpu->mutex)) {
-                        unlock_vcpus(kvm, c - 1);
-                        return false;
-                }
-        }
-
-        return true;
-}
-
 static unsigned long nvhe_percpu_size(void)
 {
         return (unsigned long)CHOOSE_NVHE_SYM(__per_cpu_end) -
@@ -2790,6 +2747,7 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
         return kvm_vgic_v4_set_forwarding(irqfd->kvm, prod->irq,
                                           &irqfd->irq_entry);
 }
+
 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
                                       struct irq_bypass_producer *prod)
 {
@@ -2800,8 +2758,29 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
         if (irq_entry->type != KVM_IRQ_ROUTING_MSI)
                 return;
 
-        kvm_vgic_v4_unset_forwarding(irqfd->kvm, prod->irq,
-                                     &irqfd->irq_entry);
+        kvm_vgic_v4_unset_forwarding(irqfd->kvm, prod->irq);
+}
+
+bool kvm_arch_irqfd_route_changed(struct kvm_kernel_irq_routing_entry *old,
+                                  struct kvm_kernel_irq_routing_entry *new)
+{
+        if (new->type != KVM_IRQ_ROUTING_MSI)
+                return true;
+
+        return memcmp(&old->msi, &new->msi, sizeof(new->msi));
+}
+
+int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
+                                  uint32_t guest_irq, bool set)
+{
+        /*
+         * Remapping the vLPI requires taking the its_lock mutex to resolve
+         * the new translation. We're in spinlock land at this point, so no
+         * chance of resolving the translation.
+         *
+         * Unmap the vLPI and fall back to software LPI injection.
+         */
+        return kvm_vgic_v4_unset_forwarding(kvm, host_irq);
 }
 
 void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
arch/arm64/kvm/nested.c

Lines changed: 4 additions & 2 deletions
@@ -918,6 +918,8 @@ static void invalidate_vncr_va(struct kvm *kvm,
         }
 }
 
+#define tlbi_va_s1_to_va(v)     (u64)sign_extend64((v) << 12, 48)
+
 static void compute_s1_tlbi_range(struct kvm_vcpu *vcpu, u32 inst, u64 val,
                                   struct s1e2_tlbi_scope *scope)
 {
@@ -964,7 +966,7 @@ static void compute_s1_tlbi_range(struct kvm_vcpu *vcpu, u32 inst, u64 val,
                 scope->size = ttl_to_size(FIELD_GET(TLBI_TTL_MASK, val));
                 if (!scope->size)
                         scope->size = SZ_1G;
-                scope->va = (val << 12) & ~(scope->size - 1);
+                scope->va = tlbi_va_s1_to_va(val) & ~(scope->size - 1);
                 scope->asid = FIELD_GET(TLBIR_ASID_MASK, val);
                 break;
         case OP_TLBI_ASIDE1:
@@ -992,7 +994,7 @@ static void compute_s1_tlbi_range(struct kvm_vcpu *vcpu, u32 inst, u64 val,
                 scope->size = ttl_to_size(FIELD_GET(TLBI_TTL_MASK, val));
                 if (!scope->size)
                         scope->size = SZ_1G;
-                scope->va = (val << 12) & ~(scope->size - 1);
+                scope->va = tlbi_va_s1_to_va(val) & ~(scope->size - 1);
                 break;
         case OP_TLBI_RVAE2:
         case OP_TLBI_RVAE2IS:

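Why the sign extension matters: TLBI VA* operands carry VA[55:12] in bits [43:0], so the old plain shift reconstructed at most bits [55:12] and left bits [63:56] clear, producing a non-canonical address for upper-half (TTBR1) VAs that could never match a VNCR mapping. A self-contained worked example; sign_extend64() is re-implemented here with the same semantics as the kernel's version in include/linux/bitops.h:

#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's sign_extend64(): bit 'index' becomes the sign bit. */
static inline int64_t sign_extend64(uint64_t value, int index)
{
        uint8_t shift = 63 - index;
        return (int64_t)(value << shift) >> shift;
}

int main(void)
{
        /* TLBI VAE1 operand for VA 0xffff800000001000: VA[55:12] in [43:0]. */
        uint64_t val = 0xff800000001ULL;

        /* Old code: the plain shift yields a non-canonical address. */
        assert((val << 12) == 0x00ff800000001000ULL);

        /*
         * New code: tlbi_va_s1_to_va() sign-extends from bit 48, the top
         * of a 48-bit VA, recovering the canonical kernel address.
         */
        assert((uint64_t)sign_extend64(val << 12, 48) == 0xffff800000001000ULL);

        return 0;
}
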
arch/arm64/kvm/vgic/vgic-debug.c

Lines changed: 4 additions & 1 deletion
@@ -490,6 +490,9 @@ static int vgic_its_debug_show(struct seq_file *s, void *v)
         struct its_device *dev = iter->dev;
         struct its_ite *ite = iter->ite;
 
+        if (!ite)
+                return 0;
+
         if (list_is_first(&ite->ite_list, &dev->itt_head)) {
                 seq_printf(s, "\n");
                 seq_printf(s, "Device ID: 0x%x, Event ID Range: [0 - %llu]\n",
@@ -498,7 +501,7 @@ static int vgic_its_debug_show(struct seq_file *s, void *v)
                 seq_printf(s, "-----------------------------------------------\n");
         }
 
-        if (ite && ite->irq && ite->collection) {
+        if (ite->irq && ite->collection) {
                 seq_printf(s, "%8u %8u %8u %8u %8u %2d\n",
                            ite->event_id, ite->irq->intid, ite->irq->hwintid,
                            ite->collection->target_addr,

arch/arm64/kvm/vgic/vgic-init.c

Lines changed: 28 additions & 3 deletions
@@ -84,15 +84,40 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
             !kvm_vgic_global_state.can_emulate_gicv2)
                 return -ENODEV;
 
-        /* Must be held to avoid race with vCPU creation */
+        /*
+         * Ensure mutual exclusion with vCPU creation and any vCPU ioctls by:
+         *
+         * - Holding kvm->lock to prevent KVM_CREATE_VCPU from reaching
+         *   kvm_arch_vcpu_precreate() and ensuring created_vcpus is stable.
+         *   This alone is insufficient, as kvm_vm_ioctl_create_vcpu() drops
+         *   the kvm->lock before completing the vCPU creation.
+         */
         lockdep_assert_held(&kvm->lock);
 
+        /*
+         * - Acquiring the vCPU mutex for every *online* vCPU to prevent
+         *   concurrent vCPU ioctls for vCPUs already visible to userspace.
+         */
         ret = -EBUSY;
-        if (!lock_all_vcpus(kvm))
+        if (kvm_trylock_all_vcpus(kvm))
                 return ret;
 
+        /*
+         * - Taking the config_lock which protects VGIC data structures such
+         *   as the per-vCPU arrays of private IRQs (SGIs, PPIs).
+         */
         mutex_lock(&kvm->arch.config_lock);
 
+        /*
+         * - Bailing on the entire thing if a vCPU is in the middle of creation,
+         *   dropped the kvm->lock, but hasn't reached kvm_arch_vcpu_create().
+         *
+         * The whole combination of this guarantees that no vCPU can get into
+         * KVM with a VGIC configuration inconsistent with the VM's VGIC.
+         */
+        if (kvm->created_vcpus != atomic_read(&kvm->online_vcpus))
+                goto out_unlock;
+
         if (irqchip_in_kernel(kvm)) {
                 ret = -EEXIST;
                 goto out_unlock;
@@ -142,7 +167,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 
 out_unlock:
         mutex_unlock(&kvm->arch.config_lock);
-        unlock_all_vcpus(kvm);
+        kvm_unlock_all_vcpus(kvm);
         return ret;
 }
 

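To see why comparing created_vcpus against online_vcpus closes the race, consider the shape of vCPU creation; the sketch below is a simplified, non-verbatim rendering of kvm_vm_ioctl_create_vcpu() in virt/kvm/kvm_main.c. The counter bump and the publication of the vCPU happen in two separate kvm->lock critical sections, and kvm_vgic_create() can run in the window between them, where the half-created vCPU is invisible to kvm_trylock_all_vcpus():

/* Simplified sketch, not the verbatim kvm_main.c code. */
static int create_vcpu_sketch(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
        mutex_lock(&kvm->lock);
        kvm->created_vcpus++;           /* counted... */
        mutex_unlock(&kvm->lock);

        /*
         * <-- window: kvm_vgic_create() can run here. Without the new
         * created_vcpus != online_vcpus check it would proceed, and this
         * vCPU would finish creation without its VGIC private IRQs.
         */
        kvm_arch_vcpu_create(vcpu);     /* arm64: allocates private IRQs */

        mutex_lock(&kvm->lock);
        atomic_inc(&kvm->online_vcpus); /* ...but only now published */
        mutex_unlock(&kvm->lock);

        return 0;
}
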
arch/arm64/kvm/vgic/vgic-its.c

Lines changed: 28 additions & 28 deletions
@@ -306,39 +306,34 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
                 }
         }
 
-        raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
-
         if (irq->hw)
-                return its_prop_update_vlpi(irq->host_irq, prop, needs_inv);
+                ret = its_prop_update_vlpi(irq->host_irq, prop, needs_inv);
 
-        return 0;
+        raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+        return ret;
 }
 
 static int update_affinity(struct vgic_irq *irq, struct kvm_vcpu *vcpu)
 {
-        int ret = 0;
-        unsigned long flags;
+        struct its_vlpi_map map;
+        int ret;
 
-        raw_spin_lock_irqsave(&irq->irq_lock, flags);
+        guard(raw_spinlock_irqsave)(&irq->irq_lock);
         irq->target_vcpu = vcpu;
-        raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
 
-        if (irq->hw) {
-                struct its_vlpi_map map;
-
-                ret = its_get_vlpi(irq->host_irq, &map);
-                if (ret)
-                        return ret;
+        if (!irq->hw)
+                return 0;
 
-                if (map.vpe)
-                        atomic_dec(&map.vpe->vlpi_count);
-                map.vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
-                atomic_inc(&map.vpe->vlpi_count);
+        ret = its_get_vlpi(irq->host_irq, &map);
+        if (ret)
+                return ret;
 
-                ret = its_map_vlpi(irq->host_irq, &map);
-        }
+        if (map.vpe)
+                atomic_dec(&map.vpe->vlpi_count);
 
-        return ret;
+        map.vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+        atomic_inc(&map.vpe->vlpi_count);
+        return its_map_vlpi(irq->host_irq, &map);
 }
 
 static struct kvm_vcpu *collection_to_vcpu(struct kvm *kvm,
@@ -756,12 +751,17 @@ int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
 /* Requires the its_lock to be held. */
 static void its_free_ite(struct kvm *kvm, struct its_ite *ite)
 {
+        struct vgic_irq *irq = ite->irq;
         list_del(&ite->ite_list);
 
         /* This put matches the get in vgic_add_lpi. */
-        if (ite->irq) {
-                if (ite->irq->hw)
-                        WARN_ON(its_unmap_vlpi(ite->irq->host_irq));
+        if (irq) {
+                scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+                        if (irq->hw)
+                                WARN_ON(its_unmap_vlpi(ite->irq->host_irq));
+
+                        irq->hw = false;
+                }
 
                 vgic_put_irq(kvm, ite->irq);
         }
@@ -1971,7 +1971,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 
         mutex_lock(&dev->kvm->lock);
 
-        if (!lock_all_vcpus(dev->kvm)) {
+        if (kvm_trylock_all_vcpus(dev->kvm)) {
                 mutex_unlock(&dev->kvm->lock);
                 return -EBUSY;
         }
@@ -2006,7 +2006,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
         }
 out:
         mutex_unlock(&dev->kvm->arch.config_lock);
-        unlock_all_vcpus(dev->kvm);
+        kvm_unlock_all_vcpus(dev->kvm);
         mutex_unlock(&dev->kvm->lock);
         return ret;
 }
@@ -2676,7 +2676,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 
         mutex_lock(&kvm->lock);
 
-        if (!lock_all_vcpus(kvm)) {
+        if (kvm_trylock_all_vcpus(kvm)) {
                 mutex_unlock(&kvm->lock);
                 return -EBUSY;
         }
@@ -2698,7 +2698,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 
         mutex_unlock(&its->its_lock);
         mutex_unlock(&kvm->arch.config_lock);
-        unlock_all_vcpus(kvm);
+        kvm_unlock_all_vcpus(kvm);
         mutex_unlock(&kvm->lock);
         return ret;
 }

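The update_affinity() and its_free_ite() rewrites above lean on the kernel's scope-based locking guards from <linux/cleanup.h>: guard() holds a lock until the enclosing function scope ends, and scoped_guard() until its block ends, so early returns cannot leak the lock (note also that update_lpi_config() now calls its_prop_update_vlpi() under the irq_lock, which is what its first hunk rearranges). A minimal sketch of the idiom using a made-up structure:

#include <linux/cleanup.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* 'struct foo' and do_hw_update() are invented for this illustration. */
struct foo {
        raw_spinlock_t  lock;
        bool            active;
        u64             generation;
};

int do_hw_update(struct foo *f);

static int foo_update(struct foo *f)
{
        /* Acquired here, released automatically on every return path. */
        guard(raw_spinlock_irqsave)(&f->lock);

        if (!f->active)
                return 0;               /* no explicit unlock needed */

        f->generation++;
        return do_hw_update(f);         /* still runs under the lock */
}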