Skip to content

Commit c7fa848

Browse files
npigginmpe
authored andcommitted
KVM: PPC: Book3S HV P9: Fix "lost kick" race
When new work is created that requires attention from the hypervisor (e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to pull the target vcpu out of the guest if it may have been running. Therefore the work creation side looks like this: vcpu->arch.doorbell_request = 1; kvmppc_fast_vcpu_kick_hv(vcpu) { smp_mb(); cpu = vcpu->cpu; if (cpu != -1) send_ipi(cpu); } And the guest entry side *should* look like this: vcpu->cpu = smp_processor_id(); smp_mb(); if (vcpu->arch.doorbell_request) { // do something (abort entry or inject doorbell etc) } But currently the store and load are flipped, so it is possible for the entry to see no doorbell pending, and the doorbell creation misses the store to set cpu, resulting lost work (or at least delayed until the next guest exit). Fix this by reordering the entry operations and adding a smp_mb between them. The P8 path appears to have a similar race which is commented but not addressed yet. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
1 parent e40b38a commit c7fa848

File tree

1 file changed

+33
-8
lines changed

1 file changed

+33
-8
lines changed

arch/powerpc/kvm/book3s_hv.c

Lines changed: 33 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -225,6 +225,13 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
225225
int cpu;
226226
struct rcuwait *waitp;
227227

228+
/*
229+
* rcuwait_wake_up contains smp_mb() which orders prior stores that
230+
* create pending work vs below loads of cpu fields. The other side
231+
* is the barrier in vcpu run that orders setting the cpu fields vs
232+
* testing for pending work.
233+
*/
234+
228235
waitp = kvm_arch_vcpu_get_wait(vcpu);
229236
if (rcuwait_wake_up(waitp))
230237
++vcpu->stat.generic.halt_wakeup;
@@ -1089,7 +1096,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
10891096
break;
10901097
}
10911098
tvcpu->arch.prodded = 1;
1092-
smp_mb();
1099+
smp_mb(); /* This orders prodded store vs ceded load */
10931100
if (tvcpu->arch.ceded)
10941101
kvmppc_fast_vcpu_kick_hv(tvcpu);
10951102
break;
@@ -3766,6 +3773,14 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
37663773
pvc = core_info.vc[sub];
37673774
pvc->pcpu = pcpu + thr;
37683775
for_each_runnable_thread(i, vcpu, pvc) {
3776+
/*
3777+
* XXX: is kvmppc_start_thread called too late here?
3778+
* It updates vcpu->cpu and vcpu->arch.thread_cpu
3779+
* which are used by kvmppc_fast_vcpu_kick_hv(), but
3780+
* kick is called after new exceptions become available
3781+
* and exceptions are checked earlier than here, by
3782+
* kvmppc_core_prepare_to_enter.
3783+
*/
37693784
kvmppc_start_thread(vcpu, pvc);
37703785
kvmppc_create_dtl_entry(vcpu, pvc);
37713786
trace_kvm_guest_enter(vcpu);
@@ -4487,6 +4502,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
44874502
if (need_resched() || !kvm->arch.mmu_ready)
44884503
goto out;
44894504

4505+
vcpu->cpu = pcpu;
4506+
vcpu->arch.thread_cpu = pcpu;
4507+
vc->pcpu = pcpu;
4508+
local_paca->kvm_hstate.kvm_vcpu = vcpu;
4509+
local_paca->kvm_hstate.ptid = 0;
4510+
local_paca->kvm_hstate.fake_suspend = 0;
4511+
4512+
/*
4513+
* Orders set cpu/thread_cpu vs testing for pending interrupts and
4514+
* doorbells below. The other side is when these fields are set vs
4515+
* kvmppc_fast_vcpu_kick_hv reading the cpu/thread_cpu fields to
4516+
* kick a vCPU to notice the pending interrupt.
4517+
*/
4518+
smp_mb();
4519+
44904520
if (!nested) {
44914521
kvmppc_core_prepare_to_enter(vcpu);
44924522
if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
@@ -4506,13 +4536,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
45064536

45074537
tb = mftb();
45084538

4509-
vcpu->cpu = pcpu;
4510-
vcpu->arch.thread_cpu = pcpu;
4511-
vc->pcpu = pcpu;
4512-
local_paca->kvm_hstate.kvm_vcpu = vcpu;
4513-
local_paca->kvm_hstate.ptid = 0;
4514-
local_paca->kvm_hstate.fake_suspend = 0;
4515-
45164539
__kvmppc_create_dtl_entry(vcpu, pcpu, tb + vc->tb_offset, 0);
45174540

45184541
trace_kvm_guest_enter(vcpu);
@@ -4614,6 +4637,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
46144637
run->exit_reason = KVM_EXIT_INTR;
46154638
vcpu->arch.ret = -EINTR;
46164639
out:
4640+
vcpu->cpu = -1;
4641+
vcpu->arch.thread_cpu = -1;
46174642
powerpc_local_irq_pmu_restore(flags);
46184643
preempt_enable();
46194644
goto done;

0 commit comments

Comments
 (0)