Skip to content

Commit fb802ac

Browse files
committed
ppc/spapr: Fix RTAS stopped state
This change takes the CPUPPCState 'quiesced' field added for powernv hardware CPU core controls (used to stop and start cores), and extends it to spapr to model the "RTAS stopped" state. This prevents the schedulers attempting to run stopped CPUs unexpectedly, which can cause hangs and possibly other unexpected behaviour. The detail of the problematic situation is this: A KVM spapr guest boots with all secondary CPUs defined to be in the "RTAS stopped" state. In this state, the CPU is only responsive to the start-cpu RTAS call. This behaviour is modeled in QEMU with the start_powered_off feature, which sets ->halted on secondary CPUs at boot. ->halted=true looks like an idle / sleep / power-save state which typically is responsive to asynchronous interrupts, but spapr clears wake-on-interrupt bits in the LPCR SPR. This more-or-less works. Commit e8291ec ("target/ppc: fix timebase register reset state") recently caused the decrementer to expire sooner at boot, causing a decrementer exception on secondary CPUs in RTAS stopped state. This was not a problem on TCG, but KVM limits how a guest can modify LPCR, in particular it prevents the clearing of wake-on-interrupt bits, and so in the course of CPU register synchronisation, the LPCR as set by spapr to model the RTAS stopped state is overwritten with KVM's LPCR value, and that then causes QEMU's interrupt code to notice the expired decrementer exception, turn that into an interrupt, and set CPU_INTERRUPT_HARD. That causes the CPU to be kicked, and the KVM vCPU thread to loop calling kvm_cpu_exec(). kvm_cpu_exec() calls kvm_arch_process_async_events(), which on ppc just returns ->halted. This is still true, so it returns immediately with EXCP_HLT, and the vCPU never goes to sleep because qemu_wait_io_event() sees CPU_INTERRUPT_HARD is set. All this while the vCPU holds the bql. This causes the boot CPU to eventually lock up when it needs the bql. So make 'quiesced' represent the "RTAS stopped" state, and have it explicitly not respond to exceptions (interrupt conditions) rather than rely on machine register state to model that state. This matches the powernv quiesced state very well because it essentially turns off the CPU core via a side-band control unit. There are still issues with QEMU and KVM idea of LPCR diverging and that is quite ugly and fragile that should be fixed. spapr should synchronize its LPCR properly with KVM, and not try to use values that KVM does not support. Reported-by: Misbah Anjum N <[email protected]> Tested-by: Misbah Anjum N <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]>
1 parent 1dae461 commit fb802ac

File tree

5 files changed

+30
-2
lines changed

5 files changed

+30
-2
lines changed

hw/ppc/pnv_core.c

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -248,21 +248,25 @@ static void pnv_core_power10_xscom_write(void *opaque, hwaddr addr,
248248

249249
if (val & PPC_BIT(7 + 8 * i)) { /* stop */
250250
val &= ~PPC_BIT(7 + 8 * i);
251-
cpu_pause(cs);
252251
env->quiesced = true;
252+
ppc_maybe_interrupt(env);
253+
cpu_pause(cs);
253254
}
254255
if (val & PPC_BIT(6 + 8 * i)) { /* start */
255256
val &= ~PPC_BIT(6 + 8 * i);
256257
env->quiesced = false;
258+
ppc_maybe_interrupt(env);
257259
cpu_resume(cs);
258260
}
259261
if (val & PPC_BIT(4 + 8 * i)) { /* sreset */
260262
val &= ~PPC_BIT(4 + 8 * i);
261263
env->quiesced = false;
264+
ppc_maybe_interrupt(env);
262265
pnv_cpu_do_nmi_resume(cs);
263266
}
264267
if (val & PPC_BIT(3 + 8 * i)) { /* clear maint */
265268
env->quiesced = false;
269+
ppc_maybe_interrupt(env);
266270
/*
267271
* Hardware has very particular cases for where clear maint
268272
* must be used and where start must be used to resume a

hw/ppc/spapr_cpu_core.c

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,9 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
3737

3838
cpu_reset(cs);
3939

40+
env->quiesced = true; /* set "RTAS stopped" state. */
41+
ppc_maybe_interrupt(env);
42+
4043
/*
4144
* "PowerPC Processor binding to IEEE 1275" defines the initial MSR state
4245
* as 32bit (MSR_SF=0) with MSR_ME=1 and MSR_FP=1 in "8.2.1. Initial
@@ -98,6 +101,9 @@ void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
98101
CPU(cpu)->halted = 0;
99102
/* Enable Power-saving mode Exit Cause exceptions */
100103
ppc_store_lpcr(cpu, env->spr[SPR_LPCR] | pcc->lpcr_pm);
104+
105+
env->quiesced = false; /* clear "RTAS stopped" state. */
106+
ppc_maybe_interrupt(env);
101107
}
102108

103109
/*

hw/ppc/spapr_rtas.c

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -110,7 +110,8 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_,
110110
id = rtas_ld(args, 0);
111111
cpu = spapr_find_cpu(id);
112112
if (cpu != NULL) {
113-
if (CPU(cpu)->halted) {
113+
CPUPPCState *env = &cpu->env;
114+
if (env->quiesced) {
114115
rtas_st(rets, 1, 0);
115116
} else {
116117
rtas_st(rets, 1, 2);
@@ -215,6 +216,8 @@ static void rtas_stop_self(PowerPCCPU *cpu, SpaprMachineState *spapr,
215216
* For the same reason, set PSSCR_EC.
216217
*/
217218
env->spr[SPR_PSSCR] |= PSSCR_EC;
219+
env->quiesced = true; /* set "RTAS stopped" state. */
220+
ppc_maybe_interrupt(env);
218221
cs->halted = 1;
219222
ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
220223
kvmppc_set_reg_ppc_online(cpu, 0);

target/ppc/cpu.h

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1356,6 +1356,17 @@ struct CPUArchState {
13561356
* special way (such as routing some resume causes to 0x100, i.e. sreset).
13571357
*/
13581358
bool resume_as_sreset;
1359+
1360+
/*
1361+
* On powernv, quiesced means the CPU has been stopped using PC direct
1362+
* control xscom registers.
1363+
*
1364+
* On spapr, quiesced means it is in the "RTAS stopped" state.
1365+
*
1366+
* The core halted/stopped variables aren't sufficient for this, because
1367+
* they can be changed with various side-band operations like qmp cont,
1368+
* powersave interrupts, etc.
1369+
*/
13591370
bool quiesced;
13601371
#endif
13611372

target/ppc/excp_helper.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1951,6 +1951,10 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
19511951
target_ulong lpcr = env->spr[SPR_LPCR];
19521952
bool async_deliver;
19531953

1954+
if (unlikely(env->quiesced)) {
1955+
return 0;
1956+
}
1957+
19541958
#ifdef TARGET_PPC64
19551959
switch (env->excp_model) {
19561960
case POWERPC_EXCP_POWER7:

0 commit comments

Comments
 (0)