
Commit 9c388a5

watchdog/hardlockup/perf: Revert a33d448 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Guenter reported a crash in the watchdog/perf code, which is caused by cleanup() and enable() running concurrently. The reason for this is:

The watchdog functions are serialized via the watchdog_mutex and CPU hotplug locking, but the enable of the perf based watchdog happens in the context of the unpark callback of the smpboot thread. That unpark function is not synchronous inside the locking: unparking the thread just wakes it up and returns, so there is no guarantee when the thread actually executes.

If it starts running _before_ the cleanup happened, then it will create an event and overwrite the dead event pointer. The new event is then cleaned up because the event is marked dead.

	lock(watchdog_mutex);
	lockup_detector_reconfigure();
		cpus_read_lock();
		stop();
		   park()
		update();
		start();
		   unpark()
		cpus_read_unlock();		thread runs()
						  overwrite dead event ptr
	cleanup();
						  free new event, which is active
						  inside perf....
	unlock(watchdog_mutex);

The park side is safe, as that actually waits for the thread to reach the parked state.

Commit a33d448 removed the protection against this kind of scenario under the stupid assumption that the hotplug serialization and the watchdog_mutex cover everything.

Bring it back.

Reverts: a33d448 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Reported-and-tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Thomas Feels-stupid Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Don Zickus <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
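In code terms, this is a minimal sketch of the two affected functions as they look after the revert, condensed from the kernel/watchdog_hld.c hunks further down. The initialization of event via this_cpu_read(watchdog_ev) in the disable path is not visible in the hunks and is assumed from the surrounding file; watchdog_cpus and the per-CPU/perf helpers come from the rest of the file and kernel headers.

	static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
	static DEFINE_PER_CPU(struct perf_event *, dead_event);
	static struct cpumask dead_events_mask;

	void hardlockup_detector_perf_disable(void)
	{
		struct perf_event *event = this_cpu_read(watchdog_ev);

		if (event) {
			perf_event_disable(event);
			/*
			 * Hand off for deferred destroy: a later re-enable
			 * installs a fresh event in watchdog_ev, while
			 * cleanup() only ever looks at dead_event.
			 */
			this_cpu_write(watchdog_ev, NULL);
			this_cpu_write(dead_event, event);
			cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
			watchdog_cpus--;
		}
	}

	void hardlockup_detector_perf_cleanup(void)
	{
		int cpu;

		for_each_cpu(cpu, &dead_events_mask) {
			struct perf_event *event = per_cpu(dead_event, cpu);

			/* for_each_cpu() reports CPU0 as set on UP kernels */
			if (event)
				perf_event_release_kernel(event);
			per_cpu(dead_event, cpu) = NULL;
		}
		cpumask_clear(&dead_events_mask);
	}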
1 parent 153fbd1 commit 9c388a5

File tree

1 file changed: +5 −2 lines


kernel/watchdog_hld.c

Lines changed: 5 additions & 2 deletions
@@ -21,6 +21,7 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -218,15 +221,15 @@ void hardlockup_detector_perf_cleanup(void)
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports unconditionally
 		 * CPU0 as set on UP kernels. Sigh.
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }
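The enable side is not part of this diff. Purely to illustrate why the hand-off closes the race, here is a hypothetical, heavily simplified outline of that path; the helper name and the body are assumptions, not taken from this commit (the real enable path also handles event-creation failure and one-time bookkeeping). The point is that even when the smpboot thread runs its unpark callback after hardlockup_detector_perf_disable() but before hardlockup_detector_perf_cleanup(), the new event lands in watchdog_ev, which cleanup() no longer frees.

	/*
	 * Hypothetical outline only: the essential shape of the enable path
	 * invoked from the smpboot unpark callback.
	 */
	void hardlockup_detector_perf_enable(void)
	{
		struct perf_event *event;

		/* Create the per-CPU hardlockup event; hypothetical helper. */
		event = create_hardlockup_event();
		if (!event)
			return;

		/*
		 * Safe even if cleanup() has not run yet on this CPU:
		 * cleanup() releases dead_event, never watchdog_ev.
		 */
		this_cpu_write(watchdog_ev, event);
		watchdog_cpus++;
	}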
