Commit b7082cd

ftang1 authored and paulmckrcu committed
clocksource: Suspend the watchdog temporarily when high read latency detected
Bugs have been reported on 8-socket x86 machines in which the TSC was wrongly disabled when the system was under heavy workload.

    [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
    [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
    [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
    [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
    [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
    [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
    [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
    [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
    [ 821.067990] clocksource: Switched to clocksource hpet

This can be reproduced by running memory-intensive 'stream' tests, or some of the stress-ng subcases such as 'ioport'.

The reason for these issues is that when the system is under heavy load, the read latency of the clocksources can be very high. Even lightweight TSC reads can show high latencies, and latencies are much worse for external clocksources such as HPET or the ACPI PM timer. These latencies can result in false-positive clocksource-unstable determinations.

These issues were initially reported by a customer running on a production system, and the problem was reproduced on several generations of Xeon servers, especially when running the stress-ng test. These Xeon servers were not production systems, but they did have the latest steppings and firmware.

Given that the clocksource watchdog is a continual diagnostic check that runs twice a second, there is no need to rush it when the system is under heavy load. Therefore, when high clocksource read latencies are detected, suspend the watchdog timer for 5 minutes.

Signed-off-by: Feng Tang <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Feng Tang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
1 parent dd02926 commit b7082cd

1 file changed, +32 -13 lines changed

kernel/time/clocksource.c

Lines changed: 32 additions & 13 deletions
@@ -387,13 +387,23 @@ void clocksource_verify_percpu(struct clocksource *cs)
 }
 EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
 
+static inline void clocksource_reset_watchdog(void)
+{
+	struct clocksource *cs;
+
+	list_for_each_entry(cs, &watchdog_list, wd_list)
+		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
+}
+
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	u64 csnow, wdnow, cslast, wdlast, delta;
 	int next_cpu, reset_pending;
 	int64_t wd_nsec, cs_nsec;
 	struct clocksource *cs;
 	enum wd_read_status read_ret;
+	unsigned long extra_wait = 0;
 	u32 md;
 
 	spin_lock(&watchdog_lock);
@@ -413,13 +423,30 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
 
-		if (read_ret != WD_READ_SUCCESS) {
-			if (read_ret == WD_READ_UNSTABLE)
-				/* Clock readout unreliable, so give it up. */
-				__clocksource_unstable(cs);
+		if (read_ret == WD_READ_UNSTABLE) {
+			/* Clock readout unreliable, so give it up. */
+			__clocksource_unstable(cs);
 			continue;
 		}
 
+		/*
+		 * When WD_READ_SKIP is returned, it means the system is likely
+		 * under very heavy load, where the latency of reading
+		 * watchdog/clocksource is very big, and affect the accuracy of
+		 * watchdog check. So give system some space and suspend the
+		 * watchdog check for 5 minutes.
+		 */
+		if (read_ret == WD_READ_SKIP) {
+			/*
+			 * As the watchdog timer will be suspended, and
+			 * cs->last could keep unchanged for 5 minutes, reset
+			 * the counters.
+			 */
+			clocksource_reset_watchdog();
+			extra_wait = HZ * 300;
+			break;
+		}
+
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
 		    atomic_read(&watchdog_reset_pending)) {
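
Reviewer note: the control-flow change in this hunk can be sketched in user-space C. This is only an illustrative model, not kernel code. The `wd_read_status` values come from the diff; `struct pass_result` and `model_pass()` are hypothetical names, and an array of read statuses stands in for walking `watchdog_list`.

```c
#include <stddef.h>

/* Status names as used in the diff; the numeric values here are arbitrary. */
enum wd_read_status { WD_READ_SUCCESS, WD_READ_UNSTABLE, WD_READ_SKIP };

/* Hypothetical summary of one watchdog pass over all clocksources. */
struct pass_result {
	size_t checked;   /* sources that went on to the skew comparison */
	size_t unstable;  /* sources marked unstable in this pass */
	int suspended;    /* nonzero if the pass ended early on WD_READ_SKIP */
};

static struct pass_result model_pass(const enum wd_read_status *st, size_t n)
{
	struct pass_result r = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (st[i] == WD_READ_UNSTABLE) {
			/* Same as before the patch: give up on this source only. */
			r.unstable++;
			continue;
		}
		if (st[i] == WD_READ_SKIP) {
			/* New: one high-latency read ends the whole pass. */
			r.suspended = 1;
			break;
		}
		r.checked++;
	}
	return r;
}
```

In this model, a single `WD_READ_SKIP` stops the pass for every remaining clocksource, which mirrors the `break` in the hunk. Before the patch, the same status only skipped the one source via `continue`.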
@@ -523,7 +550,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	 * pair clocksource_stop_watchdog() clocksource_start_watchdog().
 	 */
 	if (!timer_pending(&watchdog_timer)) {
-		watchdog_timer.expires += WATCHDOG_INTERVAL;
+		watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait;
 		add_timer_on(&watchdog_timer, next_cpu);
 	}
 out:
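
Reviewer note: a small worked example of the timer arithmetic, under stated assumptions. `WATCHDOG_INTERVAL` is assumed to be `HZ >> 1`, which matches the "twice a second" wording in the commit message but is not shown in this diff. The helper names and the explicit `hz` parameter are hypothetical and exist only so that several tick rates can be compared.

```c
#include <stdint.h>

/* Ticks until the next watchdog run; hz stands in for the kernel's HZ. */
static uint64_t model_next_delay_ticks(uint64_t hz, int skipped)
{
	uint64_t interval = hz >> 1;                /* assumed WATCHDOG_INTERVAL */
	uint64_t extra = skipped ? hz * 300 : 0;    /* extra_wait = HZ * 300 */

	return interval + extra;
}

/* Convert a tick count to milliseconds for a given tick rate. */
static uint64_t model_ticks_to_ms(uint64_t ticks, uint64_t hz)
{
	return ticks * 1000 / hz;
}
```

Under these assumptions the next run after a skipped read is 300.5 seconds away regardless of the tick rate, because both terms scale with `HZ`. A normal pass is still rescheduled 0.5 seconds later.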
@@ -548,14 +575,6 @@ static inline void clocksource_stop_watchdog(void)
 	watchdog_running = 0;
 }
 
-static inline void clocksource_reset_watchdog(void)
-{
-	struct clocksource *cs;
-
-	list_for_each_entry(cs, &watchdog_list, wd_list)
-		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
-}
-
 static void clocksource_resume_watchdog(void)
 {
 	atomic_inc(&watchdog_reset_pending);
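
Reviewer note: `clocksource_reset_watchdog()` is moved earlier in the file so that `clocksource_watchdog()` can call it. The sketch below models why clearing `CLOCK_SOURCE_WATCHDOG` matters before a 5-minute pause. It assumes that the "Clocksource initialized ?" branch, whose body is not shown in this diff, records a fresh baseline and skips the comparison when the flag is clear. All `model_*` names and `MODEL_FLAG_WATCHDOG` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_WATCHDOG 0x1u  /* stands in for CLOCK_SOURCE_WATCHDOG */

struct model_cs {
	unsigned int flags;
	uint64_t cs_last;  /* last clocksource reading */
	uint64_t wd_last;  /* last watchdog reading */
};

/* Like clocksource_reset_watchdog(), but over a plain array. */
static void model_reset(struct model_cs *list, int n)
{
	for (int i = 0; i < n; i++)
		list[i].flags &= ~MODEL_FLAG_WATCHDOG;
}

/*
 * Returns true when deltas were computed, false when the call only
 * recorded a new baseline (assumed behavior of the uninitialized branch).
 */
static bool model_check(struct model_cs *cs, uint64_t csnow, uint64_t wdnow,
			uint64_t *cs_delta, uint64_t *wd_delta)
{
	if (!(cs->flags & MODEL_FLAG_WATCHDOG)) {
		cs->flags |= MODEL_FLAG_WATCHDOG;
		cs->cs_last = csnow;
		cs->wd_last = wdnow;
		return false;
	}
	*cs_delta = csnow - cs->cs_last;
	*wd_delta = wdnow - cs->wd_last;
	cs->cs_last = csnow;
	cs->wd_last = wdnow;
	return true;
}
```

In this model, the first check after a reset never compares against readings taken before the pause, so the long gap is not interpreted as an interval to be checked for skew.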
