Commit 9a90ed0

x86/thermal: Fix LVT thermal setup for SMI delivery mode

There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't fire at all, leading to
CPU 0 stuck in SMM forever...

Now, before

  4f432e8 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
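
Reader's note: the two register values quoted in the message (0x200 before the soft-disable, 0x10200 after) differ only in bit 16. The sketch below decodes them with the bit layout the message describes. The macro and function names are illustrative and are not the kernel's own definitions.

```c
#include <stdint.h>

/* Illustrative names (assumptions), not the kernel's macros. */
#define LVT_MASK_BIT	(1u << 16)	/* bit 16: entry is masked when set */
#define LVT_DM_SHIFT	8		/* delivery mode sits in bits [10:8] */
#define LVT_DM_FIELD	0x7u
#define LVT_DM_SMI	0x2u		/* 010b: SMI delivery mode */

/* Returns 1 if the LVT entry is masked (interrupt disabled), else 0. */
static int lvt_is_masked(uint32_t lvt)
{
	return (lvt & LVT_MASK_BIT) != 0;
}

/* Extracts the 3-bit delivery mode field. */
static unsigned int lvt_delivery_mode(uint32_t lvt)
{
	return (lvt >> LVT_DM_SHIFT) & LVT_DM_FIELD;
}

/* Returns 1 if the entry is both unmasked and delivered as an SMI. */
static int lvt_smi_enabled(uint32_t lvt)
{
	return !lvt_is_masked(lvt) && lvt_delivery_mode(lvt) == LVT_DM_SMI;
}
```

With these helpers, 0x200 decodes as an enabled SMI entry, while 0x10200 keeps the SMI delivery mode but is masked.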
1 parent 7d65f9e commit 9a90ed0

3 files changed, 23 insertions(+), 5 deletions(-)

arch/x86/include/asm/thermal.h

Lines changed: 3 additions & 1 deletion
@@ -3,11 +3,13 @@
 #define _ASM_X86_THERMAL_H
 
 #ifdef CONFIG_X86_THERMAL_VECTOR
+void therm_lvt_init(void);
 void intel_init_thermal(struct cpuinfo_x86 *c);
 bool x86_thermal_enabled(void);
 void intel_thermal_interrupt(void);
 #else
-static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
+static inline void therm_lvt_init(void) { }
+static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
 #endif
 
 #endif /* _ASM_X86_THERMAL_H */

arch/x86/kernel/setup.c

Lines changed: 9 additions & 0 deletions
@@ -44,6 +44,7 @@
 #include <asm/pci-direct.h>
 #include <asm/prom.h>
 #include <asm/proto.h>
+#include <asm/thermal.h>
 #include <asm/unwind.h>
 #include <asm/vsyscall.h>
 #include <linux/vmalloc.h>
@@ -1226,6 +1227,14 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.timers.wallclock_init();
 
+	/*
+	 * This needs to run before setup_local_APIC() which soft-disables the
+	 * local APIC temporarily and that masks the thermal LVT interrupt,
+	 * leading to softlockups on machines which have configured SMI
+	 * interrupt delivery.
+	 */
+	therm_lvt_init();
+
 	mcheck_init();
 
 	register_refined_jiffies(CLOCK_TICK_RATE);
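
Reader's note: the ordering constraint in the comment above can be modeled in user space. The sketch below is a toy model, not kernel code: `struct toy_apic` and every function name are hypothetical, and the only hardware behavior assumed is the one quoted from the SDM in the commit message, namely that a soft-disable sets the LVT mask bits.

```c
#include <stdint.h>

#define SPIV_APIC_ENABLED	(1u << 8)	/* software-enable bit in the spurious vector register */
#define LVT_MASK_BIT		(1u << 16)	/* bit 16 of an LVT entry */

/* Toy stand-in for the two registers involved (hypothetical). */
struct toy_apic {
	uint32_t spiv;
	uint32_t lvtthmr;
};

/* Models the soft-disable: clearing the enable bit masks the LVT entry. */
static void toy_soft_disable(struct toy_apic *a)
{
	a->spiv &= ~SPIV_APIC_ENABLED;
	a->lvtthmr |= LVT_MASK_BIT;
}

/* Regressed ordering: the thermal LVT is saved after the soft-disable. */
static uint32_t save_after_soft_disable(uint32_t reset_lvtthmr)
{
	struct toy_apic a = { SPIV_APIC_ENABLED | 0x0f, reset_lvtthmr };

	toy_soft_disable(&a);
	return a.lvtthmr;
}

/* Fixed ordering: the thermal LVT is saved first, as therm_lvt_init() does. */
static uint32_t save_before_soft_disable(uint32_t reset_lvtthmr)
{
	struct toy_apic a = { SPIV_APIC_ENABLED | 0x0f, reset_lvtthmr };
	uint32_t saved = a.lvtthmr;

	toy_soft_disable(&a);
	return saved;
}
```

Feeding the reset value 0x200 from the boot log into both functions gives 0x10200 for the late read and 0x200 for the early read, matching the two lvtthmr values in the commit message.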

drivers/thermal/intel/therm_throt.c

Lines changed: 11 additions & 4 deletions
@@ -621,6 +621,17 @@ bool x86_thermal_enabled(void)
 	return atomic_read(&therm_throt_en);
 }
 
+void __init therm_lvt_init(void)
+{
+	/*
+	 * This function is only called on boot CPU. Save the init thermal
+	 * LVT value on BSP and use that value to restore APs' thermal LVT
+	 * entry BIOS programmed later
+	 */
+	if (intel_thermal_supported(&boot_cpu_data))
+		lvtthmr_init = apic_read(APIC_LVTTHMR);
+}
+
 void intel_init_thermal(struct cpuinfo_x86 *c)
 {
 	unsigned int cpu = smp_processor_id();
@@ -630,10 +641,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	if (!intel_thermal_supported(c))
 		return;
 
-	/* On the BSP? */
-	if (c == &boot_cpu_data)
-		lvtthmr_init = apic_read(APIC_LVTTHMR);
-
 	/*
 	 * First check if its enabled already, in which case there might
 	 * be some SMM goo which handles it, so we can't even put a handler
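
Reader's note: the commit message says the value saved on the BSP is later replicated onto the APs. The sketch below illustrates why a saved value with the mask bit set is harmful. It rests on two assumptions about code this diff does not show: that an AP's thermal LVT reads 0x10000 when it comes up, and that the saved value is written to the AP only when its delivery mode is not fixed (000b). The helper name is hypothetical.

```c
#include <stdint.h>

#define LVT_MASK_BIT	(1u << 16)	/* bit 16: entry is masked when set */
#define LVT_DM_FIELD	(0x7u << 8)	/* delivery mode, bits [10:8] */
#define LVT_DM_FIXED	0x0u		/* 000b: fixed delivery mode */

/*
 * Hypothetical helper: the thermal LVT value an AP ends up with once
 * the BSP's saved value has been considered for replication.
 */
static uint32_t ap_lvtthmr_after_restore(uint32_t ap_current, uint32_t bsp_saved)
{
	/* Assumption: only a non-fixed (for example SMI) entry is copied over. */
	if ((bsp_saved & LVT_DM_FIELD) != LVT_DM_FIXED)
		return bsp_saved;

	return ap_current;
}
```

Under these assumptions, a correctly saved 0x200 leaves the AP with an unmasked SMI entry, whereas a late-read 0x10200 copies the mask bit along with the delivery mode.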
