Commit ab81310

yghannam authored and bp3tk0v committed
x86/CPU/AMD: Print the reason for the last reset

The following register contains bits that indicate the cause for the
previous reset.

  PMx000000C0 (FCH::PM::S5_RESET_STATUS)

This is useful for debug. The reasons for reset are broken into 6 high
level categories. Decode it by category and print during boot.

Specifics within a category are split off into debugging documentation.

The register is accessed indirectly through a "PM" port in the FCH. Use
MMIO access in order to avoid restrictions with legacy port access.

Use a late_initcall() to ensure that MMIO has been set up before trying
to access the register.

This register was introduced with AMD Family 17h, so avoid access on
older families. There is no CPUID feature bit for this register.

  [ bp: Simplify the reason dumping loop.
    - merge a fix to not access an array element after the last one:
      https://lore.kernel.org/r/[email protected]
      Reported-by: James Dutton <[email protected]> ]

  [ mingo:
    - Use consistent .rst formatting
    - Fix 'Sleep' class field to 'ACPI-State'
    - Standardize pin messages around the 'tripped' verbiage
    - Remove reference to ring-buffer printing & simplify the wording
    - Use curly braces for multi-line conditional statements ]

Signed-off-by: Yazen Ghannam <[email protected]>
Co-developed-by: Mario Limonciello <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/[email protected]
1 parent cafb222 commit ab81310

3 files changed: +103 -0 lines changed

Documentation/arch/x86/amd-debugging.rst

Lines changed: 48 additions & 0 deletions
@@ -52,6 +52,7 @@ report generated from this script to
 
 Spurious s2idle wakeups from an IRQ
 ===================================
+
 Spurious wakeups will generally have an IRQ set to ``/sys/power/pm_wakeup_irq``.
 This can be matched to ``/proc/interrupts`` to determine what device woke the system.
 
@@ -134,6 +135,7 @@ The ``amd_s2idle.py`` script will capture most of these artifacts for you.
 
 s2idle PM debug messages
 ========================
+
 During the s2idle flow on AMD systems, the ACPI LPS0 driver is responsible
 to check all uPEP constraints. Failing uPEP constraints does not prevent
 s0i3 entry. This means that if some constraints are not met, it is possible
@@ -160,6 +162,7 @@ After doing this, run the suspend cycle and look specifically for errors around:
 
 Historical examples of s2idle issues
 ====================================
+
 To help understand the types of issues that can occur and how to debug them,
 here are some historical examples of s2idle issues that have been resolved.
 
@@ -248,6 +251,7 @@ state entry.
 
 Runtime power consumption issues
 ================================
+
 Runtime power consumption is influenced by many factors, including but not
 limited to the configuration of the PCIe Active State Power Management (ASPM),
 the display brightness, the EPP policy of the CPU, and the power management
@@ -272,6 +276,7 @@ the battery life when more heavily biased towards performance.
 
 BIOS debug messages
 ===================
+
 Most OEM machines don't have a serial UART for outputting kernel or BIOS
 debug messages. However BIOS debug messages are useful for understanding
 both BIOS bugs and bugs with the Linux kernel drivers that call BIOS AML.
@@ -318,3 +323,46 @@ As mentioned above, parsing by hand can be tedious, especially with a lot of
 messages. To help with this, a tool has been created at
 `amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
 to help parse the messages.
+
+Random reboot issues
+====================
+
+When a random reboot occurs, the high-level reason for the reboot is stored
+in a register that will persist onto the next boot.
+
+There are 6 classes of reasons for the reboot:
+ * Software induced
+ * Power state transition
+ * Pin induced
+ * Hardware induced
+ * Remote reset
+ * Internal CPU event
+
+.. csv-table::
+   :header: "Bit", "Type", "Reason"
+   :align: left
+
+   "0", "Pin", "thermal pin BP_THERMTRIP_L was tripped"
+   "1", "Pin", "power button was pressed for 4 seconds"
+   "2", "Pin", "shutdown pin was tripped"
+   "4", "Remote", "remote ASF power off command was received"
+   "9", "Internal", "internal CPU thermal limit was tripped"
+   "16", "Pin", "system reset pin BP_SYS_RST_L was tripped"
+   "17", "Software", "software issued PCI reset"
+   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
+   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
+   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
+   "21", "ACPI-state", "ACPI power state transition occurred"
+   "22", "Pin", "keyboard reset pin KB_RST_L was tripped"
+   "23", "Internal", "internal CPU shutdown event occurred"
+   "24", "Hardware", "system failed to boot before failed boot timer expired"
+   "25", "Hardware", "hardware watchdog timer expired"
+   "26", "Remote", "remote ASF reset command was received"
+   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
+   "29", "Internal", "FCH and MP1 failed warm reset handshake"
+   "30", "Internal", "a parity error occurred"
+   "31", "Internal", "a software sync flood event occurred"
+
+This information is read by the kernel at bootup and printed into
+the syslog. When a random reboot occurs this message can be helpful
+to determine the next component to debug.
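
Reviewer note: the bit table added above can also be used outside the kernel to decode the raw value by hand. The following is a minimal userspace sketch in C and is not part of this commit. `struct reset_reason`, `lookup_reset_reason()` and `decode_reset_status()` are hypothetical names chosen for illustration, and the input is assumed to be the 32-bit hex value that the kernel prints inside the brackets of its "Previous system reset reason" message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One row of the documentation table: bit position, class, reason. */
struct reset_reason {
	unsigned int bit;
	const char *type;
	const char *reason;
};

/* Rows copied from the csv-table in amd-debugging.rst. */
static const struct reset_reason reset_reasons[] = {
	{  0, "Pin",        "thermal pin BP_THERMTRIP_L was tripped" },
	{  1, "Pin",        "power button was pressed for 4 seconds" },
	{  2, "Pin",        "shutdown pin was tripped" },
	{  4, "Remote",     "remote ASF power off command was received" },
	{  9, "Internal",   "internal CPU thermal limit was tripped" },
	{ 16, "Pin",        "system reset pin BP_SYS_RST_L was tripped" },
	{ 17, "Software",   "software issued PCI reset" },
	{ 18, "Software",   "software wrote 0x4 to reset control register 0xCF9" },
	{ 19, "Software",   "software wrote 0x6 to reset control register 0xCF9" },
	{ 20, "Software",   "software wrote 0xE to reset control register 0xCF9" },
	{ 21, "ACPI-state", "ACPI power state transition occurred" },
	{ 22, "Pin",        "keyboard reset pin KB_RST_L was tripped" },
	{ 23, "Internal",   "internal CPU shutdown event occurred" },
	{ 24, "Hardware",   "system failed to boot before failed boot timer expired" },
	{ 25, "Hardware",   "hardware watchdog timer expired" },
	{ 26, "Remote",     "remote ASF reset command was received" },
	{ 27, "Internal",   "an uncorrected error caused a data fabric sync flood event" },
	{ 29, "Internal",   "FCH and MP1 failed warm reset handshake" },
	{ 30, "Internal",   "a parity error occurred" },
	{ 31, "Internal",   "a software sync flood event occurred" },
};

/* Hypothetical helper: return the table row for a bit, or NULL if the bit is not listed. */
static const struct reset_reason *lookup_reset_reason(unsigned int bit)
{
	size_t n = sizeof(reset_reasons) / sizeof(reset_reasons[0]);

	for (size_t i = 0; i < n; i++) {
		if (reset_reasons[i].bit == bit)
			return &reset_reasons[i];
	}
	return NULL;
}

/* Hypothetical helper: print every listed reason set in value, return how many were found. */
static int decode_reset_status(uint32_t value)
{
	int found = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		const struct reset_reason *r;

		if (!(value & (UINT32_C(1) << bit)))
			continue;

		r = lookup_reset_reason(bit);
		if (!r)
			continue;	/* bit not described in the table */

		printf("bit %2u [%s]: %s\n", bit, r->type, r->reason);
		found++;
	}
	return found;
}
```

Calling `decode_reset_status(0x00080000)` from a `main()` would report the bit 19 row (a software write of 0x6 to 0xCF9) and return 1. Bits that the table does not list, such as bit 3, are skipped silently, which matches how the kernel loop treats empty array slots.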

arch/x86/include/asm/amd/fch.h

Lines changed: 1 addition & 0 deletions
@@ -8,5 +8,6 @@
 #define FCH_PM_DECODEEN			0x00
 #define FCH_PM_DECODEEN_SMBUS0SEL	GENMASK(20, 19)
 #define FCH_PM_SCRATCH			0x80
+#define FCH_PM_S5_RESET_STATUS		0xC0
 
 #endif /* _ASM_X86_AMD_FCH_H_ */

arch/x86/kernel/cpu/amd.c

Lines changed: 54 additions & 0 deletions
@@ -9,6 +9,7 @@
 #include <linux/sched/clock.h>
 #include <linux/random.h>
 #include <linux/topology.h>
+#include <asm/amd/fch.h>
 #include <asm/processor.h>
 #include <asm/apic.h>
 #include <asm/cacheinfo.h>
@@ -1237,3 +1238,56 @@ void amd_check_microcode(void)
 	if (cpu_feature_enabled(X86_FEATURE_ZEN2))
 		on_each_cpu(zenbleed_check_cpu, NULL, 1);
 }
+
+static const char * const s5_reset_reason_txt[] = {
+	[0]  = "thermal pin BP_THERMTRIP_L was tripped",
+	[1]  = "power button was pressed for 4 seconds",
+	[2]  = "shutdown pin was tripped",
+	[4]  = "remote ASF power off command was received",
+	[9]  = "internal CPU thermal limit was tripped",
+	[16] = "system reset pin BP_SYS_RST_L was tripped",
+	[17] = "software issued PCI reset",
+	[18] = "software wrote 0x4 to reset control register 0xCF9",
+	[19] = "software wrote 0x6 to reset control register 0xCF9",
+	[20] = "software wrote 0xE to reset control register 0xCF9",
+	[21] = "ACPI power state transition occurred",
+	[22] = "keyboard reset pin KB_RST_L was tripped",
+	[23] = "internal CPU shutdown event occurred",
+	[24] = "system failed to boot before failed boot timer expired",
+	[25] = "hardware watchdog timer expired",
+	[26] = "remote ASF reset command was received",
+	[27] = "an uncorrected error caused a data fabric sync flood event",
+	[29] = "FCH and MP1 failed warm reset handshake",
+	[30] = "a parity error occurred",
+	[31] = "a software sync flood event occurred",
+};
+
+static __init int print_s5_reset_status_mmio(void)
+{
+	unsigned long value;
+	void __iomem *addr;
+	int i;
+
+	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
+		return 0;
+
+	addr = ioremap(FCH_PM_BASE + FCH_PM_S5_RESET_STATUS, sizeof(value));
+	if (!addr)
+		return 0;
+
+	value = ioread32(addr);
+	iounmap(addr);
+
+	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
+		if (!(value & BIT(i)))
+			continue;
+
+		if (s5_reset_reason_txt[i]) {
+			pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
+				value, s5_reset_reason_txt[i]);
+		}
+	}
+
+	return 0;
+}
+late_initcall(print_s5_reset_status_mmio);
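
Reviewer note on the loop bound: the highest designated index in `s5_reset_reason_txt[]` is 31, so the array has 32 elements, and every slot without a designator is a NULL pointer. The standalone C sketch below is not kernel code and is offered only as an illustration. `reason_txt` is an abbreviated copy with three of the entries, `ARRAY_SIZE` is redefined locally, and `first_reset_reason()` is a hypothetical helper. It shows why iterating up to `ARRAY_SIZE()` and testing each slot for NULL stays inside the array.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Abbreviated stand-in for s5_reset_reason_txt[]: only three entries are named. */
static const char * const reason_txt[] = {
	[0]  = "thermal pin BP_THERMTRIP_L was tripped",
	[19] = "software wrote 0x6 to reset control register 0xCF9",
	[31] = "a software sync flood event occurred",
};

/*
 * Hypothetical helper that mirrors the shape of the kernel loop: walk the
 * bits from low to high and return the text for the first set bit that has
 * one, or NULL if no set bit has an entry.
 */
static const char *first_reset_reason(unsigned long value)
{
	for (size_t i = 0; i < ARRAY_SIZE(reason_txt); i++) {
		if (!(value & (1UL << i)))
			continue;

		/* Slots without a designator are NULL and must be skipped. */
		if (reason_txt[i])
			return reason_txt[i];
	}
	return NULL;
}
```

With this layout `ARRAY_SIZE(reason_txt)` evaluates to 32, so the index never exceeds 31, and a value with only bit 3 set yields NULL instead of a bogus string.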
