
Commit 98e8f2c

Merge tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:

 "This adds support for the AMD hardware feedback interface (HFI), by Perry Yuan"

* tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/itmt: Add debugfs file to show core priorities
  platform/x86/amd: hfi: Add debugfs support
  platform/x86/amd: hfi: Set ITMT priority from ranking data
  cpufreq/amd-pstate: Disable preferred cores on designs with workload classification
  x86/process: Clear hardware feedback history for AMD processors
  platform/x86: hfi: Add power management callback
  platform/x86: hfi: Add online and offline callback support
  platform/x86: hfi: Init per-cpu scores for each class
  platform/x86: hfi: Parse CPU core ranking data from shared memory
  platform/x86: hfi: Introduce AMD Hardware Feedback Interface Driver
  x86/msr-index: Add AMD workload classification MSRs
  MAINTAINERS: Add maintainer entry for AMD Hardware Feedback Driver
  Documentation/x86: Add AMD Hardware Feedback Interface documentation
2 parents e12ac84 + f126821 commit 98e8f2c


12 files changed, 760 insertions(+), 0 deletions(-)


Documentation/arch/x86/amd-hfi.rst

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================================
+Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
+======================================================================
+
+:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.
+
+:Author: Perry Yuan <[email protected]>
+:Author: Mario Limonciello <[email protected]>
+
+Overview
+--------
+
+AMD Heterogeneous Core implementations comprise more than one architectural
+class: their CPUs contain cores with different efficiency and power
+capabilities, namely performance-oriented *classic cores* and power-efficient
+*dense cores*. Power management strategies must therefore be designed to
+accommodate the complexities introduced by mixing different core types.
+Heterogeneous systems can also extend to more than two architectural
+classes. The purpose of the scheduling feedback mechanism is to provide
+information to the operating system scheduler in real time so that the
+scheduler can direct each thread to the optimal core.
+
+The goal of AMD's heterogeneous architecture is to attain power benefit by
+sending background threads to the dense cores while sending high priority
+threads to the classic cores. From a performance perspective, sending
+background threads to dense cores can free up power headroom and allow the
+classic cores to optimally service demanding threads. Furthermore, the
+area-optimized nature of the dense cores allows for a larger number of
+physical cores. This improved core density has a positive impact on
+multithreaded performance.
+
+AMD Heterogeneous Core Driver
+-----------------------------
+
+The ``amd_hfi`` driver provides the operating system with performance and
+energy efficiency capability data for each CPU in the system. The scheduler
+can use the ranking data from the HFI driver to make task placement decisions.
+
+Thread Classification and Ranking Table Interaction
+----------------------------------------------------
+
+The thread classification is used as an index into a ranking table that
+describes an efficiency and performance ranking for each classification.
+
+Threads are classified during runtime into enumerated classes. The classes
+represent thread performance/power characteristics that may benefit from
+special scheduling behaviors. The table below depicts an example of thread
+classification and of where a given thread should preferably be scheduled
+based on its thread class. The real-time thread classification is consumed
+by the operating system and is used to inform the scheduler of where the
+thread should be placed.
+
+Thread Classification Example Table
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
++----------+----------------+-------------------------------+---------------------+---------+
+| class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
++----------+----------------+-------------------------------+---------------------+---------+
+| 0        | Default        | Performant                    | Highest             |         |
++----------+----------------+-------------------------------+---------------------+---------+
+| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
++----------+----------------+-------------------------------+---------------------+---------+
+| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
++----------+----------------+-------------------------------+---------------------+---------+
+
+Thread classification is performed by the hardware each time the thread is switched out.
+Threads that don't meet any hardware-specified criteria are classified as "default".
+
+AMD Hardware Feedback Interface
+--------------------------------
+
+The Hardware Feedback Interface provides the operating system with information
+about the performance and energy efficiency of each CPU in the system. Each
+capability is given as a unitless quantity in the range [0-255]. A higher
+performance value indicates higher performance capability, and a higher
+efficiency value indicates higher energy efficiency. Energy efficiency and
+performance are reported as separate capabilities in the shared memory ranking table.
+
+These capabilities may change at runtime as a result of changes in the
+operating conditions of the system or the action of external factors.
+Power management firmware is responsible for detecting events that require
+a reordering of the performance and efficiency ranking. Table updates happen
+relatively infrequently, on a time scale of seconds or more.
+
+The following events trigger a table update:
+* Thermal Stress Events
+* Silent Compute
+* Extreme Low Battery Scenarios
+
+The kernel or a userspace policy daemon can use these capabilities to modify
+task placement decisions. For instance, if either the performance or energy
+capability of a given logical processor becomes zero, it indicates that the
+hardware recommends that the operating system not schedule any tasks on that
+processor, for performance or energy efficiency reasons respectively.
+
+Implementation details for Linux
+--------------------------------
+
+The implementation of thread scheduling consists of the following steps:
+
+1. A thread is spawned and scheduled to the ideal core using the default
+   heterogeneous scheduling policy.
+2. The processor profiles thread execution and assigns an enumerated
+   classification ID.
+   This classification is communicated to the OS via a logical processor
+   scope MSR.
+3. When the thread is context-switched out, the operating system consumes the
+   workload (WL) classification, which resides in a logical processor scope MSR.
+4. The OS triggers the hardware to clear its history by writing to an MSR,
+   after consuming the WL classification and before switching in the new thread.
+5. If, due to the classification, ranking table, and processor availability,
+   the thread is not on its ideal processor, the OS will then consider
+   scheduling the thread on its ideal processor (if available).
+
+Ranking Table
+-------------
+The ranking table is a shared memory region that is used to communicate the
+performance and energy efficiency capabilities of each CPU in the system.
+
+The ranking table design includes rankings for each APIC ID in the system and
+both performance and efficiency rankings for each workload classification.
+
+.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
+   :doc: amd_shmem_info
+
+Ranking Table update
+--------------------
+The power management firmware issues a platform interrupt after updating the
+ranking table, once it is ready for the operating system to consume. CPUs
+receive the interrupt and read the new ranking table from the shared memory
+region provided through the PCCT; the ``amd_hfi`` driver then parses the new
+table to provide updated data for scheduling decisions.

Documentation/arch/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ x86-specific Documentation
    amd-debugging
    amd-memory-encryption
    amd_hsmp
+   amd-hfi
    tdx
    pti
    mds

MAINTAINERS

Lines changed: 9 additions & 0 deletions
@@ -1115,6 +1115,15 @@ F: arch/x86/include/asm/amd/hsmp.h
 F: arch/x86/include/uapi/asm/amd_hsmp.h
 F: drivers/platform/x86/amd/hsmp/
 
+AMD HETERO CORE HARDWARE FEEDBACK DRIVER
+M: Mario Limonciello <[email protected]>
+R: Perry Yuan <[email protected]>
+
+S: Supported
+B: https://gitlab.freedesktop.org/drm/amd/-/issues
+F: Documentation/arch/x86/amd-hfi.rst
+F: drivers/platform/x86/amd/hfi/
+
 AMD IOMMU (AMD-VI)
 M: Joerg Roedel <[email protected]>
 R: Suravee Suthikulpanit <[email protected]>

arch/x86/include/asm/msr-index.h

Lines changed: 5 additions & 0 deletions
@@ -733,6 +733,11 @@
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
 
+/* AMD Hardware Feedback Support MSRs */
+#define MSR_AMD_WORKLOAD_CLASS_CONFIG 0xc0000500
+#define MSR_AMD_WORKLOAD_CLASS_ID 0xc0000501
+#define MSR_AMD_WORKLOAD_HRST 0xc0000502
+
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT 0xc000010e

arch/x86/kernel/itmt.c

Lines changed: 23 additions & 0 deletions
@@ -59,6 +59,18 @@ static ssize_t sched_itmt_enabled_write(struct file *filp,
 	return result;
 }
 
+static int sched_core_priority_show(struct seq_file *s, void *unused)
+{
+	int cpu;
+
+	seq_puts(s, "CPU #\tPriority\n");
+	for_each_possible_cpu(cpu)
+		seq_printf(s, "%d\t%d\n", cpu, arch_asym_cpu_priority(cpu));
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(sched_core_priority);
+
 static const struct file_operations dfs_sched_itmt_fops = {
 	.read = debugfs_read_file_bool,
 	.write = sched_itmt_enabled_write,
@@ -67,6 +79,7 @@ static const struct file_operations dfs_sched_itmt_fops = {
 };
 
 static struct dentry *dfs_sched_itmt;
+static struct dentry *dfs_sched_core_prio;
 
 /**
  * sched_set_itmt_support() - Indicate platform supports ITMT
@@ -102,6 +115,14 @@ int sched_set_itmt_support(void)
 		return -ENOMEM;
 	}
 
+	dfs_sched_core_prio = debugfs_create_file("sched_core_priority", 0644,
+						  arch_debugfs_dir, NULL,
+						  &sched_core_priority_fops);
+	if (IS_ERR_OR_NULL(dfs_sched_core_prio)) {
+		dfs_sched_core_prio = NULL;
+		return -ENOMEM;
+	}
+
 	sched_itmt_capable = true;
 
 	sysctl_sched_itmt_enabled = 1;
@@ -133,6 +154,8 @@ void sched_clear_itmt_support(void)
 
 	debugfs_remove(dfs_sched_itmt);
 	dfs_sched_itmt = NULL;
+	debugfs_remove(dfs_sched_core_prio);
+	dfs_sched_core_prio = NULL;
 
 	if (sysctl_sched_itmt_enabled) {
 		/* disable sched_itmt if we are no longer ITMT capable */

arch/x86/kernel/process_64.c

Lines changed: 4 additions & 0 deletions
@@ -707,6 +707,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_arch_sched_in(next_p);
 
+	/* Reset hw history on AMD CPUs */
+	if (cpu_feature_enabled(X86_FEATURE_AMD_WORKLOAD_CLASS))
+		wrmsrl(MSR_AMD_WORKLOAD_HRST, 0x1);
+
 	return prev_p;
 }

drivers/cpufreq/amd-pstate.c

Lines changed: 7 additions & 0 deletions
@@ -826,6 +826,13 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
 	if (!amd_pstate_prefcore)
 		return;
 
+	/* should use amd-hfi instead */
+	if (cpu_feature_enabled(X86_FEATURE_AMD_WORKLOAD_CLASS) &&
+	    IS_ENABLED(CONFIG_AMD_HFI)) {
+		amd_pstate_prefcore = false;
+		return;
+	}
+
 	cpudata->hw_prefcore = true;
 
 	/* Priorities must be initialized before ITMT support can be toggled on. */

drivers/platform/x86/amd/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 source "drivers/platform/x86/amd/hsmp/Kconfig"
 source "drivers/platform/x86/amd/pmf/Kconfig"
 source "drivers/platform/x86/amd/pmc/Kconfig"
+source "drivers/platform/x86/amd/hfi/Kconfig"
 
 config AMD_3D_VCACHE
 	tristate "AMD 3D V-Cache Performance Optimizer Driver"

drivers/platform/x86/amd/Makefile

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ obj-$(CONFIG_AMD_HSMP) += hsmp/
 obj-$(CONFIG_AMD_PMF) += pmf/
 obj-$(CONFIG_AMD_WBRF) += wbrf.o
 obj-$(CONFIG_AMD_ISP_PLATFORM) += amd_isp4.o
+obj-$(CONFIG_AMD_HFI) += hfi/

drivers/platform/x86/amd/hfi/Kconfig

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# AMD Hardware Feedback Interface Driver
+#
+
+config AMD_HFI
+	bool "AMD Hetero Core Hardware Feedback Driver"
+	depends on ACPI
+	depends on CPU_SUP_AMD
+	depends on SCHED_MC_PRIO
+	help
+	  Select this option to enable the AMD Heterogeneous Core Hardware
+	  Feedback Interface. If selected, hardware provides runtime thread
+	  classification guidance to the operating system on the performance and
+	  energy efficiency capabilities of each heterogeneous CPU core. These
+	  capabilities may vary due to the inherent differences in the core types
+	  and can also change as a result of variations in the operating
+	  conditions of the system such as power and thermal limits.
