
Commit 1139034

Perry Yuan authored and bp3tk0v committed
Documentation/x86: Add AMD Hardware Feedback Interface documentation
Introduce a new documentation file, `amd_hfi.rst`, which delves into the
implementation details of the AMD Hardware Feedback Interface and its
associated driver, `amd_hfi`. This documentation describes how the driver
provides hints to the OS scheduler based on core performance and efficiency
ranking data.

This documentation describes:

* The design of the driver
* How the driver provides hints to the OS scheduler
* How the driver interfaces with the kernel for efficiency ranking data

Signed-off-by: Perry Yuan <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Ilpo Järvinen <[email protected]>
Link: https://lore.kernel.org/[email protected]
1 parent d7b8f8e commit 1139034


2 files changed, +134 -0 lines changed


Documentation/arch/x86/amd-hfi.rst

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
.. SPDX-License-Identifier: GPL-2.0

======================================================================
Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
======================================================================

:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.

:Author: Perry Yuan <[email protected]>
:Author: Mario Limonciello <[email protected]>

Overview
--------

AMD heterogeneous core implementations consist of more than one architectural
class: the CPUs contain cores with different efficiency and power
capabilities, namely performance-oriented *classic cores* and power-efficient
*dense cores*. Power management strategies must therefore be designed to
accommodate the complexities introduced by combining different core types.
Heterogeneous systems can also have more than two architectural classes. The
purpose of the scheduling feedback mechanism is to provide real-time
information to the operating system scheduler so that the scheduler can
direct threads to the optimal core.

The goal of AMD's heterogeneous architecture is to save power by sending
background threads to the dense cores and high-priority threads to the
classic cores. From a performance perspective, sending background threads to
dense cores frees up power headroom and allows the classic cores to service
demanding threads optimally. Furthermore, the area-optimized design of the
dense cores allows for more physical cores, and this higher core density
benefits multithreaded performance.

AMD Heterogeneous Core Driver
-----------------------------

The ``amd_hfi`` driver provides the operating system with performance and
energy efficiency capability data for each CPU in the system. The scheduler
can use this ranking data to make task placement decisions.

Thread Classification and Ranking Table Interaction
----------------------------------------------------

A thread's classification selects the applicable entries in a ranking table
that describes an efficiency and performance ranking for each classification.

Threads are classified at runtime into enumerated classes. The classes
represent thread performance and power characteristics that may benefit from
special scheduling behavior. The table below shows an example of thread
classification and the preferred placement for a thread of each class. The
operating system consumes the real-time thread classification and uses it to
inform the scheduler where the thread should be placed.

Thread Classification Example Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+----------+----------------+-------------------------------+---------------------+---------+
| Class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
+==========+================+===============================+=====================+=========+
| 0        | Default        | Performant                    | Highest             |         |
+----------+----------------+-------------------------------+---------------------+---------+
| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
+----------+----------------+-------------------------------+---------------------+---------+
| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
+----------+----------------+-------------------------------+---------------------+---------+

The hardware classifies a thread each time it is switched out. Threads that
do not meet any hardware-specified criterion are classified as "default".

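As a minimal sketch of how the example table could be encoded, the C fragment
below maps each class ID from the table to a placement preference. The names
``hfi_class``, ``sched_pref`` and ``hfi_class_pref()`` are hypothetical and do
not come from the ``amd_hfi`` driver; treating unknown IDs as "default" is an
assumption of the sketch.

.. code-block:: c

   /* Hypothetical names; IDs and behaviors mirror the example table only. */
   enum hfi_class {
       HFI_CLASS_DEFAULT = 0,      /* Default: prefer performant cores */
       HFI_CLASS_NON_SCALABLE = 1, /* Non-scalable: prefer efficient cores */
       HFI_CLASS_IO_BOUND = 2,     /* I/O bound: prefer efficient cores */
   };

   enum sched_pref { PREFER_PERFORMANT, PREFER_EFFICIENT };

   /*
    * Map a class ID to a placement preference. Any ID not listed in the
    * example table is treated like "default" (an assumption of this sketch).
    */
   static enum sched_pref hfi_class_pref(unsigned int class_id)
   {
       switch (class_id) {
       case HFI_CLASS_NON_SCALABLE:
       case HFI_CLASS_IO_BOUND:
           return PREFER_EFFICIENT;
       case HFI_CLASS_DEFAULT:
       default:
           return PREFER_PERFORMANT;
       }
   }
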
AMD Hardware Feedback Interface
--------------------------------

The Hardware Feedback Interface provides the operating system with information
about the performance and energy efficiency of each CPU in the system. Each
capability is given as a unitless value in the range [0, 255]. A higher
performance value indicates higher performance capability, and a higher
efficiency value indicates higher energy efficiency. Energy efficiency and
performance are reported as separate capabilities in the shared-memory
ranking table.

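The following C sketch illustrates how a consumer might use these
capabilities, assuming a hypothetical per-CPU record that holds the two
0 to 255 values. ``struct cpu_caps`` and ``pick_cpu()`` are illustrative names;
they are not part of the driver or of the shared-memory layout.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical per-CPU record; both values are unitless, 0..255. */
   struct cpu_caps {
       uint8_t perf; /* higher means more performance capability */
       uint8_t eff;  /* higher means more energy efficient */
   };

   /*
    * Return the index of the CPU with the highest performance capability
    * (want_perf != 0) or the highest efficiency capability (want_perf == 0).
    * Ties go to the lowest index; an empty array yields -1.
    */
   static int pick_cpu(const struct cpu_caps *caps, size_t nr_cpus, int want_perf)
   {
       int best = -1;
       unsigned int best_val = 0;

       for (size_t i = 0; i < nr_cpus; i++) {
           unsigned int val = want_perf ? caps[i].perf : caps[i].eff;

           if (best < 0 || val > best_val) {
               best = (int)i;
               best_val = val;
           }
       }
       return best;
   }
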
These capabilities may change at runtime as a result of changes in the
operating conditions of the system or the action of external factors. The
power management firmware is responsible for detecting events that require
the performance and efficiency rankings to be reordered. Table updates happen
relatively infrequently, on a time scale of seconds or more.

The following events trigger a table update:

* Thermal Stress Events
* Silent Compute
* Extreme Low Battery Scenarios

The kernel or a userspace policy daemon can use these capabilities to modify
task placement decisions. For instance, if the performance or energy
efficiency capability of a given logical processor becomes zero, the hardware
is recommending that the operating system not schedule any tasks on that
processor, for performance or energy efficiency reasons, respectively.

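A policy that honors zero-valued capabilities could build a mask of candidate
CPUs, as in this hypothetical C sketch. It assumes at most 64 CPUs and an
illustrative ``struct cpu_caps`` record; ``usable_cpu_mask()`` is not a kernel
API.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical per-CPU capability record (unitless, 0..255). */
   struct cpu_caps {
       uint8_t perf;
       uint8_t eff;
   };

   /*
    * Build a bitmask of candidate CPUs for one goal: bit i is set unless
    * CPU i reports a capability of zero for that goal (for_perf selects
    * performance, otherwise efficiency). Only the first 64 CPUs are covered.
    */
   static uint64_t usable_cpu_mask(const struct cpu_caps *caps,
                                   unsigned int nr_cpus, int for_perf)
   {
       uint64_t mask = 0;

       for (unsigned int i = 0; i < nr_cpus && i < 64; i++) {
           uint8_t cap = for_perf ? caps[i].perf : caps[i].eff;

           if (cap != 0)
               mask |= UINT64_C(1) << i;
       }
       return mask;
   }

In this sketch the performance and efficiency masks are computed separately,
mirroring "respectively" above: a CPU whose efficiency capability is zero can
still be a candidate when the goal is performance.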
Implementation details for Linux
--------------------------------

Thread scheduling is implemented in the following steps:

1. A thread is spawned and scheduled to the ideal core using the default
   heterogeneous scheduling policy.
2. The processor profiles the thread's execution and assigns it an enumerated
   classification ID. This classification is communicated to the OS through a
   logical-processor-scope MSR.
3. When the thread is switched out, the operating system consumes the
   workload (WL) classification from a logical-processor-scope MSR.
4. After consuming the WL classification and before switching in the new
   thread, the OS triggers the hardware to clear its history by writing to an
   MSR.
5. If, given the classification, the ranking table, and processor
   availability, the thread is not on its ideal processor, the OS then
   considers scheduling the thread on its ideal processor (if available).

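Steps 3 to 5 can be modeled in a small, self-contained C simulation. Plain
struct fields stand in for the classification and history-reset MSRs, and
``ideal_cpu_for_class`` is a hypothetical table assumed to be derived from the
ranking table. No real MSR numbers or kernel interfaces are implied.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical simulated per-CPU state; no real MSRs are accessed. */
   struct sim_cpu {
       unsigned int wl_class_msr;   /* stands in for the WL class MSR (step 2) */
       unsigned int history_resets; /* counts simulated history-reset writes */
   };

   struct sim_task {
       unsigned int wl_class; /* classification last consumed by the OS */
       int cpu;               /* CPU the task is currently on */
   };

   /* Steps 3 and 4: consume the classification first, then clear history. */
   static void on_switch_out(struct sim_cpu *cpu, struct sim_task *prev)
   {
       prev->wl_class = cpu->wl_class_msr; /* step 3: read WL class */
       cpu->history_resets++;              /* step 4: reset hardware history */
   }

   /*
    * Step 5: suggest the ideal CPU for the task's class if the task is not
    * already there and that CPU is available; otherwise keep the current CPU.
    * Out-of-range classes are treated as class 0 (an assumption).
    */
   static int pick_target_cpu(const struct sim_task *t,
                              const int *ideal_cpu_for_class,
                              unsigned int nr_classes,
                              const bool *cpu_available)
   {
       unsigned int cls = t->wl_class < nr_classes ? t->wl_class : 0;
       int ideal = ideal_cpu_for_class[cls];

       if (ideal != t->cpu && cpu_available[ideal])
           return ideal;
       return t->cpu;
   }
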
Ranking Table
-------------

The ranking table is a shared memory region used to communicate the
performance and energy efficiency capabilities of each CPU in the system.

The ranking table contains rankings for each APIC ID in the system, with
separate performance and efficiency rankings for each workload
classification.

.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
   :doc: amd_shmem_info

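To make the per-class indexing concrete, here is a hypothetical in-memory view
with one performance and one efficiency ranking per (APIC ID, class) pair,
plus a lookup of the highest-ranked APIC ID for a class. This layout is an
assumption for illustration only, not the ``amd_shmem_info`` layout, and the
sketch also assumes densely numbered APIC IDs starting at 0.

.. code-block:: c

   #include <stdint.h>

   #define NR_APIC_IDS 4 /* hypothetical system size */
   #define NR_CLASSES  3 /* matches the example class table */

   /* Hypothetical entry: rankings for one APIC ID and one class. */
   struct rank_entry {
       uint8_t perf;
       uint8_t eff;
   };

   /* Hypothetical table view; NOT the real shared-memory layout. */
   struct ranking_table {
       struct rank_entry rank[NR_APIC_IDS][NR_CLASSES];
   };

   /*
    * Return the APIC ID with the highest performance (want_perf != 0) or
    * efficiency (want_perf == 0) ranking for class cls, or -1 if cls is
    * out of range. Ties go to the lowest APIC ID.
    */
   static int best_apic_for_class(const struct ranking_table *t,
                                  unsigned int cls, int want_perf)
   {
       int best = -1;
       unsigned int best_val = 0;

       if (cls >= NR_CLASSES)
           return -1;

       for (int apic = 0; apic < NR_APIC_IDS; apic++) {
           const struct rank_entry *e = &t->rank[apic][cls];
           unsigned int val = want_perf ? e->perf : e->eff;

           if (best < 0 || val > best_val) {
               best = apic;
               best_val = val;
           }
       }
       return best;
   }
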
Ranking Table update
--------------------

After updating the ranking table, the power management firmware issues a
platform interrupt to signal that the table is ready for the operating system
to consume. On receiving this interrupt, the CPUs read the new ranking table
from the shared memory region provided by the PCCT (Platform Communications
Channel Table), and the ``amd_hfi`` driver parses it to provide updated data
for scheduling decisions.
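
The update flow can be sketched as a single-threaded C model in which the
interrupt path only marks an update as pending and a deferred step copies the
table out of shared memory. The structure, the 8-byte table size, and the
function names are hypothetical. Deferring the copy out of interrupt context
is a design choice of this sketch, not a description of ``amd_hfi``, and real
code would also need synchronization with the firmware.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   #define TABLE_BYTES 8 /* hypothetical table size for the sketch */

   /* Hypothetical model; no real PCCT or interrupt APIs are used. */
   struct hfi_sim {
       uint8_t shmem[TABLE_BYTES]; /* stands in for the PCCT shared memory */
       uint8_t local[TABLE_BYTES]; /* private copy used for decisions */
       bool update_pending;        /* set by the simulated interrupt */
       unsigned int generation;    /* bumped on every consumed update */
   };

   /* Interrupt path: only record that a new table is available. */
   static void hfi_on_platform_irq(struct hfi_sim *s)
   {
       s->update_pending = true;
   }

   /* Deferred path: copy the table; returns true if an update was consumed. */
   static bool hfi_process_update(struct hfi_sim *s)
   {
       if (!s->update_pending)
           return false;
       memcpy(s->local, s->shmem, sizeof(s->local));
       s->update_pending = false;
       s->generation++;
       return true;
   }

Keeping a private copy means scheduling decisions see a stable table between
updates, and the ``generation`` counter lets consumers notice that it changed.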

Documentation/arch/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ x86-specific Documentation
    amd-debugging
    amd-memory-encryption
    amd_hsmp
+   amd-hfi
    tdx
    pti
    mds
