.. SPDX-License-Identifier: GPL-2.0

======================================================================
Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
======================================================================

:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.

:Author: Perry Yuan <[email protected]>
:Author: Mario Limonciello <[email protected]>

Overview
--------

AMD Heterogeneous Core implementations consist of more than one
architectural class, and their CPUs are made up of cores with different
efficiency and power capabilities: performance-oriented *classic cores* and
power-efficient *dense cores*. Power management strategies must therefore be
designed to accommodate the complexities introduced by mixing core types.
Heterogeneous systems can also include more than two architectural classes.
The purpose of the scheduling feedback mechanism is to provide information to
the operating system scheduler in real time so that the scheduler can direct
threads to the optimal core.

The goal of AMD's heterogeneous architecture is to attain a power benefit by
sending background threads to the dense cores while sending high-priority
threads to the classic cores. From a performance perspective, sending
background threads to dense cores can free up power headroom and allow the
classic cores to optimally service demanding threads. Furthermore, the
area-optimized nature of the dense cores allows for a larger number of
physical cores, and this improved core density will have a positive impact on
multithreaded performance.

AMD Heterogeneous Core Driver
-----------------------------

The ``amd_hfi`` driver provides the operating system with performance and
energy efficiency capability data for each CPU in the system. The scheduler
can use the ranking data from the HFI driver to make task placement decisions.

Thread Classification and Ranking Table Interaction
----------------------------------------------------

The thread classification is used to index into a ranking table that
describes an efficiency and performance ranking for each classification.

Threads are classified at runtime into enumerated classes. The classes
represent thread performance/power characteristics that may benefit from
special scheduling behaviors. The table below shows an example of thread
classification and where a thread of a given class should preferably be
scheduled. The operating system consumes the real-time thread classification
and uses it to inform the scheduler where the thread should be placed.

Thread Classification Example Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+----------+----------------+-------------------------------+---------------------+---------+
| Class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
+----------+----------------+-------------------------------+---------------------+---------+
| 0        | Default        | Performant                    | Highest             |         |
+----------+----------------+-------------------------------+---------------------+---------+
| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
+----------+----------------+-------------------------------+---------------------+---------+
| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
+----------+----------------+-------------------------------+---------------------+---------+

Thread classification is performed by the hardware each time the thread is
switched out. Threads that don't meet any hardware-specified criteria are
classified as "default".
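
The example table and the "default" fallback can be modeled as a small C enum with a decode helper. This is a minimal sketch. The names ``enum hfi_class``, ``hfi_decode_class()`` and ``hfi_class_prefers_efficiency()`` are hypothetical. So is the choice to treat IDs outside the example table as "default". It does not reproduce the ``amd_hfi`` driver's code.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical names for the example classes in the table above. */
   enum hfi_class {
       HFI_CLASS_DEFAULT = 0,      /* performant, highest preemption priority */
       HFI_CLASS_NON_SCALABLE = 1, /* efficient, lowest preemption priority */
       HFI_CLASS_IO_BOUND = 2,     /* efficient, lowest preemption priority */
       HFI_NR_CLASSES
   };

   /* Assumption: any ID outside the example table is treated as "default". */
   static enum hfi_class hfi_decode_class(unsigned int raw_id)
   {
       if (raw_id >= HFI_NR_CLASSES)
           return HFI_CLASS_DEFAULT;
       return (enum hfi_class)raw_id;
   }

   /* True if the example table prefers an efficient core for this class. */
   static bool hfi_class_prefers_efficiency(enum hfi_class cls)
   {
       return cls == HFI_CLASS_NON_SCALABLE || cls == HFI_CLASS_IO_BOUND;
   }

The preferred behavior is kept as a predicate rather than a CPU number. This leaves the actual core choice to the ranking table, which may change at runtime.
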

AMD Hardware Feedback Interface
--------------------------------

The Hardware Feedback Interface provides the operating system with information
about the performance and energy efficiency of each CPU in the system. Each
capability is given as a unitless quantity in the range [0-255]. A higher
performance value indicates higher performance capability, and a higher
efficiency value indicates greater efficiency. Energy efficiency and
performance are reported as separate capabilities in the shared-memory
ranking table.

These capabilities may change at runtime as a result of changes in the
operating conditions of the system or the action of external factors.
The power management firmware is responsible for detecting events that
require a reordering of the performance and efficiency rankings. Table
updates happen relatively infrequently, on the time scale of seconds or more.

The following events trigger a table update:

* Thermal Stress Events
* Silent Compute
* Extreme Low Battery Scenarios

The kernel or a userspace policy daemon can use these capabilities to modify
task placement decisions. For instance, if the performance or energy
efficiency capability of a given logical processor becomes zero, the hardware
is recommending that the operating system not schedule any tasks on that
processor, for performance or energy efficiency reasons respectively.
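
Here is a minimal sketch of how a placement policy might honor zero capabilities. It assumes a hypothetical per-CPU ``struct hfi_cpu_caps`` that holds the two [0-255] values. ``hfi_pick_cpu()``, also hypothetical, returns the CPU with the highest capability for the requested goal. It skips any CPU whose capability for that goal is zero. The sketch illustrates the rule above and is not the kernel scheduler's placement logic.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical per-CPU capability pair, each in the range [0-255]. */
   struct hfi_cpu_caps {
       uint8_t perf; /* higher means more performance capability */
       uint8_t eff;  /* higher means more energy efficiency */
   };

   enum hfi_goal { HFI_GOAL_PERF, HFI_GOAL_EFF };

   /*
    * Return the index of the best CPU for @goal, or -1 if every CPU reports
    * zero for that capability (hardware recommends not scheduling there).
    */
   static int hfi_pick_cpu(const struct hfi_cpu_caps *caps, size_t nr_cpus,
                           enum hfi_goal goal)
   {
       int best = -1;
       uint8_t best_val = 0;

       for (size_t i = 0; i < nr_cpus; i++) {
           uint8_t val = (goal == HFI_GOAL_PERF) ? caps[i].perf : caps[i].eff;

           /* A zero capability means "avoid this CPU for this goal". */
           if (val == 0)
               continue;
           if (val > best_val) {
               best_val = val;
               best = (int)i;
           }
       }
       return best;
   }

Ties keep the lowest-numbered CPU. A real policy would likely also weigh idle state and load, which this sketch ignores.
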

Implementation details for Linux
--------------------------------

Thread scheduling is implemented in the following steps:

1. A thread is spawned and scheduled to the ideal core using the default
   heterogeneous scheduling policy.
2. The processor profiles thread execution and assigns an enumerated
   classification ID. This classification is communicated to the OS via a
   logical-processor-scope MSR.
3. When the thread is context-switched out, the operating system consumes
   the workload (WL) classification, which resides in a
   logical-processor-scope MSR.
4. After consuming the WL classification and before switching in the new
   thread, the OS triggers the hardware to clear its history by writing to
   an MSR.
5. If, given the classification, the ranking table, and processor
   availability, the thread is not on its ideal processor, the OS then
   considers scheduling the thread on its ideal processor (if available).
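
Steps 3 to 5 can be modeled in user-space C to make their ordering explicit. In this sketch plain struct fields stand in for the per-logical-processor MSRs. ``struct sim_lp``, ``struct sim_task``, ``hfi_switch_out()`` and the per-class ``ideal_cpu`` table are all hypothetical. The real MSR addresses and kernel hooks are not shown.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define SIM_NR_CLASSES 3

   /* Simulated logical-processor state; fields stand in for per-LP MSRs. */
   struct sim_lp {
       int cpu;            /* this logical processor's CPU number */
       uint32_t class_msr; /* classification written by hardware profiling */
       uint32_t history;   /* hardware classification history */
   };

   struct sim_task {
       uint32_t wl_class;  /* last consumed workload (WL) class */
   };

   /*
    * Hypothetical switch-out hook. @ideal_cpu maps each class to its ideal
    * CPU as derived from the ranking table; @cpu_available marks CPUs that
    * can take work. Returns the CPU the OS should consider moving @t to,
    * or -1 to leave it where it is.
    */
   static int hfi_switch_out(struct sim_lp *lp, struct sim_task *t,
                             const int ideal_cpu[SIM_NR_CLASSES],
                             const bool *cpu_available)
   {
       /* Step 3: consume the WL classification from the per-LP "MSR". */
       uint32_t cls = lp->class_msr;

       if (cls >= SIM_NR_CLASSES)
           cls = 0; /* assumption: unknown IDs fall back to default */
       t->wl_class = cls;

       /* Step 4: simulated write that clears the hardware history. */
       lp->history = 0;

       /* Step 5: suggest the ideal CPU if the task is elsewhere and it is free. */
       int ideal = ideal_cpu[cls];
       if (ideal != lp->cpu && cpu_available[ideal])
           return ideal;
       return -1;
   }

Returning a suggestion rather than migrating directly mirrors step 5, where the OS only *considers* moving the thread.
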

Ranking Table
-------------

The ranking table is a shared memory region that is used to communicate the
performance and energy efficiency capabilities of each CPU in the system.

The ranking table contains, for each APIC ID in the system, both a
performance ranking and an efficiency ranking for each workload
classification.

.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
   :doc: amd_shmem_info
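
The sketch below makes the per-APIC-ID, per-class layout concrete. It models the ranking table as a dense array holding one performance/efficiency pair per (APIC ID, workload class). The row-major layout and the assumption that APIC IDs are dense from 0 are hypothetical. So are the names ``struct hfi_rank``, ``struct hfi_table`` and ``hfi_lookup()``. The authoritative shared-memory format is the one documented in ``amd_shmem_info``.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical in-memory copy; not the real shared-memory format. */
   struct hfi_rank {
       uint8_t perf; /* [0-255], higher = more performant */
       uint8_t eff;  /* [0-255], higher = more efficient */
   };

   struct hfi_table {
       size_t nr_apic_ids;     /* assumption: APIC IDs are dense from 0 */
       size_t nr_classes;
       struct hfi_rank *ranks; /* nr_apic_ids * nr_classes entries */
   };

   /* Look up the rankings for one APIC ID under one workload class. */
   static const struct hfi_rank *hfi_lookup(const struct hfi_table *tbl,
                                            size_t apic_id, size_t cls)
   {
       if (apic_id >= tbl->nr_apic_ids || cls >= tbl->nr_classes)
           return NULL;
       /* Row-major: all classes for APIC ID 0, then APIC ID 1, ... */
       return &tbl->ranks[apic_id * tbl->nr_classes + cls];
   }
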

Ranking Table Update
--------------------

The power management firmware issues a platform interrupt once it has updated
the ranking table and the table is ready for the operating system to consume.
CPUs receive this interrupt and read the new ranking table from the shared
memory region provided by the ACPI PCCT (Platform Communications Channel
Table); the ``amd_hfi`` driver then parses the new table to provide updated
data for scheduling decisions.
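
The interrupt-then-parse flow can be sketched in two parts. A minimal interrupt handler only records that an update arrived. Deferred work then copies the table out of shared memory. The simulated buffers, the pending flag and the function names are hypothetical. Deferring the copy out of interrupt context is an assumed design choice, not a description of how ``amd_hfi`` is structured.

.. code-block:: c

   #include <stdatomic.h>
   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   #define SIM_TABLE_BYTES 64

   /* Stand-in for the PCCT-provided shared memory region (simulated). */
   static uint8_t sim_shmem[SIM_TABLE_BYTES];

   /* Driver-private snapshot that scheduling decisions read from. */
   static uint8_t hfi_snapshot[SIM_TABLE_BYTES];
   static atomic_bool hfi_update_pending = false;

   /* Interrupt context: do as little as possible, just note the update. */
   static void hfi_platform_irq(void)
   {
       atomic_store(&hfi_update_pending, true);
   }

   /*
    * Deferred work: if an update is pending, copy the new table out of
    * shared memory. Returns true if a new snapshot was taken.
    */
   static bool hfi_process_update(void)
   {
       if (!atomic_exchange(&hfi_update_pending, false))
           return false;
       memcpy(hfi_snapshot, sim_shmem, sizeof(hfi_snapshot));
       /* A real driver would parse the copy and publish new rankings here. */
       return true;
   }

Because the sketch uses ``atomic_exchange()``, interrupts that arrive before the deferred work runs collapse into a single copy of the latest table.
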