Commit 09ca039

feat(linux): Add s2idle docs
Introduce the concept of s2idle and how we use it for mode selection

Signed-off-by: Dhruva Gole <[email protected]>
1 parent 6bb8848 commit 09ca039

3 files changed (+308, -0 lines)


configs/AM62LX/AM62LX_linux_toc.txt

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ linux/Foundational_Components_Power_Management
 linux/Foundational_Components/Power_Management/pm_overview
 linux/Foundational_Components/Power_Management/pm_cpuidle
 linux/Foundational_Components/Power_Management/pm_am62lx_low_power_modes
+linux/Foundational_Components/Power_Management/pm_psci_s2idle
 linux/Foundational_Components/Power_Management/pm_wakeup_sources
 linux/Foundational_Components/Power_Management/pm_am62lx_debug

Lines changed: 306 additions & 0 deletions
@@ -0,0 +1,306 @@

.. _pm_s2idle_psci:

#############################################
Suspend-to-Idle (S2Idle) and PSCI Integration
#############################################

**********************************
Suspend-to-Idle (S2Idle) Overview
**********************************

Suspend-to-Idle (s2idle), also known as "freeze," is a generic, pure software, light-weight variant of system suspend.
In this state, the Linux kernel freezes user space tasks, suspends devices, and then puts all CPUs into their deepest available idle state.

*******************
PSCI as the Enabler
*******************

The Power State Coordination Interface (PSCI) is an ARM-defined standard firmware interface and the
fundamental enabler for s2idle on ARM platforms: it allows the Operating System (OS) to request CPU,
cluster, and system power states without needing intimate knowledge of the underlying SoC.

**s2idle Call Flow:**

.. code-block:: text

   Linux Kernel                        PSCI Firmware (TF-A)
   ============                        ====================

   1. Freeze tasks
         |
         v
   2. Suspend devices
         |
         v
   3. cpuidle framework -------------> CPU_SUSPEND
      (per CPU)                             |
         |                                  v
         |                          Coordinate power
         |                           state requests
         |                                  |
         |                                  v
         |                        CPU enters low-power
         |                           hardware state
         |                                  |
         |<----------- Resume --------------+
         |
         v
   4. Resume devices
         |
         v
   5. Thaw tasks

The ``cpuidle`` framework calls the PSCI ``CPU_SUSPEND`` API to transition each CPU individually into its selected low-power state.
The effectiveness of s2idle therefore depends heavily on the PSCI implementation's ability to coordinate these
requests and enter the deepest possible hardware state.

************************
OS Initiated (OSI) Mode
************************

PSCI 1.0 introduced **OS Initiated (OSI)** mode, which shifts the responsibility of power state coordination from the platform firmware to the Operating System.

In the default **Platform Coordinated (PC)** mode, the OS independently requests a state for each core. The firmware then aggregates these requests (voting) to
determine if a cluster or the system can be powered down.

In **OS Initiated (OSI)** mode, the OS explicitly manages the hierarchy. The OS determines when the last core in a power domain (e.g., a cluster) is going idle
and explicitly requests the power-down of that domain.
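
The voting step of Platform Coordinated mode can be modelled in a few lines. The sketch below is a
hypothetical illustration, not TF-A code: it assumes each core's latest request is reduced to the
highest power level it allows to be powered down (``-1`` while the core is running), and the firmware
grants the minimum of those votes, because a shared domain may only be powered down when every core in it agrees.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical vote encoding used only in this sketch:
    * -1 = core running, 0 = core power-down only,
    *  1 = cluster may power down, 2 = system may power down. */
   #define CORE_RUNNING (-1)
   #define DEEPEST_LEVEL 2

   /* Platform Coordinated (PC) aggregation: the deepest level the firmware
    * may enter for a node is the shallowest level requested by any core. */
   static int pc_aggregate_level(const int *votes, size_t ncores)
   {
       int level = DEEPEST_LEVEL;

       for (size_t i = 0; i < ncores; i++) {
           if (votes[i] < level)
               level = votes[i];
       }
       return level; /* CORE_RUNNING means nothing shared can be powered down */
   }

With votes ``{2, 0}``, for example, the result is ``0``: the second core only allowed its own power-down,
so the cluster stays on even though the first core asked for a system-level state.
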

Why OSI?
========

OSI mode allows the OS to make better power decisions because it has visibility into:

* **Task Scheduling:** The OS knows when other cores will wake up.
* **Wakeup Latencies:** The OS can respect Quality of Service (QoS) latency constraints more accurately.
* **Usage Patterns:** The OS can predict idle duration better than firmware.

OSI Sequence
============

The coordination in OSI mode follows a specific "Last Man Standing" sequence. The OS tracks the state of all cores in a topology node (e.g., a cluster).

.. code-block:: text

   OSI "Last Man Standing" Flow

   Cluster with 2 Cores     OS Action                PSCI Request
   ====================     =========                ============

   1. Core 0,1: ACTIVE
         |
         | Core 0 becomes idle
         v
   2. Core 0: IDLE    -->   OS requests local  -->   CPU_SUSPEND
      Core 1: ACTIVE        core power down          (Core PD only)
                            Cluster stays ON
         |
         | Core 1 (LAST) becomes idle
         v
   3. Core 0,1: IDLE  -->   OS recognizes      -->   CPU_SUSPEND
                            "Last Man" scenario      (Composite State)
                            Requests composite:
                            - Core 1:  PD            Core:    PD
                            - Cluster: PD            Cluster: PD
                            - System:  PD            System:  PD
         |
         v
   4. Firmware Verification  -->  PSCI firmware checks that
      & System Power Down         all cores/clusters are idle
                                  If verified: power down
                                  the entire system
                                  If not: deny the request
                                  (race condition)

**Detailed Steps:**

1. **First Core Idle:** When the first core in a cluster goes idle, the OS requests a local idle state
   for that core (e.g., Core Power Down) but keeps the cluster running.

2. **Last Core Idle:** When the *last* active core in the cluster is ready to go idle, the OS recognizes
   that the entire cluster, and potentially the system, can now be powered down.

3. **Composite Request:** The last core issues a ``CPU_SUSPEND`` call that requests a **composite state**:

   * **Core State:** Power Down
   * **Cluster State:** Power Down
   * **System State:** Power Down (as shown in the diagram above)

4. **Firmware Enforcement:** The PSCI firmware verifies that all other cores and clusters in the requested node are indeed idle.
   If they are not, it denies the request (returning ``DENIED``) instead of powering down a domain that is still in use.
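
The OS-side bookkeeping in steps 1 to 3 reduces to counting idle cores per power domain. In Linux this is
handled by the generic PM domain (genpd) framework through ``drivers/cpuidle/cpuidle-psci-domain.c``; the
structure and functions below are hypothetical and only sketch the "Last Man Standing" counting logic for a
single cluster, without the locking a real implementation needs.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical per-cluster state tracked by the OS in OSI mode. */
   struct cluster {
       unsigned int nr_cores;  /* cores in the cluster */
       unsigned int nr_idle;   /* cores that have already gone idle */
       bool system_ok;         /* QoS/wakeup constraints allow system power-down */
   };

   /* Called by a core that is about to idle. Returns the PSCI power level it
    * should request: 0 = core only, 1 = core + cluster, 2 = core + cluster + system. */
   static unsigned int osi_core_idle(struct cluster *c)
   {
       c->nr_idle++;
       if (c->nr_idle < c->nr_cores)
           return 0;                     /* not the last man: local power-down only */
       return c->system_ok ? 2 : 1;      /* last man: composite request */
   }

   /* Called when a core resumes, including after the firmware returned DENIED. */
   static void osi_core_wake(struct cluster *c)
   {
       if (c->nr_idle > 0)
           c->nr_idle--;
   }

The returned level becomes the Power Level field of the ``power_state`` parameter described in the next section.
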

***********************************
Understanding the Suspend Parameter
***********************************

The ``power_state`` parameter passed to ``CPU_SUSPEND`` is the key to requesting these states.
In OSI mode, this parameter must encode the intent for the entire hierarchy.

Power State Parameter Encoding
==============================

The ``power_state`` is a 32-bit parameter defined by the ARM PSCI specification (ARM DEN0022C).
The specification defines two encoding formats, the original (standard) format and the extended StateID format.
Which one is used is fixed by the platform firmware's build configuration (in TF-A, the ``PSCI_EXTENDED_STATE_ID`` build option).

Standard Format
---------------

This is the default format used by most platforms:

.. code-block:: text

    31         26 25  24 23         17  16  15                   0
   +-------------+------+-------------+----+----------------------+
   |  Reserved   | Pwr  |  Reserved   | ST |       State ID       |
   | (must be 0) | Level| (must be 0) |    |  (platform-defined)  |
   +-------------+------+-------------+----+----------------------+

.. list-table:: Standard Format Bit Fields
   :widths: 20 80
   :header-rows: 1

   * - Bit Field
     - Description

   * - **[31:26]**
     - **Reserved**: Must be zero.

   * - **[25:24]**
     - **Power Level**: The power domain level, furthest from the core, that is affected by the requested state.

       * ``0``: CPU/Core level
       * ``1``: Cluster level
       * ``2``: System level
       * ``3``: Higher levels (platform-specific)

   * - **[23:17]**
     - **Reserved**: Must be zero.

   * - **[16]**
     - **State Type (ST)**: Type of power state.

       * ``0``: Standby or Retention (low latency, context preserved)
       * ``1``: Power Down (higher latency, may lose context)

   * - **[15:0]**
     - **State ID**: Platform-specific identifier for the requested power state. The OS and
       platform firmware must agree on the meaning of these values, typically defined through
       Device Tree bindings.
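
The bit layout above translates directly into shifts and masks. The following is a minimal sketch of an
encoder for the standard format; the macro and function names are hypothetical (the Linux UAPI header
``include/uapi/linux/psci.h`` defines similar ``PSCI_0_2_POWER_STATE_*`` masks).

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Standard-format field positions (names are hypothetical). */
   #define PSTATE_ID_MASK      0xFFFFu  /* [15:0]  State ID    */
   #define PSTATE_TYPE_SHIFT   16       /* [16]    State Type  */
   #define PSTATE_LEVEL_SHIFT  24       /* [25:24] Power Level */

   /* Build a power_state value. Returns false if a field does not fit;
    * the reserved bits [31:26] and [23:17] are always left at zero. */
   static bool psci_make_power_state(unsigned int level, unsigned int type,
                                     uint32_t state_id, uint32_t *out)
   {
       if (level > 3 || type > 1 || state_id > PSTATE_ID_MASK)
           return false;

       *out = ((uint32_t)level << PSTATE_LEVEL_SHIFT) |
              ((uint32_t)type << PSTATE_TYPE_SHIFT) |
              state_id;
       return true;
   }

For instance, Power Level 2, State Type 1 and State ID ``0x2234`` produce ``0x02012234``.
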

**OSI Mode Consideration:**

In OSI mode, the OS is responsible for tracking which cores are idle. When the last core
in a cluster issues a ``CPU_SUSPEND`` call with Power Level = 1, the PSCI firmware:

1. Verifies that all other cores in the cluster are already in a low-power state.
2. If verified, powers down the entire cluster.
3. If not verified (race condition), denies the request and returns ``DENIED``.
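
The firmware-side check can be sketched as follows. This is a hypothetical model rather than TF-A's
implementation: it assumes the firmware keeps one "idle" flag per core of the cluster and only checks
requests that reach beyond the calling core. ``SUCCESS`` (0) and ``DENIED`` (-3) are the return codes
defined by the PSCI specification.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   #define PSCI_SUCCESS 0
   #define PSCI_DENIED  (-3)

   /* core_idle[i] is true once core i has completed its own CPU_SUSPEND entry. */
   static int osi_fw_check(const bool *core_idle, size_t ncores,
                           size_t caller, unsigned int level)
   {
       if (level == 0)
           return PSCI_SUCCESS;          /* core-only request: nothing shared */

       for (size_t i = 0; i < ncores; i++) {
           if (i != caller && !core_idle[i])
               return PSCI_DENIED;       /* another core is still running: race */
       }
       /* Safe to power down the cluster. In real firmware a successful
        * power-down does not return; the core later resumes at its entry point. */
       return PSCI_SUCCESS;
   }
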

The State ID field is platform-defined and is typically documented in the Device Tree
``idle-state`` nodes through the ``arm,psci-suspend-param`` property. This mechanism, built on
``cpuidle`` and ``s2idle``, allows the kernel to abstract complex platform-specific low-power modes
into a generic framework. The ``idle-state`` nodes in the Device Tree describe these power states,
including their entry/exit latencies and minimum residency, which the ``cpuidle`` governor uses to
decide which idle state to enter based on the predicted idle duration and latency constraints.

The ``arm,psci-suspend-param`` property then maps each idle state directly to the PSCI ``power_state``
value that the firmware understands.
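
As a rough sketch of how these Device Tree properties feed the idle-state decision, the code below picks
the deepest state whose minimum residency fits the predicted idle time and whose exit latency respects the
latency limit, then uses that state's ``arm,psci-suspend-param`` value. The ``struct idle_state`` mirror and the
selection function are hypothetical and far simpler than the kernel's ``menu`` or ``teo`` governors.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical in-memory copy of a Device Tree idle-state node. */
   struct idle_state {
       const char *name;
       uint32_t exit_latency_us;   /* from exit-latency-us */
       uint32_t min_residency_us;  /* from min-residency-us */
       uint32_t psci_param;        /* from arm,psci-suspend-param */
   };

   /* States must be ordered from shallowest to deepest, with latency and
    * residency growing with depth. Returns the chosen index, or -1 if none fits. */
   static int select_idle_state(const struct idle_state *s, size_t n,
                                uint32_t predicted_us, uint32_t latency_limit_us)
   {
       int best = -1;

       for (size_t i = 0; i < n; i++) {
           if (s[i].min_residency_us > predicted_us ||
               s[i].exit_latency_us > latency_limit_us)
               break;              /* deeper states will not fit either */
           best = (int)i;
       }
       return best;
   }

The caller would then pass ``s[best].psci_param`` as the ``power_state`` argument of ``CPU_SUSPEND``.
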

Example: System Suspend (Standard Format)
=========================================

When the OS targets a system-wide suspend state (e.g., Suspend-to-RAM), the ``power_state`` parameter is constructed to target the highest power level.
Consider the example value **0x02012234**:

.. list-table:: Power State Parameter Breakdown (0x02012234)
   :widths: 20 20 20 40
   :header-rows: 1

   * - Field
     - Bits
     - Value
     - Meaning

   * - Reserved
     - [31:26]
     - 0
     - Must be zero

   * - Power Level
     - [25:24]
     - 2
     - System level

   * - Reserved
     - [23:17]
     - 0
     - Must be zero

   * - State Type
     - [16]
     - 1
     - Power Down

   * - State ID
     - [15:0]
     - 0x2234
     - Platform-specific (e.g., "S2RAM")

**Interpretation:**

* **Power Level = 2** tells the firmware that a system-level transition is requested.
* **State Type = 1** indicates a power-down state.
* **State ID = 0x2234** is the platform-specific identifier for this system state.
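
Going the other way, a raw ``power_state`` value can be split back into its fields, which is handy when
checking an ``arm,psci-suspend-param`` value from a Device Tree. This decoder is a hypothetical helper for
the standard format only; it rejects values with any reserved bit set.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   struct power_state_fields {
       unsigned int level;   /* bits [25:24] */
       unsigned int type;    /* bit  [16]    */
       uint32_t state_id;    /* bits [15:0]  */
   };

   static bool psci_decode_power_state(uint32_t ps, struct power_state_fields *f)
   {
       /* Reserved bits [31:26] and [23:17] must be zero. */
       const uint32_t reserved = 0xFC000000u | 0x00FE0000u;

       if (ps & reserved)
           return false;
       f->level = (ps >> 24) & 0x3u;
       f->type = (ps >> 16) & 0x1u;
       f->state_id = ps & 0xFFFFu;
       return true;
   }

Decoding ``0x02012234`` yields level 2, type 1 and State ID ``0x2234``, matching the table above.
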

In the context of **s2idle**, if the OS determines that all constraints are met for system suspension,
the last active CPU (Last Man) invokes ``CPU_SUSPEND`` with this parameter. The PSCI firmware then
coordinates the final steps to suspend the system (e.g., placing DDR in self-refresh and powering down the SoC).

**********************************
S2Idle vs Deep Sleep (mem)
**********************************

The Linux kernel supports sleep states: global low-power states of the entire system in which user space
code cannot be executed and the overall system activity is significantly reduced.
The different types of sleep states are described in the kernel
`documentation <https://docs.kernel.org/admin-guide/pm/sleep-states.html>`__.
The state entered when ``mem`` is written to :file:`/sys/power/state` is selected through
:file:`/sys/power/mem_sleep`; reading that file lists the supported states, with the current selection in square brackets.
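
Since :file:`/sys/power/mem_sleep` marks the active selection with square brackets (for example
``s2idle [deep]``), a tool can recover the current mode with a small parser. The helper below is a
hypothetical sketch that works on the file contents already read into a string; opening and reading the
file, for example with ``fopen()`` and ``fgets()``, is left to the caller.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Copy the bracketed (selected) state from mem_sleep contents into out.
    * Returns 0 on success, -1 if no complete [..] token is found or out is too small. */
   static int mem_sleep_selected(const char *contents, char *out, size_t outlen)
   {
       const char *start = strchr(contents, '[');
       const char *end = start ? strchr(start, ']') : NULL;
       size_t len;

       if (end == NULL || outlen == 0)
           return -1;
       len = (size_t)(end - start - 1);
       if (len >= outlen)
           return -1;
       memcpy(out, start + 1, len);
       out[len] = '\0';
       return 0;
   }

Writing ``s2idle`` or ``deep`` to the same file (as root) changes the selection.
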

On the TI K3 AM62L platform, the ``s2idle`` and ``deep`` states are currently supported.
Both can achieve similar power savings (e.g., by suspending to RAM and putting DDR into self-refresh).
The primary differences lie in the software execution flow, specifically in how CPUs are managed and which
PSCI APIs are invoked.

.. list-table:: S2Idle vs Deep Sleep
   :widths: 20 40 40
   :header-rows: 1

   * - Feature
     - s2idle (Suspend-to-Idle)
     - deep (Suspend-to-RAM)

   * - **Kernel String**
     - ``s2idle`` or ``freeze``
     - ``deep`` or ``mem``

   * - **Non-boot CPUs**
     - **Online**: Non-boot CPUs are put into a deep idle state but remain logically online.
     - **Offline**: Non-boot CPUs are hot-unplugged (removed) from the system via ``CPU_OFF``.

   * - **Entry Path**
     - **cpuidle**: Devices are suspended through their system-sleep callbacks (which many drivers
       implement by reusing their runtime suspend hooks), then every CPU is idled through the standard
       cpuidle framework.
     - **suspend_ops**: Devices are suspended through the same callbacks, non-boot CPUs are taken
       offline, and the platform suspend operations finally call PSCI ``SYSTEM_SUSPEND``.
       No cpuidle governor is involved in selecting the state.

   * - **PSCI Call**
     - ``CPU_SUSPEND``: Invoked for every core (Last Man Standing logic coordinates the cluster/system depth).
     - ``SYSTEM_SUSPEND``: Invoked by the last online CPU after the others are offlined.

   * - **Resume Flow**
     - **Fast**: CPUs exit the idle loop as soon as a wakeup interrupt arrives. Context is preserved.
     - **Slow**: The kernel must serially bring the non-boot CPUs back online through CPU hotplug,
       unparking their per-CPU threads, re-enabling interrupts, and restoring per-CPU state for every
       non-boot core, in addition to resuming each driver.

   * - **Latency**
     - Lower
     - Higher, primarily due to the overhead of **CPU hotplug** for non-boot CPUs.

source/linux/Foundational_Components_Power_Management.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Power Management
 Foundational_Components/Power_Management/pm_low_power_modes
 Foundational_Components/Power_Management/pm_am62lx_low_power_modes
 Foundational_Components/Power_Management/pm_low_power_modes_socoff
+Foundational_Components/Power_Management/pm_psci_s2idle
 Foundational_Components/Power_Management/pm_wakeup_sources
 Foundational_Components/Power_Management/pm_sw_arch
 Foundational_Components/Power_Management/pm_debug
