Commit 8d02b89

feat(linux): Add s2idle docs
Introduce the concept of s2idle and how we use it for mode selection

Signed-off-by: Dhruva Gole <[email protected]>
1 parent 6bb8848 commit 8d02b89

3 files changed

+312
-0
lines changed

configs/AM62LX/AM62LX_linux_toc.txt

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ linux/Foundational_Components_Power_Management
 linux/Foundational_Components/Power_Management/pm_overview
 linux/Foundational_Components/Power_Management/pm_cpuidle
 linux/Foundational_Components/Power_Management/pm_am62lx_low_power_modes
+linux/Foundational_Components/Power_Management/pm_psci_s2idle
 linux/Foundational_Components/Power_Management/pm_wakeup_sources
 linux/Foundational_Components/Power_Management/pm_am62lx_debug

Lines changed: 310 additions & 0 deletions
@@ -0,0 +1,310 @@
1+
.. _pm_s2idle_psci:
2+
3+
#############################################
4+
Suspend-to-Idle (S2Idle) and PSCI Integration
5+
#############################################
6+
7+
**********************************
8+
Suspend-to-Idle (S2Idle) Overview
9+
**********************************
10+
11+
Suspend-to-Idle (s2idle), also known as "freeze," is a generic, pure software, light-weight variant of system suspend.
12+
In this state, the Linux kernel freezes user space tasks, suspends devices, and then puts all CPUs into their deepest available idle state.
13+
14+
*******************
15+
PSCI as the Enabler
16+
*******************
17+
18+
The Power State Coordination Interface (PSCI) is an ARM-defined standard that acts as the fundamental
19+
enabler for s2idle on all ARM platforms that support it. PSCI defines a standardized firmware interface that allows the
20+
Operating System (OS) to request power states without needing intimate knowledge of the underlying
21+
SoC.
22+
23+
**s2idle Call Flow:**
24+
25+
.. code-block:: text

   Linux Kernel                          PSCI Firmware (TF-A)
   ============                          ====================

   1. Freeze tasks
          |
          v
   2. Suspend devices
          |
          v
   3. cpuidle framework  ----------->    CPU_SUSPEND
      (per CPU)                               |
          |                                   v
          |                             Coordinate power
          |                             state requests
          |                                   |
          |                                   v
          |                             CPU enters low-power
          |                             hardware state
          |                                   |
          |                                   v
          |                             Wakeup event (e.g. RTC)
          |<--------- Resume ---------
          |
          v
   4. Resume devices
          |
          v
   5. Thaw tasks
55+
56+
The ``cpuidle`` framework calls the PSCI ``CPU_SUSPEND`` API to transition each CPU individually into its respective low-power state.
The effectiveness of s2idle depends heavily on the PSCI implementation's ability to coordinate these
requests and enter the deepest possible hardware state.
59+
60+
************************
61+
OS Initiated (OSI) Mode
62+
************************
63+
64+
PSCI 1.0 introduced **OS Initiated (OSI)** mode, which shifts the responsibility of power state coordination from the platform firmware to the Operating System.
65+
In the default **Platform Coordinated (PC)** mode, the OS independently requests a state for each core. The firmware then aggregates these requests (voting) to
66+
determine if a cluster or the system can be powered down.
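
The voting performed in Platform Coordinated mode can be pictured with a small model. This is only a sketch under stated assumptions: the depth ranking (``RUN``, ``STANDBY``, ``POWER_DOWN``) and the function name are hypothetical and do not come from any firmware source. It shows the firmware choosing the deepest cluster state that every core has voted to tolerate.

```python
# Hypothetical depth ranking, for illustration only: higher means deeper.
RUN, STANDBY, POWER_DOWN = 0, 1, 2


def cluster_state_platform_coordinated(votes):
    """Return the deepest cluster state tolerated by every core.

    `votes` maps a core id to the deepest cluster state that core is
    willing to tolerate. A core that is still running votes RUN.
    """
    if not votes:
        raise ValueError("at least one core vote is required")
    # The shallowest vote wins: the cluster may go no deeper than the
    # least permissive core allows.
    return min(votes.values())
```

With one core still running (``{0: POWER_DOWN, 1: RUN}``) the cluster stays on; only when every core votes for power down does the cluster follow.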
67+
68+
In **OS Initiated (OSI)** mode, the OS explicitly manages the hierarchy. The OS determines when the last core in a power domain (e.g., a cluster) is going idle
69+
and explicitly requests the power-down of that domain.
70+
71+
Why OSI?
72+
========
73+
74+
OSI mode allows the OS to make better power decisions because it has visibility into:
75+
76+
* **Task Scheduling:** The OS knows when other cores will wake up.
77+
* **Wakeup Latencies:** The OS can respect Quality of Service (QoS) latency constraints more accurately.
78+
* **Usage Patterns:** The OS can predict idle duration better than firmware.
79+
80+
OSI Sequence
81+
============
82+
83+
The coordination in OSI mode follows a specific "Last Man Standing" sequence. The OS tracks the state of all cores in a topology node (e.g., a cluster).
84+
85+
.. code-block:: text

   OSI "Last Man Standing" Flow

   Cluster with 2 Cores          OS Action                  PSCI Request
   ====================          =========                  =============

   1. Core 0,1: ACTIVE
          |
          | Core 0 becomes idle
          v
   2. Core 0: IDLE        -->    OS requests local    -->   CPU_SUSPEND
      Core 1: ACTIVE             Core Power Down            (Core PD only)
                                 Cluster stays ON
          |
          | Core 1 (LAST) becomes idle
          v
   3. Core 0,1: IDLE      -->    OS recognizes        -->   CPU_SUSPEND
                                 "Last Man" scenario        (Composite State)
                                 Requests Composite:
                                 - Core 1: PD               Core: PD
                                 - Cluster: PD              Cluster: PD
                                 - System: PD               System: PD
          |
          v
   4. Firmware Verification -->  PSCI firmware checks
      & System Power Down        all cores/clusters idle
                                 If verified: Power down
                                 entire system
                                 If not: Deny request
                                 (race condition)
116+
117+
**Detailed Steps:**
118+
119+
1. **First Core Idle:** When the first core in a cluster goes idle, the OS requests a local idle state
120+
for that core (e.g., Core Power Down) but keeps the cluster running.
121+
122+
2. **Last Core Idle:** When the *last* active core in the cluster is ready to go idle, the OS recognizes
123+
that the entire cluster, and potentially the system, can now be powered down.
124+
125+
3. **Composite Request:** The last core issues a ``CPU_SUSPEND`` call that requests a **composite state**:
126+
127+
* **Core State:** Power Down
128+
* **Cluster State:** Power Down
129+
* **System State:** Power Down (as demonstrated in the diagram)
130+
131+
4. **Firmware Enforcement:** The PSCI firmware verifies that all other cores and clusters in the requested node are indeed idle.
132+
If they are not, the request is denied (to prevent race conditions).
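
The four steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the ``Cluster`` class, the tuple used to describe the requested levels, and ``firmware_accepts`` are hypothetical names, not kernel or TF-A APIs. The sketch only shows the bookkeeping idea: the OS counts idle cores, the last one asks for the composite state, and the firmware re-checks before honouring it.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """OS-side bookkeeping for one cluster (hypothetical model)."""
    cores: frozenset
    idle: set = field(default_factory=set)

    def core_going_idle(self, core):
        """Return the levels the OS asks to power down on behalf of `core`."""
        if core not in self.cores:
            raise ValueError(f"unknown core {core}")
        self.idle.add(core)
        if self.idle == set(self.cores):
            # Last man: every core is idle, so request the composite state.
            return ("core", "cluster", "system")
        # Some other core is still active: power down this core only.
        return ("core",)

    def core_woke_up(self, core):
        """Record that `core` is active again."""
        self.idle.discard(core)


def firmware_accepts(request, requester, running_cores):
    """Firmware-side check before honouring a request.

    `running_cores` is the set of cores the firmware still sees running,
    which includes the requester itself. A core-only request is always
    fine; anything deeper is denied if another core is still running.
    """
    if request == ("core",):
        return True
    return set(running_cores) <= {requester}
```

For a two-core cluster, the first call to ``core_going_idle`` yields ``("core",)`` and the second yields the composite request. If core 0 wakes up between the OS decision and the firmware check, ``firmware_accepts`` returns ``False``, which corresponds to the denied request in step 4.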
133+
134+
***********************************
135+
Understanding the Suspend Parameter
136+
***********************************
137+
138+
The ``power_state`` parameter passed to ``CPU_SUSPEND`` is the key to requesting these states.
139+
In OSI mode, this parameter must encode the intent for the entire hierarchy.
140+
141+
Power State Parameter Encoding
142+
================================
143+
144+
The ``power_state`` is a 32-bit parameter defined by the ARM PSCI specification (ARM DEN0022C).
145+
It has two encoding formats, controlled by the platform's build configuration.
146+
147+
Standard Format
148+
===============
149+
150+
This is the default format used by most platforms:
151+
152+
.. code-block:: text

    31           26 25  24 23            17  16  15                   0
   +---------------+------+----------------+----+----------------------+
   |   Reserved    | Pwr  |    Reserved    | ST |      State ID        |
   |  (must be 0)  | Level|  (must be 0)   |    |  (platform-defined)  |
   +---------------+------+----------------+----+----------------------+
159+
160+
.. list-table:: Standard Format Bit Fields
161+
:widths: 20 80
162+
:header-rows: 1
163+
164+
* - Bit Field
165+
- Description
166+
167+
* - **[31:26]**
168+
- **Reserved**: Must be zero.
169+
170+
* - **[25:24]**
171+
- **Power Level**: Indicates the deepest power domain level that can be powered down.
172+
173+
* ``0``: CPU/Core level
174+
* ``1``: Cluster level
175+
* ``2``: System level
176+
* ``3``: Higher levels (platform-specific)
177+
178+
* - **[23:17]**
179+
- **Reserved**: Must be zero.
180+
181+
* - **[16]**
182+
- **State Type (ST)**: Type of power state.
183+
184+
* ``0``: Standby or Retention (low latency, context preserved)
185+
* ``1``: Power Down (higher latency, may lose context)
186+
187+
* - **[15:0]**
188+
- **State ID**: Platform-specific identifier for the requested power state. The OS and
189+
platform firmware must agree on the meaning of these values, typically defined through
190+
device tree bindings.
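
The bit fields in the table translate directly into shifts and masks. The helper below is a minimal sketch, assuming the standard format described above; the function name ``make_power_state`` is hypothetical and exists only for this illustration.

```python
def make_power_state(power_level, state_type, state_id):
    """Pack the three fields into a 32-bit power_state (standard format).

    power_level -> bits [25:24]
    state_type  -> bit  [16]
    state_id    -> bits [15:0]
    All reserved bits are left at zero.
    """
    if not 0 <= power_level <= 3:
        raise ValueError("power_level must fit in 2 bits")
    if state_type not in (0, 1):
        raise ValueError("state_type must be 0 or 1")
    if not 0 <= state_id <= 0xFFFF:
        raise ValueError("state_id must fit in 16 bits")
    return (power_level << 24) | (state_type << 16) | state_id
```

For example, ``make_power_state(1, 1, 0)`` describes a cluster-level power-down request with State ID 0; the State ID values themselves remain platform-defined.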
191+
192+
**OSI Mode Consideration:**
193+
194+
In OSI mode, the OS is responsible for tracking which cores are idle. When the last core
195+
in a cluster issues this ``CPU_SUSPEND`` call with Power Level = 1, the PSCI firmware:
196+
197+
1. Verifies that all other cores in the cluster are already in a low-power state
198+
2. If verified, powers down the entire cluster
199+
3. If not verified (race condition), denies the request with an error code
200+
201+
The State ID field is platform-defined and typically documented in the device tree
``idle-state`` nodes using the ``arm,psci-suspend-param`` property. This mechanism,
leveraging ``cpuidle`` and ``s2idle``, allows the kernel to abstract complex platform-specific
low-power modes into a generic framework. The ``idle-state`` nodes in the Device Tree define these power states,
including their entry/exit latencies and minimum residency, enabling the ``cpuidle`` governor to make informed
decisions about which idle state to enter based on system load and predicted idle duration.
Note, however, that on the ``s2idle`` path, where the user explicitly initiates suspend-to-idle,
the kernel always picks the deepest available idle state.
209+
210+
The ``arm,psci-suspend-param`` property then directly maps these idle states to the corresponding PSCI ``power_state`` parameter values that the firmware understands.
211+
212+
Example: System Suspend (Standard Format)
213+
=========================================
214+
215+
When the OS targets a system-wide suspend state (e.g., Suspend-to-RAM), the ``power_state`` parameter is constructed to target the highest power level.
216+
Consider the example value **0x02012234**:
217+
218+
.. list-table:: Power State Parameter Breakdown (0x02012234)
219+
:widths: 20 20 20 40
220+
:header-rows: 1
221+
222+
* - Field
223+
- Bits
224+
- Value
225+
- Meaning
226+
227+
* - Reserved
228+
- [31:26]
229+
- 0
230+
- Must be zero
231+
232+
* - Power Level
233+
- [25:24]
234+
- 2
235+
- System level
236+
237+
* - Reserved
238+
- [23:17]
239+
- 0
240+
- Must be zero
241+
242+
* - State Type
243+
- [16]
244+
- 1
245+
- Power Down
246+
247+
* - State ID
248+
- [15:0]
249+
- 0x2234
250+
- Platform-specific (e.g., "S2RAM")
251+
252+
**Interpretation:**
253+
254+
* **Power Level = 2** tells the firmware that a system-level transition is requested.
255+
* **State Type = 1** indicates a power-down state.
256+
* **State ID = 0x2234** is the platform-specific identifier for this system state.
257+
258+
In the context of **s2idle**, if the OS determines that all constraints are met for system suspension,
259+
the last active CPU (Last Man) will invoke ``CPU_SUSPEND`` with this parameter. The PSCI firmware then
260+
coordinates the final steps to suspend the system (e.g., placing DDR in self-refresh and powering down the SoC).
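
The breakdown of ``0x02012234`` can be reproduced mechanically. The decoder below is a sketch assuming the standard format; ``PowerState`` and ``decode_power_state`` are hypothetical names used only for this illustration.

```python
from typing import NamedTuple


class PowerState(NamedTuple):
    power_level: int  # bits [25:24]
    state_type: int   # bit  [16]
    state_id: int     # bits [15:0]


# Bits [31:26] and [23:17] are reserved and must be zero.
RESERVED_MASK = 0xFCFE0000


def decode_power_state(value):
    """Split a 32-bit standard-format power_state into its fields."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("power_state must be a 32-bit value")
    if value & RESERVED_MASK:
        raise ValueError("reserved bits must be zero")
    return PowerState(
        power_level=(value >> 24) & 0x3,
        state_type=(value >> 16) & 0x1,
        state_id=value & 0xFFFF,
    )
```

Decoding ``0x02012234`` gives Power Level 2, State Type 1 and State ID ``0x2234``, matching the table above.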
261+
262+
**********************************
263+
S2Idle vs Deep Sleep (mem)
264+
**********************************
265+
266+
The Linux kernel defines sleep states: global low-power states of the entire system in which user space
code cannot be executed and the overall system activity is significantly reduced.
There are different types of sleep states, as described in the kernel
`documentation <https://docs.kernel.org/admin-guide/pm/sleep-states.html>`__.
The suspend variant can be selected using the sysfs entry :file:`/sys/power/mem_sleep`.
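
Reading :file:`/sys/power/mem_sleep` returns the supported variants on one line, with the currently selected one in square brackets (for example ``s2idle [deep]``). The helper below is a small sketch for interpreting that string; the function name ``parse_mem_sleep`` is hypothetical, and the input is passed in as text so the logic can be tried off-target.

```python
def parse_mem_sleep(text):
    """Return (available, selected) from the contents of mem_sleep.

    `available` lists every variant in the order the kernel reports it;
    `selected` is the bracketed entry, or None if none is bracketed.
    """
    available = []
    selected = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return available, selected
```

On a target, the text would come from reading :file:`/sys/power/mem_sleep`; writing one of the listed names (such as ``s2idle``) back to the same file changes the selection.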
271+
272+
On the TI K3 AM62L platform, we currently support the ``s2idle`` and ``deep`` states.
273+
Both of them can achieve similar power savings (e.g., by suspending to RAM / putting DDR into Self-Refresh).
274+
The primary differences lie in the software execution flow, specifically how CPUs are managed and which
275+
PSCI APIs are invoked.
276+
277+
.. list-table:: S2Idle vs Deep Sleep
278+
:widths: 20 40 40
279+
:header-rows: 1
280+
281+
* - Feature
282+
- s2idle (Suspend-to-Idle)
283+
- deep (Suspend-to-RAM)
284+
285+
* - **Kernel String**
286+
- ``s2idle`` or ``freeze``
287+
- ``deep`` or ``mem``
288+
289+
* - **Non-boot CPUs**
290+
- **Online**: Non-boot CPUs are put into a deep idle state but remain logically online.
291+
- **Offline**: Non-boot CPUs are hot-unplugged (removed) from the system via ``CPU_OFF``.
292+
293+
* - **Entry Path**
294+
- **cpuidle**: Uses the standard CPUidle framework. Additionally, each driver is made idle by calling respective runtime suspend hooks.
295+
- **suspend_ops**: Uses platform-specific suspend operations, such as each driver's suspend ops; finally, the PSCI ``SYSTEM_SUSPEND`` call is made.
No governors exist to make any decisions.
297+
298+
* - **PSCI Call**
299+
- ``CPU_SUSPEND``: Invoked for every core (Last Man Standing logic coordinates the cluster/system depth).
300+
- ``SYSTEM_SUSPEND``: Typically invoked by the last active CPU after others are offlined.
301+
302+
* - **Resume Flow**
303+
- **Fast**: CPUs exit the idle loop immediately upon interrupt. Context is preserved.
304+
- **Slow**: Kernel must serially bring secondary CPUs back online (Hotplug). Kernel must recreate
305+
threads, re-enable interrupts, resume each driver and restore per-CPU state for every non-boot core.
306+
307+
* - **Latency**
308+
- Lower
309+
- High, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs
310+

source/linux/Foundational_Components_Power_Management.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Power Management
 Foundational_Components/Power_Management/pm_low_power_modes
 Foundational_Components/Power_Management/pm_am62lx_low_power_modes
 Foundational_Components/Power_Management/pm_low_power_modes_socoff
+Foundational_Components/Power_Management/pm_psci_s2idle
 Foundational_Components/Power_Management/pm_wakeup_sources
 Foundational_Components/Power_Management/pm_sw_arch
 Foundational_Components/Power_Management/pm_debug
