Commit 4a0b798

feat(linux): Add s2idle docs
Signed-off-by: Dhruva Gole <[email protected]>
1 parent fecba63 commit 4a0b798

3 files changed

+318
-0
lines changed

configs/AM62LX/AM62LX_linux_toc.txt

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ linux/Foundational_Components_Power_Management
7676
linux/Foundational_Components/Power_Management/pm_overview
7777
linux/Foundational_Components/Power_Management/pm_cpuidle
7878
linux/Foundational_Components/Power_Management/pm_am62lx_low_power_modes
79+
linux/Foundational_Components/Power_Management/pm_psci_s2idle
7980
linux/Foundational_Components/Power_Management/pm_wakeup_sources
8081
linux/Foundational_Components/Power_Management/pm_am62lx_debug
8182

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
1+
.. _pm_s2idle_psci:
2+
3+
#############################################
4+
Suspend-to-Idle (S2Idle) and PSCI Integration
5+
#############################################
6+
7+
**********************************
8+
Suspend-to-Idle (S2Idle) Overview
9+
**********************************
10+
11+
Suspend-to-Idle (s2idle), also known as "freeze," is a generic, pure software, light-weight variant of system suspend.
12+
In this state, the Linux kernel freezes user space tasks, suspends devices, and then puts all CPUs into their deepest available idle state.
13+
14+
*******************
15+
PSCI as the Enabler
16+
*******************
17+
18+
The Power State Coordination Interface (PSCI) is an Arm-defined standard firmware interface
that makes s2idle possible on Arm platforms that implement it. It allows the
Operating System (OS) to request power states without needing intimate knowledge of the underlying
SoC.
22+
23+
**s2idle Call Flow:**
24+
25+
.. code-block:: text
26+
27+
   Linux Kernel                     PSCI Firmware (TF-A)
   ============                     ====================

   1. Freeze tasks
          |
          v
   2. Suspend devices
          |
          v
   3. cpuidle driver  ----------->  CPU_SUSPEND (SMC)
      (per CPU)                           |
          |                               v
          |                         Coordinate power
          |                         state requests
          |                               |
          |                               v
          |                         CPU enters low-power
          |                         hardware state
          |
          |<--------- Resume ---------
          |
          v
   4. Resume devices
          |
          v
   5. Thaw tasks
53+
54+
The ``cpuidle`` driver calls the PSCI ``CPU_SUSPEND`` API to transition the CPUs into a low-power state.
55+
The effectiveness of s2idle depends heavily on the PSCI implementation's ability to coordinate these
56+
requests and enter the deepest possible hardware state.
57+
58+
What is an SMC Call?
59+
====================
60+
61+
The diagram references **SMC (Secure Monitor Call)**. This is an ARM instruction used to generate a synchronous exception that is taken to Exception Level 3 (EL3).
62+
This is how it works:
63+
64+
* Linux runs at EL1 (Kernel).
65+
* Trusted Firmware-A (TF-A) runs at EL3 (Secure Monitor).
66+
* When the kernel executes the ``smc`` instruction, control is transferred to EL3, allowing the firmware to perform privileged power management operations.
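
As a rough illustration of what the kernel passes along with that ``smc`` instruction, the sketch below
decodes the function ID placed in the first argument register. The field layout follows the Arm SMC Calling
Convention, and ``0xC4000001`` is the 64-bit ``CPU_SUSPEND`` function ID from the PSCI specification.
The helper name is hypothetical, and this is not kernel or TF-A code.

.. code-block:: python

   # Illustrative only: decode_smc_function_id() is a made-up helper name.
   PSCI_CPU_SUSPEND_SMC64 = 0xC4000001  # PSCI CPU_SUSPEND, 64-bit calling convention


   def decode_smc_function_id(fid: int) -> dict:
       """Split a 32-bit SMC function ID into its calling-convention fields."""
       return {
           "fast_call": bool((fid >> 31) & 0x1),  # 1 = fast (atomic) call
           "smc64": bool((fid >> 30) & 0x1),      # 1 = 64-bit calling convention
           "owner": (fid >> 24) & 0x3F,           # 4 = Standard Secure Service (PSCI)
           "function": fid & 0xFFFF,              # 1 = CPU_SUSPEND
       }


   fields = decode_smc_function_id(PSCI_CPU_SUSPEND_SMC64)
   print(fields)

The owner field is what lets the EL3 dispatcher route the call to the PSCI handler rather than to another service.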
67+
68+
************************
69+
OS Initiated (OSI) Mode
70+
************************
71+
72+
PSCI 1.0 introduced **OS Initiated (OSI)** mode, which shifts the responsibility of power state coordination from the platform firmware to the Operating System.
73+
74+
In the default **Platform Coordinated (PC)** mode, the OS independently requests a state for each core. The firmware then aggregates these requests (voting) to
75+
determine if a cluster or the system can be powered down.
76+
77+
In **OS Initiated (OSI)** mode, the OS explicitly manages the hierarchy. The OS determines when the last core in a power domain (e.g., a cluster) is going idle
78+
and explicitly requests the power-down of that domain.
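
The platform-coordinated voting described above can be sketched as a small model. This is a simplified
illustration, not TF-A code: the function name and the use of ``None`` for a running core are assumptions
made for the example.

.. code-block:: python

   from typing import Optional, Sequence

   CORE, CLUSTER, SYSTEM = 0, 1, 2  # power levels, shallow to deep


   def aggregate_votes(votes: Sequence[Optional[int]]) -> int:
       """Deepest level the firmware may power down in platform-coordinated mode.

       Each entry is the deepest level one core asked for in its CPU_SUSPEND
       request, or None while that core is still running (hypothetical model).
       """
       # A running core has not voted, so no shared domain may power down.
       if any(vote is None for vote in votes):
           return CORE
       # Otherwise the shallowest vote wins: every core must tolerate the state.
       return min(votes)


   print(aggregate_votes([CLUSTER, None]))     # core 1 still running
   print(aggregate_votes([CLUSTER, SYSTEM]))   # both idle, votes differ
   print(aggregate_votes([SYSTEM, SYSTEM]))    # both idle, both allow system

In OSI mode this aggregation moves into the OS, and the firmware only verifies the result.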
79+
80+
Why OSI?
81+
========
82+
83+
OSI mode allows the OS to make better power decisions because it has visibility into:

* **Task Scheduling:** The OS knows when other cores will wake up.
85+
* **Wakeup Latencies:** The OS can respect Quality of Service (QoS) latency constraints more accurately.
86+
* **Usage Patterns:** The OS can predict idle duration better than firmware.
87+
88+
OSI Sequence
89+
============
90+
91+
The coordination in OSI mode follows a specific "Last Man Standing" sequence. The OS tracks the state of all cores in a topology node (e.g., a cluster).
92+
93+
.. code-block:: text
94+
95+
   OSI "Last Man Standing" Flow

   Cluster with 2 Cores        OS Action                   PSCI Request
   ====================        =========                   =============

   1. Core 0,1: ACTIVE
          |
          |  Core 0 becomes idle
          v
   2. Core 0: IDLE        -->  OS requests local      -->  CPU_SUSPEND
      Core 1: ACTIVE           Core Power Down             (Core PD only)
                               Cluster stays ON
          |
          |  Core 1 (LAST) becomes idle
          v
   3. Core 0,1: IDLE      -->  OS recognizes          -->  CPU_SUSPEND
                               "Last Man" scenario         (Composite State)
                               Requests Composite:
                               - Core 1: PD                Core: PD
                               - Cluster: PD               Cluster: PD
                               - System: PD                System: PD
          |
          v
   4. Firmware Verification  -->  PSCI firmware checks
      & System Power Down         all cores/clusters idle
                                  If verified: Power down
                                  entire system
                                  If not: Deny request
                                  (race condition)
124+
125+
**Detailed Steps:**
126+
127+
1. **First Core Idle:** When the first core in a cluster goes idle, the OS requests a local idle state
128+
for that core (e.g., Core Power Down) but keeps the cluster running.
129+
130+
2. **Last Core Idle:** When the *last* active core in the cluster is ready to go idle, the OS recognizes
131+
that the entire cluster, and potentially the system, can now be powered down.
132+
133+
3. **Composite Request:** The last core issues a ``CPU_SUSPEND`` call that requests a **composite state**:
134+
135+
* **Core State:** Power Down
136+
* **Cluster State:** Power Down
137+
* **System State:** Power Down (as demonstrated in the diagram)
138+
139+
4. **Firmware Enforcement:** The PSCI firmware verifies that all other cores and clusters in the requested node are indeed idle.
140+
If they are not, the request is denied (to prevent race conditions).
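
The steps above can be modelled in a few lines. This is a minimal sketch under stated assumptions: the
function names and the set-based bookkeeping are hypothetical, and only the return codes (``0`` for success,
``-3`` for ``DENIED``) come from the PSCI specification.

.. code-block:: python

   PSCI_SUCCESS = 0
   PSCI_DENIED = -3  # PSCI return code for a rejected request

   CORE, CLUSTER = 0, 1


   def os_pick_level(os_idle: set, all_cores: set, core: int) -> int:
       """OS side: request cluster power-down only from the last active core."""
       others = all_cores - {core}
       return CLUSTER if others <= os_idle else CORE


   def firmware_cpu_suspend(fw_idle: set, all_cores: set, core: int, level: int) -> int:
       """Firmware side: check a cluster-level request against its own view."""
       others = all_cores - {core}
       if level >= CLUSTER and not others <= fw_idle:
           return PSCI_DENIED  # caller was not really the last man
       fw_idle.add(core)
       return PSCI_SUCCESS


   cores = {0, 1}
   os_idle, fw_idle = set(), set()

   # Core 0 goes idle first: core-level request only.
   level0 = os_pick_level(os_idle, cores, 0)
   rc0 = firmware_cpu_suspend(fw_idle, cores, 0, level0)
   os_idle.add(0)

   # Core 1 is the last man: cluster-level request, accepted.
   level1 = os_pick_level(os_idle, cores, 1)
   rc1 = firmware_cpu_suspend(fw_idle, cores, 1, level1)

   # Race: in the firmware's view core 0 is awake when core 1's request arrives.
   rc_race = firmware_cpu_suspend(set(), cores, 1, CLUSTER)

   print(level0, rc0, level1, rc1, rc_race)

The last call shows why the firmware check matters: the OS and firmware views can briefly disagree when a core wakes up while another is entering suspend.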
141+
142+
***********************************
143+
Understanding the Suspend Parameter
144+
***********************************
145+
146+
The ``power_state`` parameter passed to ``CPU_SUSPEND`` is the key to requesting these states.
147+
In OSI mode, this parameter must encode the intent for the entire hierarchy.
148+
149+
Power State Parameter Encoding
150+
================================
151+
152+
The ``power_state`` is a 32-bit parameter defined by the ARM PSCI specification (ARM DEN0022C).
153+
It has two encoding formats, controlled by the platform's build configuration.
154+
155+
Standard Format
156+
===============
157+
158+
This is the default format used by most platforms:
159+
160+
.. code-block:: text
161+
162+
    31           26 25  24 23            17  16  15                   0
   +---------------+------+----------------+----+----------------------+
   |   Reserved    | Pwr  |    Reserved    | ST |       State ID       |
   |  (must be 0)  | Level|  (must be 0)   |    |  (platform-defined)  |
   +---------------+------+----------------+----+----------------------+
167+
168+
.. list-table:: Standard Format Bit Fields
   :widths: 20 80
   :header-rows: 1

   * - Bit Field
     - Description

   * - **[31:26]**
     - **Reserved**: Must be zero.

   * - **[25:24]**
     - **Power Level**: Indicates the deepest power domain level that can be powered down.

       * ``0``: CPU/Core level
       * ``1``: Cluster level
       * ``2``: System level
       * ``3``: Higher levels (platform-specific)

   * - **[23:17]**
     - **Reserved**: Must be zero.

   * - **[16]**
     - **State Type (ST)**: Type of power state.

       * ``0``: Standby or Retention (low latency, context preserved)
       * ``1``: Power Down (higher latency, may lose context)

   * - **[15:0]**
     - **State ID**: Platform-specific identifier for the requested power state. The OS and
       platform firmware must agree on the meaning of these values, typically defined through
       device tree bindings.
199+
200+
**OSI Mode Consideration:**
201+
202+
In OSI mode, the OS is responsible for tracking which cores are idle. When the last core
203+
in a cluster issues this ``CPU_SUSPEND`` call with Power Level = 1, the PSCI firmware:
204+
205+
1. Verifies that all other cores in the cluster are already in a low-power state
206+
2. If verified, powers down the entire cluster
207+
3. If not verified (race condition), denies the request with an error code
208+
209+
The State ID field is platform-defined and typically documented in the device tree
210+
``idle-state`` nodes using the ``arm,psci-suspend-param`` property. This mechanism,
211+
leveraging ``cpuidle`` and ``s2idle``, allows the kernel to abstract complex platform-specific
212+
low-power modes into a generic framework. The ``idle-state`` nodes in the Device Tree define these power states,
213+
including their entry/exit latencies and minimum residency, enabling the ``cpuidle`` governor to make informed
214+
decisions about which idle state to enter based on system load and predicted idle duration.
215+
216+
The ``arm,psci-suspend-param`` property then directly maps these idle states to the corresponding PSCI ``power_state`` parameter values that the firmware understands.
217+
218+
Example: System Suspend (Standard Format)
219+
=========================================
220+
221+
When the OS targets a system-wide suspend state (e.g., Suspend-to-RAM), the ``power_state`` parameter is constructed to target the highest power level.
222+
Consider the example value **0x02012234**:
223+
224+
.. list-table:: Power State Parameter Breakdown (0x02012234)
   :widths: 20 20 20 40
   :header-rows: 1

   * - Field
     - Bits
     - Value
     - Meaning

   * - Reserved
     - [31:26]
     - 0
     - Must be zero

   * - Power Level
     - [25:24]
     - 2
     - System level

   * - Reserved
     - [23:17]
     - 0
     - Must be zero

   * - State Type
     - [16]
     - 1
     - Power Down

   * - State ID
     - [15:0]
     - 0x2234
     - Platform-specific (e.g., "S2RAM")
257+
258+
**Interpretation:**
259+
260+
* **Power Level = 2** tells the firmware that a system-level transition is requested.
261+
* **State Type = 1** indicates a power-down state.
262+
* **State ID = 0x2234** is the platform-specific identifier for this system state.
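
The breakdown above can be reproduced with a few shifts and masks. The sketch below packs and unpacks the
standard format exactly as laid out in the bit-field table; the function names are hypothetical and the
code is illustrative rather than taken from the kernel or TF-A.

.. code-block:: python

   RESERVED_MASK = 0xFCFE0000  # bits [31:26] and [23:17] must be zero


   def pack_power_state(level: int, state_type: int, state_id: int) -> int:
       """Build a standard-format power_state value from its three fields."""
       if not (0 <= level <= 3 and state_type in (0, 1) and 0 <= state_id <= 0xFFFF):
           raise ValueError("field out of range")
       return (level << 24) | (state_type << 16) | state_id


   def unpack_power_state(value: int) -> dict:
       """Split a standard-format power_state value into its fields."""
       if value & RESERVED_MASK:
           raise ValueError("reserved bits must be zero")
       return {
           "power_level": (value >> 24) & 0x3,  # bits [25:24]
           "state_type": (value >> 16) & 0x1,   # bit  [16]
           "state_id": value & 0xFFFF,          # bits [15:0]
       }


   fields = unpack_power_state(0x02012234)
   print(fields)
   print(hex(pack_power_state(2, 1, 0x2234)))

Rejecting non-zero reserved bits mirrors the "must be zero" rule in the table and catches values that were built for a different encoding format.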
263+
264+
In the context of **s2idle**, if the OS determines that all constraints are met for system suspension,
265+
the last active CPU (Last Man) will invoke ``CPU_SUSPEND`` with this parameter. The PSCI firmware then
266+
coordinates the final steps to suspend the system (e.g., placing DDR in self-refresh and powering down the SoC).
267+
268+
**********************************
269+
S2Idle vs Deep Sleep (mem)
270+
**********************************
271+
272+
The Linux kernel has sleep states that are global low-power states of the entire system in which user space
273+
code cannot be executed and the overall system activity is significantly reduced.
274+
There are different types of sleep states, as described in the kernel
`documentation <https://docs.kernel.org/admin-guide/pm/sleep-states.html>`__.
The suspend variant can be selected using the sysfs entry :file:`/sys/power/mem_sleep`.
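
The kernel reports the available suspend variants in :file:`/sys/power/mem_sleep`, with the currently
selected one in square brackets. A minimal sketch of parsing that format follows; the helper name and the
sample string are illustrative assumptions, not output captured from an AM62L board.

.. code-block:: python

   def parse_mem_sleep(text: str):
       """Return (available, active) from the contents of mem_sleep."""
       available, active = [], None
       for token in text.split():
           # The active variant is wrapped in brackets, e.g. "[deep]".
           if token.startswith("[") and token.endswith("]"):
               token = token[1:-1]
               active = token
           available.append(token)
       return available, active


   available, active = parse_mem_sleep("s2idle [deep]\n")
   print(available, active)

On a target, the input would come from reading :file:`/sys/power/mem_sleep`. Writing ``s2idle`` to that file
selects suspend-to-idle for a later ``echo mem > /sys/power/state``.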
277+
278+
On the TI K3 AM62L platform, the ``s2idle`` and ``deep`` states are currently supported.
279+
Both of them can achieve similar power savings (e.g., by suspending to RAM / putting DDR into Self-Refresh).
280+
The primary differences lie in the software execution flow, specifically how CPUs are managed and which
281+
PSCI APIs are invoked.
282+
283+
.. list-table:: S2Idle vs Deep Sleep
   :widths: 20 40 40
   :header-rows: 1

   * - Feature
     - s2idle (Suspend-to-Idle)
     - deep (Suspend-to-RAM)

   * - **Kernel String**
     - ``s2idle`` or ``freeze``
     - ``deep`` or ``mem``

   * - **Non-boot CPUs**
     - **Online**: Non-boot CPUs are put into a deep idle state but remain logically online.
     - **Offline**: Non-boot CPUs are hot-unplugged (removed) from the system via ``CPU_OFF``.

   * - **Entry Path**
     - **cpuidle**: Uses the standard CPUidle framework. Additionally, each driver is made idle by calling its respective runtime suspend hooks.
     - **suspend_ops**: Uses platform-specific suspend operations: each driver's suspend ops run, and finally PSCI ``SYSTEM_SUSPEND`` is called.
       No governors exist to make any decisions.

   * - **PSCI Call**
     - ``CPU_SUSPEND``: Invoked for every core (Last Man Standing logic coordinates the cluster/system depth).
     - ``SYSTEM_SUSPEND``: Typically invoked by the last active CPU after others are offlined.

   * - **Resume Flow**
     - **Fast**: CPUs exit the idle loop immediately upon interrupt. Context is preserved.
     - **Slow**: Kernel must serially bring secondary CPUs back online (Hotplug). Kernel must recreate
       threads, re-enable interrupts, resume each driver and restore per-CPU state for every non-boot core.

   * - **Latency**
     - Lower
     - High, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs
316+

source/linux/Foundational_Components_Power_Management.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Power Management
1414
Foundational_Components/Power_Management/pm_low_power_modes
1515
Foundational_Components/Power_Management/pm_am62lx_low_power_modes
1616
Foundational_Components/Power_Management/pm_low_power_modes_socoff
17+
Foundational_Components/Power_Management/pm_psci_s2idle
1718
Foundational_Components/Power_Management/pm_wakeup_sources
1819
Foundational_Components/Power_Management/pm_sw_arch
1920
Foundational_Components/Power_Management/pm_debug
