Commit 6209735

feat(linux): s2idle: Document the mode selection logic
Document the mode selection logic using the s2idle flow

Signed-off-by: Dhruva Gole <d-gole@ti.com>
1 parent 6448b30 commit 6209735

1 file changed

+385
-0
lines changed

source/linux/Foundational_Components/Power_Management/pm_psci_s2idle.rst

Lines changed: 385 additions & 0 deletions
@@ -309,3 +309,388 @@ PSCI APIs are invoked.
- Lower
- High, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs

**********************************************
Low Power Mode Selection in S2Idle (OSI Mode)
**********************************************

S2Idle with OSI mode selects a low-power mode based on system constraints, the power domain
hierarchy, and the predicted idle duration. The system chooses between multiple low-power modes
automatically, without user intervention, and adapts to runtime requirements.

Power Domain Hierarchy in Device Tree
======================================

The power domain hierarchy in the device tree defines how different system components are grouped
and how their power states are coordinated. This hierarchical structure is fundamental to OSI mode's
"Last Man Standing" logic.

**Hierarchical Structure:**

.. code-block:: text

   MAIN_PD (System Level)
   │
   ├──> CLUSTER_PD (Cluster Level)
   │    │
   │    ├──> CPU_PD (CPU Level)
   │    │    ├──> CPU0
   │    │    └──> CPU1
   │    │
   │    └──> Cluster-sensitive peripherals
   │         ├──> CPSW3G (Ethernet)
   │         └──> DSS0 (Display)
   │
   └──> Main domain peripherals
        ├──> UART, I2C, SPI controllers
        ├──> Timers
        ├──> SDHCI controllers
        └──> USB controllers

**Device Tree Implementation:**

In the Device Tree, this hierarchy is established through power domain mappings:

.. code-block:: dts

   &psci {
       CPU_PD: power-controller-cpu {
           #power-domain-cells = <0>;
           power-domains = <&CLUSTER_PD>;
           domain-idle-states = <&cpu_sleep_0>, <&cpu_sleep_1>;
       };

       CLUSTER_PD: power-controller-cluster {
           #power-domain-cells = <0>;
           domain-idle-states = <&cluster_sleep_0>;
           power-domains = <&MAIN_PD>;
       };

       MAIN_PD: power-controller-main {
           #power-domain-cells = <0>;
           domain-idle-states = <&main_sleep_deep>, <&main_sleep_rtcddr>;
       };
   };

**Why Domain Grouping is Needed:**

The domain grouping serves several purposes:

1. **Hardware Dependency Management**: Certain peripherals must remain active for specific low-power
   states. For example, DDR controllers must remain operational in RTC+DDR mode, but can be powered
   down in Deep Sleep mode.

2. **Constraint Propagation**: When a device in the CLUSTER_PD is active (e.g., Display Subsystem),
   the cluster cannot enter its deepest idle state. The constraint propagates up the hierarchy,
   preventing both CLUSTER_PD and MAIN_PD from entering deeper states.

3. **Automatic Mode Selection**: The cpuidle framework uses the hierarchy to automatically select
   the deepest possible state. If any device in a power domain is active or has latency constraints,
   shallower states are automatically chosen.

4. **Race Condition Prevention**: The hierarchy ensures that the PSCI firmware can verify all
   components in a domain are truly idle before powering down that domain.

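The constraint propagation described in item 2 can be sketched in C. This is a minimal illustration under simplifying assumptions, not kernel code: ``struct pd_node``, ``pd_can_power_down()`` and ``pd_highest_idle()`` are hypothetical names, and the model only counts active devices per domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one power domain; not a kernel structure. */
struct pd_node {
    const char *name;
    const struct pd_node *parent;  /* NULL for the top-level domain */
    int active_devices;            /* devices in this domain still in use */
};

/* A domain may power down only when no device inside it is active. */
static bool pd_can_power_down(const struct pd_node *pd)
{
    assert(pd != NULL);
    return pd->active_devices == 0;
}

/*
 * Walk from a leaf domain toward the root and return the highest domain
 * that may power down. Returns NULL if even the leaf must stay on.
 */
static const struct pd_node *pd_highest_idle(const struct pd_node *leaf)
{
    const struct pd_node *highest = NULL;

    for (const struct pd_node *pd = leaf; pd != NULL; pd = pd->parent) {
        if (!pd_can_power_down(pd))
            break;  /* an active device blocks this domain and all parents */
        highest = pd;
    }
    return highest;
}
```

With ``CPU_PD`` idle and one active device in ``CLUSTER_PD`` (for example DSS0), the walk stops at ``CPU_PD``, so neither ``CLUSTER_PD`` nor ``MAIN_PD`` is considered for power-down.
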
**Peripheral Power Domain Mapping:**

The ``power-domain-map`` property explicitly assigns peripherals to power domains:

.. code-block:: dts

   &scmi_pds {
       power-domain-map = <3 &CLUSTER_PD>,   /* CPSW3G Ethernet */
                          <39 &CLUSTER_PD>,  /* DSS0 Display */
                          <38 &CLUSTER_PD>,  /* DSS_DSI0 */
                          <15 &MAIN_PD>,     /* TIMER0 */
                          <26 &MAIN_PD>,     /* SDHCI1 */
                          <89 &MAIN_PD>,     /* UART0 */
                          <95 &MAIN_PD>;     /* USBSS0 */
   };

This mapping ensures that when the Display (DSS0) is active, the system won't enter states that
would cause DDR Auto Self-Refresh issues. Similarly, active UART or USB connections prevent
deeper system states that would disconnect those interfaces.

Role in Mode Selection
=======================

During s2idle entry, the cpuidle framework traverses the power domain hierarchy from bottom to top:

.. code-block:: text

   Mode Selection Flow during S2Idle Entry
   ========================================

   1. Freeze user space tasks
   2. Suspend all devices (call runtime_suspend hooks)
   3. For each CPU (in cpuidle framework):

      CPU Level (CPU_PD):
      ├─> Check QoS latency constraints
      ├─> Check device activity in CPU_PD
      └─> Select CPU idle state: cpu_sleep_0 (Standby) or cpu_sleep_1 (PowerDown)

      Cluster Level (CLUSTER_PD):
      ├─> Check if this is the last CPU in cluster
      ├─> Check device activity in CLUSTER_PD (e.g., Display, Ethernet)
      ├─> If last CPU and no constraints:
      │   └─> Select cluster idle state: cluster_sleep_0
      └─> Else: Skip cluster power-down

      System Level (MAIN_PD):
      ├─> Check if last CPU in system
      ├─> Check device activity in MAIN_PD (e.g., UART, USB, Timers)
      ├─> Check QoS constraints for entire system
      ├─> Compare latency requirements to available states:
      │   ├─> main_sleep_rtcddr (exit latency: 600ms)
      │   └─> main_sleep_deep (exit latency: 10ms)
      └─> Select deepest state that meets all constraints

   4. Last CPU issues composite CPU_SUSPEND with selected state
   5. PSCI firmware verifies and executes power-down

Idle State Definitions
=======================

The Device Tree defines multiple idle states at each level of the hierarchy, each with different
power/latency trade-offs. The key states are:

**CPU-Level Idle States:**

* **cpu_sleep_1 (PowerDown)**: CPU is powered down with context loss

  * ``arm,psci-suspend-param = <0x012233>``
  * Exit latency: ~100ms

**Cluster-Level Idle States:**

* **cluster_sleep_0 (Low-Latency Standby)**: Cluster enters low-power standby when all CPUs are idle

  * ``arm,psci-suspend-param = <0x01000021>``
  * Exit latency: ~300μs

**System-Level Idle States (Main Domain):**

* **main_sleep_deep (Deep Sleep)**: DDR in self-refresh, more peripherals remain powered for faster resume

  * ``arm,psci-suspend-param = <0x2012235>``
  * Exit latency: 10ms
  * Use case: Short to moderate idle periods with faster resume requirements

* **main_sleep_rtcddr (RTC+DDR)**: DDR in self-refresh, minimal peripherals powered (RTC, I/O retention only)

  * ``arm,psci-suspend-param = <0x2012234>``
  * Exit latency: 600ms
  * Use case: Long idle periods requiring maximum power savings

.. note::
   For complete Device Tree definitions including all latency parameters, refer to the platform's
   device tree source files (e.g., ``k3-am62l-main.dtsi``).

Understanding the Suspend Parameters
=====================================

The ``arm,psci-suspend-param`` values encode the target power state using the PSCI standard format
described earlier. Let's decode the key parameters for the main domain states:

**Deep Sleep Mode (main_sleep_deep):**

Parameter: ``0x2012235``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0101
   Hex:    0x02012235

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2235 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: The entire system, including the SoC, enters a low-power state
- **State Type = 1 (Power Down)**: Context is lost; firmware must restore state on resume
- **State ID = 0x2235**: Platform-specific identifier that the PSCI firmware (TF-A) recognizes
  as "Deep Sleep" mode, where DDR is in Self-Refresh and more peripherals in the Main domain
  remain powered than in RTC+DDR mode, providing faster resume at the cost of higher power

**RTC+DDR Mode (main_sleep_rtcddr):**

Parameter: ``0x2012234``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0100
   Hex:    0x02012234

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2234 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: System-level power state
- **State Type = 1 (Power Down)**: Power-down with context loss
- **State ID = 0x2234**: Platform-specific identifier for "RTC+DDR" mode, where DDR is in
  Self-Refresh and only minimal peripherals (RTC, I/O retention) remain powered in the Main
  domain, providing maximum power savings at the cost of longer resume latency

The cpuidle governor uses the exit latency and residency values defined for each state to
automatically select the appropriate mode. If predicted idle time is short and latency constraints
are tight, Deep Sleep mode (the shallower state) is chosen for faster resume. For longer predicted
idle periods with relaxed latency requirements, RTC+DDR mode (the deeper state) is preferred for
maximum power savings.

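The bit-field decoding shown above can be reproduced with a few shifts and masks. The following is a minimal sketch that assumes the field layout listed in this section; ``struct psci_power_state`` and ``psci_decode()`` are hypothetical names used only for illustration and are not part of any kernel or TF-A API.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields, following the layout listed in this section. */
struct psci_power_state {
    unsigned int power_level;  /* bits [25:24] */
    unsigned int power_down;   /* bit  [16]: 1 = power down, 0 = standby */
    unsigned int state_id;     /* bits [15:0], platform-specific */
};

/* Hypothetical helper: split an arm,psci-suspend-param value into fields. */
static struct psci_power_state psci_decode(uint32_t param)
{
    struct psci_power_state s;

    /* Reserved bits [31:26] and [23:17] are expected to be zero. */
    assert((param & 0xFCFE0000u) == 0);

    s.power_level = (param >> 24) & 0x3u;
    s.power_down  = (param >> 16) & 0x1u;
    s.state_id    = param & 0xFFFFu;
    return s;
}
```

For example, ``psci_decode(0x2012235)`` yields power level 2, state type 1 and state ID ``0x2235``, matching the Deep Sleep breakdown above.
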
QoS Latency Constraints and Mode Selection
===========================================

The Linux kernel's PM QoS (Quality of Service) framework allows drivers and applications to
specify maximum acceptable wakeup latency. These constraints directly influence which idle
state can be entered during s2idle.

**How QoS Constraints Work:**

1. Each device or CPU can register a latency constraint (in microseconds, the same unit as
   ``exit-latency-us``)
2. The cpuidle governor queries these constraints before selecting an idle state
3. Only idle states with ``exit-latency-us`` ≤ constraint are considered
4. The deepest eligible state is selected

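Steps 3 and 4 amount to a filter followed by a "deepest wins" choice. The sketch below illustrates that logic only; it is not the kernel's cpuidle code. ``struct idle_state`` and ``deepest_eligible()`` are hypothetical names, and the array is assumed to be ordered from shallowest to deepest state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one idle state. */
struct idle_state {
    const char *name;
    unsigned int exit_latency_us;
};

/*
 * states[] is assumed to be ordered from shallowest to deepest.
 * Returns the index of the deepest state whose exit latency does not
 * exceed limit_us, or -1 if no state qualifies.
 */
static int deepest_eligible(const struct idle_state *states, size_t n,
                            unsigned int limit_us)
{
    int best = -1;

    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (states[i].exit_latency_us <= limit_us)
            best = (int)i;  /* later entries are deeper */
    }
    return best;
}
```

With the two main domain states from this document (10,000 μs and 600,000 μs), a limit of 20,000 μs selects ``main_sleep_deep``, while a limit above 600,000 μs selects ``main_sleep_rtcddr``.
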
**Setting QoS Constraints from User Space:**

Applications can constrain the system's low-power behavior by writing to the PM QoS device file.
Below is a C program that demonstrates this:

.. code-block:: c

   /* testqos.c - Set CPU wakeup latency constraint */
   #include <stdio.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <signal.h>

   #define QOS_DEV "/dev/cpu_wakeup_latency"
   #define LATENCY_VAL "0x1000" /* 4096 μs (about 4 ms) as a hex string */

   /* Written from the signal handler, so use sig_atomic_t */
   static volatile sig_atomic_t keep_running = 1;

   static void sig_handler(int sig)
   {
       (void)sig;
       keep_running = 0;
   }

   int main(void)
   {
       int fd;

       signal(SIGINT, sig_handler);
       signal(SIGTERM, sig_handler);

       /* The constraint stays registered for as long as fd is open */
       fd = open(QOS_DEV, O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Write the hex string without its terminating NUL */
       if (write(fd, LATENCY_VAL, sizeof(LATENCY_VAL) - 1) < 0) {
           perror("write");
           close(fd);
           return 1;
       }

       printf("QoS set to %s. Press Ctrl+C to exit.\n", LATENCY_VAL);

       while (keep_running)
           sleep(1);

       /* Closing the descriptor releases the constraint */
       close(fd);
       printf("Released.\n");
       return 0;
   }

**Why This Program is Needed:**

This program demonstrates how to control low-power mode selection by setting QoS latency constraints.
By applying a tight latency constraint (4096 μs, about 4 ms, in the example), you can force the
system to stay in shallow idle states, preventing entry into Deep Sleep or RTC+DDR modes. This is
useful for testing that the cpuidle governor correctly respects QoS constraints and selects the
appropriate idle state based on latency requirements.

**Selecting Specific Low-Power Modes:**

To force selection of a specific mode, choose the QoS constraint relative to the exit
latencies of the available states. The latency value must be provided as a **hex string**
(e.g., "0x7ef41").

* **To force Deep Sleep mode**: Set the constraint above Deep Sleep's exit latency (10ms = 10,000 μs)
  but below RTC+DDR's exit latency (600ms = 600,000 μs). For example, use **520,001 μs (about 520 ms)**:

  .. code-block:: c

     #define LATENCY_VAL "0x7ef41" /* 520,001 μs (about 520 ms) in hex */

  **Calculation:**

  - Target latency: 520 ms = 520,000 μs (520,001 is used here)
  - Convert to hex: 520,001₁₀ = 0x7EF41₁₆
  - Write as hex string: ``"0x7ef41"``
  - This allows Deep Sleep (10,000 μs exit latency) but blocks RTC+DDR (600,000 μs exit latency)

* **To allow RTC+DDR mode**: Set the constraint higher than 600ms (600,000 μs), or don't apply any
  constraint, allowing the cpuidle governor to select the deepest state (RTC+DDR) during long
  idle periods.

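The decimal-to-hex conversion in the calculation above does not have to be done by hand. As a small sketch, the hypothetical helper ``qos_format_hex()`` below builds the hex string with ``snprintf``; the name and the error convention are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a latency value as a "0x..." hex string suitable for writing
 * to the PM QoS device file. Returns the string length, or -1 if the
 * buffer is too small.
 */
static int qos_format_hex(unsigned int latency, char *buf, size_t len)
{
    assert(buf != NULL);

    int n = snprintf(buf, len, "0x%x", latency);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Calling ``qos_format_hex(520001, buf, sizeof(buf))`` fills ``buf`` with ``0x7ef41``, the same string used for ``LATENCY_VAL`` in the Deep Sleep example.
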
**How It Sets QoS Constraints:**

The program opens the special device file ``/dev/cpu_wakeup_latency``, which is part of the
kernel's PM QoS framework. Writing a latency value (in microseconds) to this file:

1. Registers a global CPU wakeup latency constraint
2. Causes the cpuidle governor to filter out any idle states with exit latency exceeding this value
3. Remains active as long as the file descriptor is open
4. Automatically releases the constraint when the file descriptor is closed (on program exit)

**Example: Deep Sleep Mode Selection:**

Consider a scenario where the system has active I2C or SPI communication requiring responses
within 20ms. A QoS constraint of 20,000 μs (20ms) would be applied:

.. code-block:: text

   Available Main Domain States:
   ├─> main_sleep_rtcddr: exit-latency = 600,000 μs (600ms) → REJECTED (exceeds constraint)
   └─> main_sleep_deep: exit-latency = 10,000 μs (10ms) → SELECTED (meets constraint)

   Result: System enters Deep Sleep mode instead of RTC+DDR mode

In this example, even though RTC+DDR provides better power savings, the 20ms latency constraint
forces the system to use the shallower Deep Sleep mode. The selection is between the two main
domain idle states defined for s2idle suspend.

**Usage Example:**

.. code-block:: console

   root@am62lxx-evm:~# gcc testqos.c -o testqos
   root@am62lxx-evm:~# ./testqos
   QoS set to 0x1000. Press Ctrl+C to exit.

   # In another terminal, observe the constrained behavior:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0 # state0: WFI
   125 # state1: Standby
   350125 # state2: PowerDown (disabled by QoS)

   # Press Ctrl+C in the first terminal
   Released.

   # Now the deeper states are available again:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0
   125
   350125 # state2: PowerDown (now enabled)

The value ``0x1000`` (4096 μs, about 4 ms) prevents any idle state with an exit latency greater
than 4096 μs from being entered. In the example above, the PowerDown state with its 350ms exit
latency is effectively disabled while the constraint is active, whereas Standby (125 μs) remains
eligible.

