     - Lower
     - High, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs

**********************************************
Low Power Mode Selection in S2Idle (OSI Mode)
**********************************************

S2Idle with OSI mode enables sophisticated low-power mode selection based on system constraints,
the power domain hierarchy, and the predicted idle duration. The system can automatically select
among multiple low-power modes without user intervention, adapting to runtime requirements.

Power Domain Hierarchy in Device Tree
======================================

The power domain hierarchy in the device tree defines how system components are grouped and how
their power states are coordinated. This hierarchical structure is fundamental to the "Last Man
Standing" logic of OSI mode.

**Hierarchical Structure:**

.. code-block:: text

   MAIN_PD (System Level)
   │
   ├──> CLUSTER_PD (Cluster Level)
   │    │
   │    ├──> CPU_PD (CPU Level)
   │    │    ├──> CPU0
   │    │    └──> CPU1
   │    │
   │    └──> Cluster-sensitive peripherals
   │         ├──> CPSW3G (Ethernet)
   │         └──> DSS0 (Display)
   │
   └──> Main domain peripherals
        ├──> UART, I2C, SPI controllers
        ├──> Timers
        ├──> SDHCI controllers
        └──> USB controllers

**Device Tree Implementation:**

In the Device Tree, this hierarchy is established through power domain mappings:

.. code-block:: dts

   &psci {
       CPU_PD: power-controller-cpu {
           #power-domain-cells = <0>;
           power-domains = <&CLUSTER_PD>;
           domain-idle-states = <&cpu_sleep_0>, <&cpu_sleep_1>;
       };

       CLUSTER_PD: power-controller-cluster {
           #power-domain-cells = <0>;
           domain-idle-states = <&cluster_sleep_0>;
           power-domains = <&MAIN_PD>;
       };

       MAIN_PD: power-controller-main {
           #power-domain-cells = <0>;
           domain-idle-states = <&main_sleep_deep>, <&main_sleep_rtcddr>;
       };
   };

**Why Domain Grouping is Needed:**

The domain grouping serves several critical purposes:

1. **Hardware Dependency Management**: Certain peripherals must remain active for specific low-power
   states. For example, DDR must stay in self-refresh in both Deep Sleep and RTC+DDR modes, while
   most Main-domain peripherals remain powered only in Deep Sleep mode.

2. **Constraint Propagation**: When a device in CLUSTER_PD is active (e.g., the Display Subsystem),
   the cluster cannot enter its deepest idle state. Because CLUSTER_PD is a child of MAIN_PD, the
   constraint propagates up the hierarchy and prevents both CLUSTER_PD and MAIN_PD from entering
   deeper states.

3. **Automatic Mode Selection**: The cpuidle framework uses the hierarchy to automatically select
   the deepest possible state. If any device in a power domain is active or has latency constraints,
   shallower states are chosen automatically.

4. **Race Condition Prevention**: The hierarchy lets the PSCI firmware verify that all components
   in a domain are truly idle before powering down that domain.
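
The propagation rule in item 2 can be modeled in a few lines of C. This is a simplified,
hypothetical model (``struct pd`` and ``pd_power_off_upwards()`` are illustrative names, not the
kernel's genpd API): a domain may power off only when none of its devices are active and none of
its child domains are still powered, and each successful power-off lets the parent be re-evaluated.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical power domain model; not the kernel's struct generic_pm_domain. */
   struct pd {
       const char *name;
       struct pd *parent;     /* NULL for the top-level domain (MAIN_PD) */
       int active_devices;    /* devices in this domain that are still in use */
       int powered_children;  /* child domains that are still powered */
       bool powered;
   };

   /* A domain may power off only if nothing below it still needs power. */
   static bool pd_can_power_off(const struct pd *pd)
   {
       return pd->active_devices == 0 && pd->powered_children == 0;
   }

   /*
    * Power off @pd if possible, then walk up the hierarchy: each parent loses
    * one powered child and is re-evaluated. Returns the number of domains
    * that were powered off.
    */
   int pd_power_off_upwards(struct pd *pd)
   {
       int count = 0;

       while (pd != NULL && pd->powered && pd_can_power_off(pd)) {
           pd->powered = false;
           count++;
           if (pd->parent != NULL)
               pd->parent->powered_children--;
           pd = pd->parent;
       }
       return count;
   }

With DSS0 active in CLUSTER_PD, powering off CPU_PD succeeds but the walk stops at CLUSTER_PD,
so MAIN_PD also stays on. Once the display goes idle, a second call on CLUSTER_PD powers off both
CLUSTER_PD and MAIN_PD.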

**Peripheral Power Domain Mapping:**

The ``power-domain-map`` property explicitly assigns peripherals to power domains:

.. code-block:: dts

   &scmi_pds {
       power-domain-map = <3 &CLUSTER_PD>,  /* CPSW3G Ethernet */
                          <39 &CLUSTER_PD>, /* DSS0 Display */
                          <38 &CLUSTER_PD>, /* DSS_DSI0 */
                          <15 &MAIN_PD>,    /* TIMER0 */
                          <26 &MAIN_PD>,    /* SDHCI1 */
                          <89 &MAIN_PD>,    /* UART0 */
                          <95 &MAIN_PD>;    /* USBSS0 */
   };

This mapping ensures that when the Display (DSS0) is active, the system does not enter states that
would cause DDR Auto Self-Refresh issues. Similarly, active UART or USB connections prevent
deeper system states that would disconnect those interfaces.
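
To make the mapping concrete, the following sketch copies the example ``power-domain-map``
entries into a lookup table and reports the innermost domain that must stay powered for a given
set of active SCMI domain IDs. The table, the ``enum parent_pd`` encoding, and the function names
are illustrative assumptions; the kernel resolves the map from the device tree, not through code
like this.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical encoding of the domains an entry can point at (outer to inner). */
   enum parent_pd { PD_NONE, PD_MAIN, PD_CLUSTER };

   struct pd_map_entry {
       unsigned int scmi_id;  /* first cell of each power-domain-map entry */
       enum parent_pd parent; /* domain phandle the entry points at */
   };

   /* Copy of the example map above; the IDs are platform-specific. */
   static const struct pd_map_entry pd_map[] = {
       { 3, PD_CLUSTER },  /* CPSW3G Ethernet */
       { 39, PD_CLUSTER }, /* DSS0 Display */
       { 38, PD_CLUSTER }, /* DSS_DSI0 */
       { 15, PD_MAIN },    /* TIMER0 */
       { 26, PD_MAIN },    /* SDHCI1 */
       { 89, PD_MAIN },    /* UART0 */
       { 95, PD_MAIN },    /* USBSS0 */
   };

   static enum parent_pd lookup_parent(unsigned int scmi_id)
   {
       for (size_t i = 0; i < sizeof(pd_map) / sizeof(pd_map[0]); i++)
           if (pd_map[i].scmi_id == scmi_id)
               return pd_map[i].parent;
       return PD_NONE; /* not mapped in this example */
   }

   /*
    * Return the innermost domain pinned on by the active devices. An active
    * CLUSTER_PD device also keeps MAIN_PD on, because CLUSTER_PD is its child.
    */
   enum parent_pd innermost_pinned_domain(const unsigned int *active_ids, size_t n)
   {
       enum parent_pd pinned = PD_NONE;

       for (size_t i = 0; i < n; i++) {
           enum parent_pd p = lookup_parent(active_ids[i]);
           if (p > pinned)
               pinned = p;
       }
       return pinned;
   }

A result of ``PD_CLUSTER`` means neither CLUSTER_PD nor MAIN_PD can power off, while ``PD_MAIN``
means the cluster may still power off but the Main domain may not.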

Role in Mode Selection
=======================

During s2idle entry, the kernel and the cpuidle framework evaluate the power domain hierarchy from
bottom to top:

.. code-block:: text

   Mode Selection Flow during S2Idle Entry
   ========================================

   1. Freeze user space tasks
   2. Suspend all devices (invoke their system sleep callbacks: suspend, suspend_late, suspend_noirq)
   3. For each CPU (in the cpuidle framework):

      CPU Level (CPU_PD):
      ├─> Check QoS latency constraints
      ├─> Check device activity in CPU_PD
      └─> Select CPU idle state: cpu_sleep_0 (Standby) or cpu_sleep_1 (PowerDown)

      Cluster Level (CLUSTER_PD):
      ├─> Check if this is the last CPU in the cluster
      ├─> Check device activity in CLUSTER_PD (e.g., Display, Ethernet)
      ├─> If last CPU and no constraints:
      │   └─> Select cluster idle state: cluster_sleep_0
      └─> Else: Skip cluster power-down

      System Level (MAIN_PD):
      ├─> Check if this is the last CPU in the system
      ├─> Check device activity in MAIN_PD (e.g., UART, USB, Timers)
      ├─> Check QoS constraints for the entire system
      ├─> Compare latency requirements to available states:
      │   ├─> main_sleep_rtcddr (exit latency: 600ms)
      │   └─> main_sleep_deep (exit latency: 10ms)
      └─> Select the deepest state that meets all constraints

   4. Last CPU issues a composite CPU_SUSPEND with the selected state
   5. PSCI firmware verifies and executes the power-down
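
The per-level checks in the flow above reduce to a "last man standing" decision. The sketch below
is a simplified, hypothetical model of it (``struct idle_inputs`` and ``deepest_level()`` are
illustrative names, not kernel or TF-A code). It returns the highest level that the CPU entering
idle may power down, numbered like the PSCI Power Level field (0 = CPU, 1 = cluster, 2 = system).

.. code-block:: c

   #include <stdbool.h>

   /* Power levels as numbered in the PSCI power_state Power Level field. */
   enum pd_level { LEVEL_CPU = 0, LEVEL_CLUSTER = 1, LEVEL_SYSTEM = 2 };

   struct idle_inputs {
       int idle_cpus_in_cluster;  /* including the CPU now entering idle */
       int cpus_in_cluster;
       int idle_cpus_in_system;   /* including the CPU now entering idle */
       int cpus_in_system;
       bool cluster_devices_busy; /* e.g. DSS0 or CPSW3G still active */
       bool main_devices_busy;    /* e.g. UART0 or USBSS0 still active */
   };

   /* Highest level the CPU entering idle may power down ("last man standing"). */
   enum pd_level deepest_level(const struct idle_inputs *in)
   {
       bool last_in_cluster = in->idle_cpus_in_cluster == in->cpus_in_cluster;
       bool last_in_system = in->idle_cpus_in_system == in->cpus_in_system;

       /* A busy cluster device keeps CLUSTER_PD, and therefore MAIN_PD, on. */
       if (!last_in_cluster || in->cluster_devices_busy)
           return LEVEL_CPU;
       if (!last_in_system || in->main_devices_busy)
           return LEVEL_CLUSTER;
       return LEVEL_SYSTEM;
   }

Within the chosen level, the specific state (for example ``main_sleep_deep`` versus
``main_sleep_rtcddr``) is still picked by the latency and QoS checks.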

Idle State Definitions
=======================

The Device Tree defines multiple idle states at each level of the hierarchy, each with different
power/latency trade-offs. The key states are:

**CPU-Level Idle States:**

* **cpu_sleep_1 (PowerDown)**: CPU is powered down with context loss

  * ``arm,psci-suspend-param = <0x012233>``
  * Exit latency: ~100ms

**Cluster-Level Idle States:**

* **cluster_sleep_0 (Low-Latency Standby)**: Cluster enters low-power standby when all CPUs are idle

  * ``arm,psci-suspend-param = <0x01000021>``
  * Exit latency: ~300μs

**System-Level Idle States (Main Domain):**

* **main_sleep_deep (Deep Sleep)**: DDR in self-refresh, more peripherals remain powered for faster resume

  * ``arm,psci-suspend-param = <0x2012235>``
  * Exit latency: 10ms
  * Use case: Short to moderate idle periods with faster resume requirements

* **main_sleep_rtcddr (RTC+DDR)**: DDR in self-refresh, minimal peripherals powered (RTC, I/O retention only)

  * ``arm,psci-suspend-param = <0x2012234>``
  * Exit latency: 600ms
  * Use case: Long idle periods requiring maximum power savings

.. note::
   For complete Device Tree definitions including all latency parameters, refer to the platform's
   device tree source files (e.g., ``k3-am62l-main.dtsi``).

Understanding the Suspend Parameters
=====================================

The ``arm,psci-suspend-param`` values encode the target power state using the PSCI standard format
described earlier. Let's decode the key parameters for the main domain states:

**Deep Sleep Mode (main_sleep_deep):**

Parameter: ``0x2012235``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0101
   Hex:    0x02012235

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2235 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: The entire system, including the SoC, enters a low-power state
- **State Type = 1 (Power Down)**: Context is lost; firmware must restore state on resume
- **State ID = 0x2235**: Platform-specific identifier that the PSCI firmware (TF-A) recognizes
  as "Deep Sleep" mode, where DDR is in self-refresh and more peripherals in the Main domain
  remain powered than in RTC+DDR mode, providing faster resume at the cost of higher power

**RTC+DDR Mode (main_sleep_rtcddr):**

Parameter: ``0x2012234``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0100
   Hex:    0x02012234

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2234 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: System-level power state
- **State Type = 1 (Power Down)**: Power-down with context loss
- **State ID = 0x2234**: Platform-specific identifier for "RTC+DDR" mode, where DDR is in
  self-refresh and only minimal peripherals (RTC, I/O retention) remain powered in the Main
  domain, providing maximum power savings at the cost of longer resume latency
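
The bit layout above can be checked with a small decoder. This sketch assumes the original PSCI
``power_state`` format shown in the tables (Power Level in bits [25:24], State Type in bit [16],
State ID in bits [15:0]); it does not handle the extended StateID format, and ``psci_decode()``
and ``psci_encode()`` are illustrative helper names.

.. code-block:: c

   #include <stdint.h>

   /* Fields of a PSCI power_state value in the original (non-extended) format. */
   struct psci_power_state {
       unsigned int power_level; /* bits [25:24]: 0 = CPU, 1 = cluster, 2 = system */
       unsigned int state_type;  /* bit [16]: 0 = standby/retention, 1 = power down */
       unsigned int state_id;    /* bits [15:0]: platform-specific */
   };

   struct psci_power_state psci_decode(uint32_t param)
   {
       struct psci_power_state s;

       s.power_level = (param >> 24) & 0x3u;
       s.state_type = (param >> 16) & 0x1u;
       s.state_id = param & 0xffffu;
       return s;
   }

   uint32_t psci_encode(unsigned int level, unsigned int type, unsigned int id)
   {
       return ((uint32_t)(level & 0x3u) << 24) |
              ((uint32_t)(type & 0x1u) << 16) |
              (uint32_t)(id & 0xffffu);
   }

Applied to the other parameters listed under Idle State Definitions, ``0x012233`` decodes to Power
Level 0 (CPU), State Type 1 (Power Down), State ID 0x2233, and ``0x01000021`` decodes to Power
Level 1 (cluster), State Type 0 (standby), State ID 0x21.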

The cpuidle governor uses these latency and residency values to automatically select the appropriate
mode. If the predicted idle time is short and latency constraints are tight, Deep Sleep mode (the
shallower state) is chosen for faster resume. For longer predicted idle periods with relaxed
latency requirements, RTC+DDR mode (the deeper state) is preferred for maximum power savings.
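
The trade-off described above can be sketched as a simple rule: walk the states from shallowest
to deepest and keep the deepest one whose exit latency fits the latency limit and whose minimum
residency fits the predicted idle time. This is a simplification of what real governors do, and
``select_idle_state()`` is a hypothetical name; real residency values come from the
``min-residency-us`` properties in the platform DTS.

.. code-block:: c

   #include <stddef.h>

   struct idle_state {
       const char *name;
       unsigned int exit_latency_us;  /* exit-latency-us from the DTS */
       unsigned int min_residency_us; /* min-residency-us from the DTS */
   };

   /*
    * @states must be ordered from shallowest to deepest. Returns the index of
    * the deepest eligible state, or -1 if none qualifies.
    */
   int select_idle_state(const struct idle_state *states, size_t n,
                         unsigned int predicted_idle_us,
                         unsigned int latency_limit_us)
   {
       int best = -1;

       for (size_t i = 0; i < n; i++) {
           if (states[i].exit_latency_us > latency_limit_us)
               continue; /* would wake up too slowly */
           if (states[i].min_residency_us > predicted_idle_us)
               continue; /* idle period too short to pay off */
           best = (int)i;
       }
       return best;
   }

With the two main domain states and a 20,000 μs limit, only ``main_sleep_deep`` passes the latency
check; with a relaxed limit and a long predicted idle period, ``main_sleep_rtcddr`` is chosen.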

QoS Latency Constraints and Mode Selection
===========================================

The Linux kernel's PM QoS (Quality of Service) framework allows drivers and applications to
specify the maximum acceptable wakeup latency. These constraints directly influence which idle
state can be entered during s2idle.

**How QoS Constraints Work:**

1. Each device or CPU can register a latency constraint (in microseconds)
2. The cpuidle governor queries these constraints before selecting an idle state
3. Only idle states with ``exit-latency-us`` ≤ constraint are considered
4. The deepest eligible state is selected
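
Steps 1 to 3 can be illustrated with a sketch that combines several latency requests into one
limit and then tests a state against it. For CPU latency requests, PM QoS keeps the minimum of all
active requests, since the strictest requirement wins. The helper names are hypothetical, and
treating "no request" as ``UINT_MAX`` (no limit) is a simplification, not the kernel's default.

.. code-block:: c

   #include <limits.h>
   #include <stdbool.h>
   #include <stddef.h>

   /* The strictest (smallest) request wins; no request means no limit here. */
   unsigned int effective_latency_limit_us(const unsigned int *requests_us, size_t n)
   {
       unsigned int limit = UINT_MAX;

       for (size_t i = 0; i < n; i++)
           if (requests_us[i] < limit)
               limit = requests_us[i];
       return limit;
   }

   /* Step 3: a state is eligible only if its exit latency fits the limit. */
   bool state_allowed(unsigned int exit_latency_us, unsigned int limit_us)
   {
       return exit_latency_us <= limit_us;
   }

If requests of 20,000 μs and 4096 μs are active at the same time, the effective limit is 4096 μs,
so neither main domain state (10,000 μs and 600,000 μs exit latency) qualifies.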

**Setting QoS Constraints from User Space:**

Applications can constrain the system's low-power behavior by writing to the PM QoS device file.
Below is a C program that demonstrates this:

.. code-block:: c

   /* testqos.c - Set CPU wakeup latency constraint */
   #include <stdio.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <signal.h>

   #define QOS_DEV     "/dev/cpu_wakeup_latency"
   #define LATENCY_VAL "0x1000" /* 4096 us (~4.1 ms), written as a hex string */

   /* volatile sig_atomic_t is the portable type for a flag set in a signal handler */
   static volatile sig_atomic_t keep_running = 1;

   static void sig_handler(int sig)
   {
       (void)sig;
       keep_running = 0;
   }

   int main(void)
   {
       int fd;

       signal(SIGINT, sig_handler);
       signal(SIGTERM, sig_handler);

       /* The constraint stays in force only while this descriptor is open */
       fd = open(QOS_DEV, O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Write the string without its terminating NUL */
       if (write(fd, LATENCY_VAL, sizeof(LATENCY_VAL) - 1) < 0) {
           perror("write");
           close(fd);
           return 1;
       }

       printf("QoS set to %s. Press Ctrl+C to exit.\n", LATENCY_VAL);

       while (keep_running)
           sleep(1);

       /* Closing the descriptor drops the constraint */
       close(fd);
       printf("Released.\n");
       return 0;
   }

**Why This Program is Needed:**

This program demonstrates how to control low-power mode selection by setting QoS latency constraints.
By applying a tight latency constraint (4096 μs, about 4 ms, in the example), you can force the
system to stay in shallow idle states, preventing entry into Deep Sleep or RTC+DDR modes. This is
useful for testing that the cpuidle governor correctly respects QoS constraints and selects the
appropriate idle state based on latency requirements.

**Selecting Specific Low-Power Modes:**

To force selection of a specific mode, set the QoS constraint based on the exit latencies of the
available states. The latency value is expressed in microseconds and must be provided as a
**hex string** (e.g., "0x7ef41").

* **To force Deep Sleep mode**: Set the constraint above Deep Sleep's exit latency (10ms = 10,000 μs)
  but below RTC+DDR's exit latency (600ms = 600,000 μs). For example, use **520,001 μs (about 520 ms)**:

  .. code-block:: c

     #define LATENCY_VAL "0x7ef41" /* 520,001 us (~520 ms) in hex */

  **Calculation:**

  - Target latency: 520 ms = 520,000 μs (520,001 is used in the example)
  - Convert to hex: 520,001₁₀ = 0x7EF41₁₆
  - Write as hex string: ``"0x7ef41"``
  - This allows Deep Sleep (10,000 μs exit latency) but blocks RTC+DDR (600,000 μs exit latency)

* **To allow RTC+DDR mode**: Set the constraint higher than 600ms (600,000 μs) or do not apply any
  constraint, allowing the cpuidle governor to select the deepest state (RTC+DDR) during long
  idle periods.
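
Choosing a constraint between two exit latencies and turning it into the hex string used by
``testqos.c`` can also be done programmatically. In this sketch, ``constraint_between()`` and
``latency_to_hex()`` are hypothetical helpers; picking the midpoint is an arbitrary choice, since
any value from the shallower state's exit latency up to (but not including) the deeper state's
exit latency admits only the shallower state.

.. code-block:: c

   #include <stdio.h>

   /* Any value in [shallow_us, deep_us) admits the shallower state only. */
   unsigned int constraint_between(unsigned int shallow_us, unsigned int deep_us)
   {
       return shallow_us + (deep_us - shallow_us) / 2;
   }

   /* Format @latency_us as "0x..." (lowercase hex). Returns the string length. */
   int latency_to_hex(unsigned int latency_us, char *buf, size_t len)
   {
       return snprintf(buf, len, "0x%x", latency_us);
   }

For example, ``constraint_between(10000, 600000)`` yields 305,000 μs (``"0x4a768"``), which, like
the 520,001 μs value above, admits Deep Sleep but not RTC+DDR, and ``latency_to_hex(520001, ...)``
produces ``"0x7ef41"``.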

**How It Sets QoS Constraints:**

The program opens the special device file ``/dev/cpu_wakeup_latency``, which is part of the
kernel's PM QoS framework. Writing a latency value (in microseconds) to this file:

1. Registers a global CPU wakeup latency constraint
2. Causes the cpuidle governor to filter out any idle states with exit latency exceeding this value
3. Remains active as long as the file descriptor is open
4. Is released automatically when the file descriptor is closed (on program exit)

**Example: Deep Sleep Mode Selection:**

Consider a scenario where the system has active I2C or SPI communication requiring responses
within 20ms. A QoS constraint of 20,000 μs (20ms) would be applied:

.. code-block:: text

   Available Main Domain States:
   ├─> main_sleep_rtcddr: exit-latency = 600,000 μs (600ms) → REJECTED (exceeds constraint)
   └─> main_sleep_deep:   exit-latency = 10,000 μs (10ms)   → SELECTED (meets constraint)

   Result: System enters Deep Sleep mode instead of RTC+DDR mode

In this example, even though RTC+DDR provides better power savings, the 20ms latency constraint
forces the system to use the shallower Deep Sleep mode. The selection is between the two main
domain idle states defined for s2idle suspend.

**Usage Example:**

.. code-block:: console

   root@am62lxx-evm:~# gcc testqos.c -o testqos
   root@am62lxx-evm:~# ./testqos
   QoS set to 0x1000. Press Ctrl+C to exit.

   # In another terminal, list the exit latencies (in μs) of the CPU idle states:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0        # state0: WFI
   125      # state1: Standby
   350125   # state2: PowerDown (exceeds the 4096 μs limit, so it is not selected)

   # Press Ctrl+C in the first terminal
   Released.

   # The reported latencies are unchanged, but state2 is eligible again:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0
   125
   350125   # state2: PowerDown (eligible again)

The value ``0x1000`` (4096 μs, about 4.1 ms) prevents any idle state with an exit latency greater
than 4096 μs from being entered. In the example above, the PowerDown state with its 350 ms exit
latency is effectively disabled while the constraint is active. The ``latency`` files themselves
do not change, because the constraint only affects which states may be selected.
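
The ``latency`` attribute only reports each state's exit latency, so checking whether a constraint
is honored means looking at which states are actually entered. The sketch below reads each
state's ``latency`` and a usage counter under ``/sys/devices/system/cpu/cpuN/cpuidle/`` and flags
states whose exit latency exceeds a given limit. The function name is hypothetical; pass
``"usage"`` for all entries or, on kernels that expose it, ``"s2idle/usage"`` to count only
suspend-to-idle entries.

.. code-block:: c

   #include <stdio.h>

   /*
    * Print latency (us) and the @counter value for every cpuidle state of
    * @cpu, flagging states whose exit latency exceeds @limit_us. Returns the
    * number of states found (0 if the cpuidle directory does not exist).
    */
   int dump_cpuidle_states(int cpu, const char *counter, unsigned long limit_us)
   {
       char path[160];
       int n;

       for (n = 0;; n++) {
           unsigned long latency, usage = 0;
           FILE *f;

           snprintf(path, sizeof(path),
                    "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/latency", cpu, n);
           f = fopen(path, "r");
           if (f == NULL)
               break; /* no more states */
           if (fscanf(f, "%lu", &latency) != 1) {
               fclose(f);
               break;
           }
           fclose(f);

           snprintf(path, sizeof(path),
                    "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/%s", cpu, n, counter);
           f = fopen(path, "r");
           if (f != NULL) {
               if (fscanf(f, "%lu", &usage) != 1)
                   usage = 0;
               fclose(f);
           }

           printf("state%d: latency=%lu us %s=%lu%s\n", n, latency, counter, usage,
                  latency > limit_us ? " (exceeds limit)" : "");
       }
       return n;
   }

Comparing the counters before and after a suspend cycle while ``testqos`` holds its constraint
should show whether a flagged state was entered.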