Commit b1ecce9

HLD for SmartSwitch DPU graceful shutdown (#1991)
* Initial version for dpu-graceful-shutdown HLD * Did some minor improvement * Addressed review comments * Adding two approaches * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Did some cleanup * Fixed the sequence diagram * Fixed the sequence diagram * Fixed the sequence flow * Addressed review comments * Did some cleanup * Called out that the response read happens in a 5 sec loop * Added a section for interaoperability * Did some cleanup * Did some cleanup * Did some cleanup * Enhanced the reboot-interoperability.svg diagram * Enhanced the reboot-interoperability description * Addressed review comments * Addressed review comments * Addressed review comments * Addressed some review comments * Addressed some review comments * modified reboot-interoperability diagram * addressed review comments * Addressed some review comments * Addressed some review comments * Updated the image
1 parent c697bb8 commit b1ecce9

4 files changed

+207
-0
lines changed

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
# SmartSwitch DPU Graceful Shutdown

| Rev | Date | Author | Change Description |
| --- | ---- | ------ | ------------------ |
| 0.1 | 12/05/2025 | Ramesh Raghupathy | Initial version |

## Definitions / Abbreviations

| Term | Meaning |
| --- | ---- |
| PMON | Platform Monitor |
| DPU | Data Processing Unit |
| gRPC | gRPC Remote Procedure Calls |
| gNOI | gRPC Network Operations Interface |
| gNMI | gRPC Network Management Interface |

## Introduction
SmartSwitch supports graceful reboot of the DPUs, so it is natural to also support graceful shutdown of the DPUs. Graceful shutdown may look like the first half of a graceful reboot, but it is not: it is invoked differently and follows a different code path, which makes the implementation somewhat more complex. In addition, the absence of docker in the PMON container, the container separation, and the need for a platform-agnostic implementation make it harder to invoke the gNOI call from this code path. Graceful shutdown of multiple DPUs happens in parallel.

## DPU Graceful Shutdown Sequence

The following sequence diagram illustrates the detailed steps involved in the graceful shutdown of a DPU:

<p align="center"><img src="./images/dpu-graceful-shutdown.svg"></p>

## Sequence of Operations

1. **Daemon Initialization:**

    * Upon startup, `gnoi_reboot_daemon.py` subscribes to the `CHASSIS_MODULE_INFO_TABLE` to monitor incoming shutdown/reboot requests. For startup requests, the state transition is a no-op for the daemon.

2. **CLI Command Execution:**

    * The user executes the command `config chassis module shutdown DPUx` via the CLI or a config load.

3. **Chassis Daemon Processing:**

    * `chassisd` receives the shutdown command and invokes `set_admin_state(down)` on `module_base.py`.

    * Within `module_base.py`, the system checks whether the device `subtype` is `"SmartSwitch"` and `switch_type` is not `dpu`.

    * If both conditions are met, it proceeds with the graceful shutdown process; otherwise, it calls `set_admin_state(down)` on `module.py` directly.

4. **Graceful Shutdown Handler Invocation:**

    * `module_base.py` calls the `graceful_shutdown_handler()` method to initiate the graceful shutdown sequence.

5. **Reboot Request Creation:**

    * Within `graceful_shutdown_handler()`, `state_transition_in_progress` is set to `True` in the `CHASSIS_MODULE_INFO_TABLE` in Redis STATE_DB for DPUx, along with the `transition_type`.

6. **Daemon Notification and Processing:**

    * `gnoi_reboot_daemon.py` detects `state_transition_in_progress` turning `True` in `CHASSIS_MODULE_INFO_TABLE` and sends a gNOI Reboot RPC with the method `HALT` to the sysmgr in DPUx, which in turn issues a DBUS request to execute `reboot -p` on DPUx.

7. **Reboot Request:**

    * The daemon forwards the reboot request.

8. **Reboot Status Monitoring:**

    * The daemon sends `gnoi_client -rpc RebootStatus` to monitor the reboot status of DPUx.

9. **DPUx Returns Status:**

    * DPUx returns the reboot status response to the daemon.

10. **Reboot Result Update in DB:**

    * The daemon writes the reboot result to the `CHASSIS_MODULE_INFO_TABLE` in Redis STATE_DB by setting `state_transition_in_progress` to `False` after the platform API completes the power-down operation of the module, as shown in step 13.

    * In case of a reboot result failure, the result is updated after the timeout.

11. **Read the Result:**

    * `module_base.py` polls `CHASSIS_MODULE_INFO_TABLE` in a loop, every 5 seconds, until `state_transition_in_progress` turns `False`.

12. **Log the Result:**

    * `module_base.py` logs the reboot result accordingly.

13. **Final State Transition:**

    * `module_base.py` invokes `set_admin_state(down)` on `module.py`.

    * `module.py` calls the platform API to power down the module once DPUx completes kernel shutdown.

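
The polling in step 11 can be sketched as follows. This is a minimal illustration, not the actual `module_base.py` code: the function name, the `read_field` callable, and the `timeout_secs` parameter are assumptions introduced here, and only the 5-second interval comes from the text above.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECS = 5  # the 5-second interval is taken from step 11


def wait_for_transition_done(
    read_field: Callable[[], Optional[str]],
    timeout_secs: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until state_transition_in_progress is no longer "True".

    read_field is a hypothetical callable that returns the current value of
    state_transition_in_progress for one module (None if the field is absent).
    Returns True if the transition finished, False if the timeout expired.
    """
    waited = 0.0
    while True:
        if read_field() != "True":
            return True  # "False" or absent: no transition in progress
        if waited >= timeout_secs:
            return False
        sleep(POLL_INTERVAL_SECS)
        waited += POLL_INTERVAL_SECS
```

Passing `sleep` as a parameter is only a convenience for testing the loop without real delays; the timeout value itself is not specified in this document.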
## Objective

This design enables the `chassisd` process running in the PMON container to invoke a **gNOI-based reboot** when it triggers the `set_admin_state(down)` API of a DPU module, without relying on `docker`, `bash`, or `hostexec` within the container.

## Constraints

- The PMON container is highly restricted: no `docker`, `hostexec`, or `bash`.
- gNOI reboot requires executing a command using `docker exec` on the host.
- Communication must be initiated from PMON and executed by the host.

---

## Design Overview

In the Redis STATE_DB IPC approach, SONiC uses Redis's publish-subscribe mechanism for inter-process communication: `module_base.py` writes the request to STATE_DB, and `gnoi_reboot_daemon.py`, which subscribes to the table, is notified and acts on it. This event-driven design keeps communication between the components decoupled and reliable.

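
As a rough illustration of the subscriber side, the sketch below shows how a daemon might decide what to do with one table update. The function name and the returned `"HALT"` string are assumptions made for this example; the field names are those of the `CHASSIS_MODULE_INFO_TABLE` schema in this document, and the Redis subscription itself is deliberately left out.

```python
from typing import Mapping, Optional


def action_for_update(fields: Mapping[str, str]) -> Optional[str]:
    """Decide what a subscriber does with one table update.

    Returns "HALT" when a shutdown transition has been requested,
    otherwise None (nothing to do, e.g. for startup requests).
    """
    in_progress = fields.get("state_transition_in_progress", "False") == "True"
    is_shutdown = fields.get("transition_type", "none") == "shutdown"
    return "HALT" if in_progress and is_shutdown else None
```

Keeping the decision in a pure function separates it from the transport, so the same logic applies however the update is delivered.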
### CHASSIS_MODULE_INFO_TABLE Schema (STATE_DB)

KEY: `CHASSIS_MODULE_INFO_TABLE|<MODULE_NAME>`

| Field | Description |
| ----- | ----------- |
| `state_transition_in_progress` | `"True"` indicates that a transition is ongoing; `"False"` or absence implies no transition. |
| `transition_start_time` | Timestamp in human-readable UTC format representing the start of the transition. |
| `transition_type` | Specifies the nature of the transition: `"shutdown"` or `"none"`. `"none"` is the default for reboot and startup. |

**Example:**
```
CHASSIS_MODULE_INFO_TABLE|DPU0
{
    "state_transition_in_progress": "True",
    "transition_start_time": "Tue Jun 17 08:32:10 UTC 2025",
    "transition_type": "shutdown"
}
```

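
An entry like the one above could be assembled as in the following sketch. The helper names `build_transition_entry` and `module_key` are hypothetical, and the `strftime` pattern is an assumption inferred from the layout of the example timestamp; the actual code may format the time differently.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def module_key(module_name: str) -> str:
    """Build the STATE_DB key for one module, e.g. for "DPU0"."""
    return f"CHASSIS_MODULE_INFO_TABLE|{module_name}"


def build_transition_entry(transition_type: str,
                           now: Optional[datetime] = None) -> Dict[str, str]:
    """Return the field/value pairs for one CHASSIS_MODULE_INFO_TABLE entry."""
    if transition_type not in ("shutdown", "none"):
        raise ValueError(f"unsupported transition_type: {transition_type}")
    # Normalize to UTC so the literal "UTC" in the pattern below is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "state_transition_in_progress": "True",
        # Assumed pattern: weekday, month, day, time, "UTC", year.
        "transition_start_time": now.strftime("%a %b %d %H:%M:%S UTC %Y"),
        "transition_type": transition_type,
    }
```

All values are strings, matching the example, where the boolean is stored as `"True"` rather than as a native type.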
| Transition Type | Who Sets the Field | How It's Cleared |
| --------------- | ------------------ | ---------------- |
| **Startup** | CLI or config load | Cleared once the module reaches the online state |
| **Shutdown** | CLI or config load | Cleared by `gnoi-reboot-daemon` upon completing the platform API (module shutdown) |
| **Reboot** | `smartswitch_reboot_helper` | Cleared by `smartswitch_reboot_helper` upon completing the platform API |

## Parallel Execution

The following sequence diagram illustrates the parallel execution of graceful shutdown of multiple DPUs:

<p align="center"><img src="./images/parallel-execution.svg"></p>

## Interoperability between DPU Graceful Shutdown & gNOI Reboot

<p align="center"><img src="./images/reboot-interoperability.svg"></p>

The diagram above illustrates scenarios where both `module_base.py` and `smartswitch_reboot_helper` might attempt to initiate a shutdown, startup, or reboot simultaneously. When there is a race, whichever component first writes the `state_transition_in_progress` field in `CHASSIS_MODULE_INFO_TABLE` wins. If `state_transition_in_progress` is `True` because a DPU startup is in progress, both reboot and shutdown requests will fail. It is up to the requesting module to re-issue the transaction if needed. When a module-level reboot and a switch-level reboot happen simultaneously and the module-level reboot has already set `state_transition_in_progress` to `True`, the switch-level reboot needs to be reissued. If the switch-level reboot happens first, it sets `state_transition_in_progress` to `True` for all modules as a first step and then runs to completion.

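
The "first writer wins" rule can be illustrated with an in-memory stand-in for the table. This is a sketch under stated assumptions: the class and method names are hypothetical, and a `threading.Lock` is used here only to make the check-and-set atomic within one process. This document does not specify how the equivalent guarantee is obtained against Redis.

```python
import threading
from typing import Dict


class ModuleTransitionTable:
    """In-memory stand-in for CHASSIS_MODULE_INFO_TABLE (illustration only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, Dict[str, str]] = {}

    def try_begin(self, module: str, transition_type: str) -> bool:
        """Claim the module; return False if a transition is already running."""
        with self._lock:  # the check and the set must happen as one step
            row = self._rows.setdefault(module, {})
            if row.get("state_transition_in_progress") == "True":
                return False
            row["state_transition_in_progress"] = "True"
            row["transition_type"] = transition_type
            return True

    def finish(self, module: str) -> None:
        """Mark the transition as complete so new requests can proceed."""
        with self._lock:
            self._rows.setdefault(module, {})["state_transition_in_progress"] = "False"
```

With this stand-in, a second `try_begin("DPU0", ...)` returns `False` until `finish("DPU0")` is called, which mirrors the rule that the losing requester has to re-issue its operation later.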
**Scenario 1:** `module_base.py` issues a startup or shutdown while a `smartswitch_reboot_helper` module reboot is in progress for the same module.

The same scenario applies if a "config reload" happens while a reboot is in progress.

* `smartswitch_reboot_helper` writes to `CHASSIS_MODULE_INFO_TABLE`, setting `state_transition_in_progress` to `True`.

* If `module_base.py` attempts to set `state_transition_in_progress` to `True` in `CHASSIS_MODULE_INFO_TABLE` during this process, the operation will fail. The user has to retry the shutdown operation later.

* When the reboot is complete, `state_transition_in_progress` in `CHASSIS_MODULE_INFO_TABLE` is set to `False`. `module_base.py` has to retry the shutdown/startup operation as needed once the reboot is complete.

**Scenario 2:** `smartswitch_reboot_helper` issues a module reboot while a `module_base.py` graceful shutdown is in progress.

* `module_base.py` writes to `CHASSIS_MODULE_INFO_TABLE`, setting `state_transition_in_progress` to `True` and `"transition_type": "shutdown"`.

* `gnoi_reboot_daemon.py` is notified of the new entry and proceeds to send a gNOI Reboot RPC with the method `HALT` to the sysmgr in DPUx.

* The daemon writes the reboot result to the `CHASSIS_MODULE_INFO_TABLE` by setting `state_transition_in_progress` to `False`.

* If `smartswitch_reboot_helper` also attempts to set `state_transition_in_progress` to `True` in `CHASSIS_MODULE_INFO_TABLE` during this process, the operation will fail.

* The graceful shutdown completes as planned, so there is no need for the reboot in this situation.

**Scenario 3:** `smartswitch_reboot_helper` issues a module reboot while a `module_base.py` startup is in progress.

* `module_base.py` writes to `CHASSIS_MODULE_INFO_TABLE`, setting `state_transition_in_progress` to `True`.

* If `smartswitch_reboot_helper` also attempts to set `state_transition_in_progress` to `True` in `CHASSIS_MODULE_INFO_TABLE` during this process, the operation will fail.

* The module startup completes as planned, so the reboot may not be needed in this situation.

**Scenario 4:** `module_base.py` issues a graceful shutdown while the module startup is in progress, or vice versa.

* `module_base.py` writes to `CHASSIS_MODULE_INFO_TABLE`, setting `state_transition_in_progress` to `True` to indicate that a startup or shutdown is in progress.

* If `module_base.py` issues another startup or shutdown to the same module, it will fail, and the user has to issue it again later once the previous operation is complete.

**Scenario 5:** A switch-level reboot is issued while a module-level reboot, startup, or shutdown is in progress.

* In this situation, the switch-level reboot logic first checks `state_transition_in_progress` for all the modules and sets it to `True` for every module where it is `False`. If one or more modules are already undergoing a reboot, shutdown, or startup, it ignores those modules and completes the remaining ones. This leaves the system in the expected state. Until the switch-level reboot is complete, `state_transition_in_progress` is kept `True` for all modules, irrespective of the type of operation.

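
The claiming step in Scenario 5 might look like the sketch below. The function name and the plain dictionary standing in for the per-module `state_transition_in_progress` values are assumptions for illustration; the real logic operates on `CHASSIS_MODULE_INFO_TABLE` in STATE_DB.

```python
from typing import Dict, List


def claim_idle_modules(states: Dict[str, str]) -> List[str]:
    """Set every idle module to "True" and return the names that were claimed.

    states maps a module name to its state_transition_in_progress value.
    Modules that are already "True" are left to their current operation.
    """
    claimed = []
    for module, in_progress in states.items():
        if in_progress != "True":
            states[module] = "True"
            claimed.append(module)
    return claimed
```

After the call, every module reads `"True"`, and the returned list tells the switch-level reboot which modules it is responsible for.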
**Scenario 6:** A module-level reboot, startup, or shutdown is issued while a switch-level reboot is in progress.

* The module-level requests will fail because the switch-level reboot has already set `state_transition_in_progress` to `True` for all modules.
* The user needs to redo the module-level operation after the switch-level reboot, if needed.

This design ensures that only one reboot process is initiated, regardless of which component triggers it first, thereby preventing race conditions and ensuring system stability.

---

## References

- [PMON HLD](https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/pmon/smartswitch-pmon.md)
- [Smart Switch Reboot HLD](https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/reboot/reboot-hld.md)

---

doc/smart-switch/graceful-shutdown/images/dpu-graceful-shutdown.svg

Lines changed: 1 addition & 0 deletions