|
17 | 17 | - [Hotplugging Shutdown Fixes](#hotplugging-shutdown-fixes) |
18 | 18 | + [Phase 2: Freeze Components And Wait for Switch Quiescence](#phase-2--freeze-components-and-wait-for-switch-quiescence) |
19 | 19 | - [Unfreeze on Failure](#unfreeze-on-failure) |
| 20 | + - [Quiescence Timer](#quiescence-timer) |
20 | 21 | + [Phase 3: Trigger Checkpointing](#phase-3--trigger-checkpointing) |
21 | 22 | + [Phase 4: Prepare and Perform Reboot](#phase-4--prepare-and-perform-reboot) |
22 | 23 | - [Removing Container Shutdown Ordering Dependency](#removing-container-shutdown-ordering-dependency) |
23 | 24 | + [Application Shutdown Optimization](#application-shutdown-optimization) |
24 | 25 | * [Reconciliation Monitoring](#reconciliation-monitoring) |
25 | 26 | * [Component Warmboot States](#component-warmboot-states) |
26 | | -- [Critical Container Design Changes](#critical-container-design-changes) |
| 27 | + * [Handling Race Conditions](#handling-race-conditions) |
| 28 | +- [Reference Design Changes in Critical Containers](#reference-design-changes-in-critical-containers) |
27 | 29 | * [Orchagent](#orchagent) |
28 | 30 | * [Syncd](#syncd) |
29 | 31 | * [Teamd](#teamd) |
@@ -202,42 +204,46 @@ In addition these sanity checks, it will optionally trigger state verification t |
202 | 204 |  |
203 | 205 |
|
204 | 206 |
|
205 | | -NSF Manager will be provide a plugin via SONiC host service that can be used to hotplug fixes during warm shutdown. The host service will call a bash script in which ad-hoc fixes can be added during warm shutdown. This script can be updated before iniating warm reboot and will be called by NSF Manager after performing the sanity checks. Currently, such fixes are added in the fast-reboot script which is then updated on the switch. This framework enables NSF Manager to allow hotplugging fixes during warm shutdown before switch stack components start their warm shutdown routines. |
| 207 | +NSF Manager will provide a plugin via SONiC host service that can be used to hotplug fixes during warm shutdown. The host service will call a bash script in which ad-hoc fixes can be added during warm shutdown. This script can be updated before initiating warm reboot and will be called by NSF Manager after performing the sanity checks. Currently, such fixes are added in the fast-reboot script which is then updated on the switch. This framework enables NSF Manager to allow hotplugging fixes during warm shutdown before switch stack components start their warm shutdown routines. Additionally, fixes can be hotplugged in [Phase 4](#phase-4-prepare-and-perform-reboot) after the components have completed their warm shutdown routines. This is because [Phase 4](#phase-4-prepare-and-perform-reboot) also uses SONiC host service to prepare the switch for reboot and perform the reboot operation, and thus fixes can be added to that host service. |
206 | 208 |
|
207 | 209 | ##### Phase 2: Freeze Components And Wait for Switch Quiescence |
208 | 210 |
|
209 | 211 |
|
210 | 212 |
|
211 | 213 |  |
212 | 214 |
|
213 | | -NSF Manager will send freeze notification to all registered components and will wait for the quiescence of only those components that have set _freeze = true_. Upon receiving the freeze notification, the component will complete processing the current request queue and stop generating new intents. Stopping new intents from being generated means that boundary components should stop processing requests from external components (external events) and all components should stop their periodic timers that generate new requests (internal events). For example: |
214 | | - |
| 215 | +NSF Manager will send a freeze notification to all registered components and will wait for the quiescence of only those components that have set _freeze = true_. Upon receiving the freeze notification, the component will complete processing the current request queue and stop generating new self-sourced intents. This means that boundary components should stop processing requests from external components (external events) and all components should stop triggers that generate new requests (internal events), for example by stopping periodic timers and threads. The goal is to stop generating events that might break quiescence of other switch stack components. For example:
215 | 216 |
|
216 | 217 |
|
217 | 218 | * UMF will stop listening to new gRPC requests from the controller. |
218 | 219 | * P4RT will stop listening to new gRPC requests from the controller and stop packet I/O. |
219 | 220 | * xcvrd will stop listening to transceiver updates such as module presence. |
220 | 221 | * Syncd will stop listening to link state events from the vendor chip. |
221 | 222 | * Orchagent will stop periodic internal timers. |
222 | | -* BGP will stop exchanging packets with the peers. |
223 | | -* Teamd will continue exchanging LACP PDUs but stop processing any changes in the peer PDUs. |
| 223 | +* Teamd will stop processing any changes in the peer PDUs but continue exchanging LACP PDUs. |
224 | 224 |
|
225 | 225 | However, all components will continue to process requests received by them from other switch stack components. This means that each component will stop generating events for which it is the source, but will continue to process requests from other components. After all components have been frozen, the switch will eventually reach a state wherein each component stops generating new events and thus the switch becomes quiescent. This is because: |
226 | 226 |
|
227 | | - |
228 | | - |
229 | 227 | * Switch boundaries that generate new events have been stopped. |
230 | | -* All timers that generate new events have been stopped. |
| 228 | +* All timers/threads that generate new events have been stopped. |
231 | 229 | * All components have completed processing their pending requests and thus there are no in-flight messages. |
232 | 230 |
|
233 | 231 | After receiving the freeze notification, the components will update their quiescent state in STATE DB when they receive a new request (i.e. they are no longer quiescent) and when they complete processing their current request queue (i.e. they become quiescent). NSF Manager will monitor the quiescent state of all components in STATE DB to determine that the switch has become quiescent and thus further state changes won’t occur in the switch. If all components are in quiescent state then NSF Manager will declare that the switch has become quiescent and thus the switch has attained its final state. |
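
The state reporting described above can be sketched as follows. This is a minimal illustration and not the actual NSF Manager or component code: the `Component` class, the `switch_is_quiescent` helper and the in-memory `state_db` dictionary are hypothetical stand-ins for a real switch stack component and STATE DB, and the state names `frozen` and `quiescent` follow the component warm-boot states used in this document.

```python
from collections import deque

# Hypothetical stand-in for STATE DB: component name -> warm-boot state.
state_db = {}


class Component:
    """Hypothetical component that reports its state after a freeze."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # pending requests from other components
        self.frozen = False

    def on_freeze(self):
        # Freeze notification: stop self-sourced events, start reporting state.
        self.frozen = True
        self._publish()

    def receive(self, request):
        # A new request means the component is no longer quiescent.
        self.queue.append(request)
        if self.frozen:
            self._publish()

    def process_one(self):
        # Completing the queue makes the component quiescent again.
        if self.queue:
            self.queue.popleft()
        if self.frozen:
            self._publish()

    def _publish(self):
        state_db[self.name] = "frozen" if self.queue else "quiescent"


def switch_is_quiescent(db, components):
    """The switch is quiescent only if every listed component is quiescent."""
    return all(db.get(name) == "quiescent" for name in components)
```

In this sketch a request that arrives after the freeze moves the receiving component back to `frozen`, so `switch_is_quiescent` stays false until that request has been processed.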
234 | 232 |
|
235 | | - |
236 | 233 | ###### Unfreeze on Failure |
237 | 234 |
|
238 | 235 |
|
239 | 236 | NSF Manager will wait for a period of time for the switch to become quiescent after which it will determine that this phase failed and abort the warm reboot operation. Additionally, it will unfreeze the switch stack on such failures by sending unfreeze notification to all the registered components. As a result, all components will resume their normal operations and start generating new external and internal events. This ensures that the switch continues to operate normally as it did before the start of the warm reboot operation. |
240 | 237 |
|
| 238 | +###### Quiescence Timer |
| 239 | + |
| 240 | +As indicated in the above sections, NSF Manager uses two timers in this phase:
| 241 | + |
| 242 | +* Quiescence Time: Amount of time that NSF Manager will wait after all components are quiescent to determine that the switch is quiescent. |
| 243 | +* Overall Phase Timeout: Amount of time for the switch to become quiescent after which NSF Manager will determine that this phase failed. |
| 244 | + |
| 245 | +These timers might be platform dependent since some platforms might take more time than others due to software/hardware constraints. Therefore, these timers will be configurable via CONFIG DB to allow setting appropriate timeouts for different platforms. |
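
One way the two timers could interact is sketched below. This is an assumption-laden illustration rather than the NSF Manager implementation: the function name `wait_for_quiescence`, its parameters and the polling approach are hypothetical, and the `all_quiescent` callback stands in for reading component states from STATE DB. The timer values would come from CONFIG DB in the real design.

```python
import time


def wait_for_quiescence(all_quiescent, quiescence_time, phase_timeout,
                        clock=time.monotonic, sleep=time.sleep, poll=0.1):
    """Return True if all components stay quiescent for `quiescence_time`
    seconds before `phase_timeout` seconds elapse, otherwise False.

    All names here are hypothetical; `all_quiescent` is a callable that
    reports whether every component is currently quiescent.
    """
    start = clock()
    quiet_since = None  # time at which the current quiet period began
    while True:
        now = clock()
        if all_quiescent():
            if quiet_since is None:
                quiet_since = now
            if now - quiet_since >= quiescence_time:
                return True   # switch declared quiescent
        else:
            quiet_since = None  # a component became busy: restart the timer
        if now - start >= phase_timeout:
            return False      # phase failed: caller aborts and unfreezes
        sleep(poll)
```

The quiescence time restarts whenever any component leaves the quiescent state, while the overall phase timeout keeps running from the start of the phase. A `False` result corresponds to the failure case in which NSF Manager aborts the warm reboot and unfreezes the switch stack.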
| 246 | + |
241 | 247 |
|
242 | 248 | ##### Phase 3: Trigger Checkpointing |
243 | 249 |
|
@@ -302,11 +308,16 @@ Upon receiving a freeze notification, a component will transition to _frozen_ st |
302 | 308 |
|
303 | 309 | After warm reboot, a component will transition to _initialized_ state after it has completed its initialization routine. It will transition to _reconciled_ state after it has completed reconciliation. It will transition to _failed_ state if its initialization or reconciliation fails. |
304 | 310 |
|
305 | | -Components will update their state in STATE DB using [setWarmStartState()](https://github.com/sonic-net/sonic-swss-common/blob/master/common/warm_restart.cpp#L223) API during the different warm reboot stages. NSF Manager will monitor these NSF states in STATE DB to determine whether it needs to proceed with the next phase of the warm-boot orchestration or not. |
| 311 | +Components will update their state in STATE DB using [setWarmStartState()](https://github.com/sonic-net/sonic-swss-common/blob/master/common/warm_restart.cpp#L223) API during the different warm reboot stages. NSF Manager will monitor these warm-boot states in STATE DB to determine whether it needs to proceed with the next phase of the warm-boot orchestration or not. |
| 312 | + |
| 313 | +#### Handling Race Conditions |
| 314 | + |
| 315 | +[Phase 2](#phase-2-freeze-components-and-wait-for-switch-quiescence) ensures that switch stack components stop generating new events for which they are the source, but they continue to process requests received from other components. The components update their warm-boot state to _frozen_ if they are processing a request and update it to _quiescent_ if their request queue is empty. Therefore, if there are any in-flight requests then at least one component will not be in _quiescent_ state. Since all components have stopped generating new events, the in-flight requests will drain after a period of time, and all components will then be in _quiescent_ state. NSF Manager will wait for all components to remain in _quiescent_ state for a period of time (configurable timer) before proceeding to the next phase. As a result, this design handles race conditions that may occur due to requests that are in-flight during warm shutdown. |
| 316 | + |
306 | 317 |
|
307 | | -### Critical Container Design Changes |
| 318 | +### Reference Design Changes in Critical Containers |
308 | 319 |
|
309 | | -This section describes the design changes that need to be made in critical containers for this warm-boot orchestration framework. |
| 320 | +This section provides a reference for the design changes in critical containers due to this warm-boot orchestration framework. The actual design change details are out of scope of this document. |
310 | 321 |
|
311 | 322 | #### Orchagent |
312 | 323 |
|
|
407 | 418 |
|
408 | 419 | NSF Manager will have unit and component tests to verify shutdown orchestration and reconciliation monitoring functionality. Unit/Component tests will be added to all switch stack components that will register with NSF Manager to ensure that they process notifications from NSF Manager and update STATE DB correctly. |
409 | 420 |
|
410 | | -Integration tests will be added to verify the end-to-end functionality of this new warm-boot orchestration framework. At a high-level, the integration test will trigger warm reboot using NSF Manager and will verify that the switch warm reboots with 0 packet loss. |
| 421 | +Integration tests will be added to verify the end-to-end functionality of this new warm-boot orchestration framework. At a high level, the integration test will trigger warm reboot using NSF Manager and will verify that the switch warm reboots with zero packet loss. The detailed test plan is out of scope of this design and will be shared separately. |
411 | 422 |
|
412 | 423 | NSF Manager is independent of the underlying forwarding ASIC and thus it will support all NPU types. The above integration test can be used to verify warm-boot orchestration on the different NPU types. |
413 | 424 |
|
|