@vvolam (Contributor) commented Nov 1, 2025

HLD: https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/graceful-shutdown/graceful-shutdown.md
These changes build upon enhancements in #4031

This PR adds CLI support and visibility for module-level graceful transitions (startup/shutdown/reboot) to align with the SmartSwitch/DPU lifecycle work.

What I did

  • Added support to view module transition states (startup, shutdown, reboot) through CLI.
  • Integrated with STATE_DB CHASSIS_MODULE_TABLE to display transition status, type, and elapsed time.
  • Enhanced user experience with readable durations and exit codes for automation.
  • Implemented comprehensive unit tests for transition visibility, parsing, and error handling.
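As an illustration of the "readable durations" item above, here is a minimal sketch of how an elapsed time in seconds could be rendered for CLI output. The function name `format_elapsed` and the `1h 2m 3s` output style are assumptions made for this example, not the format this PR actually emits.

```python
def format_elapsed(seconds):
    """Render a non-negative number of seconds as e.g. '1h 2m 3s'.

    Hypothetical helper: the name and output style are illustrative only.
    """
    total = int(seconds)
    if total < 0:
        raise ValueError("elapsed time cannot be negative")
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    # Show seconds when non-zero, or when nothing else was emitted,
    # so that an elapsed time of 0 still renders as "0s".
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)
```

Zero-valued units are omitted so that short transitions stay compact in a table column.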

How I did it

  • Added a helper class to read STATE_DB entries:
    • state_transition_in_progress
    • transition_type
    • transition_start_time
  • Implemented robust error handling for missing or malformed DB entries.
  • Added pytest-based unit tests using mocked state_db_connector.
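The parsing and error handling described above might look roughly like the following sketch. It works on a plain dict of field/value strings (such as the result of an `hgetall` on a `CHASSIS_MODULE_TABLE|<module>` key) so that it does not depend on any database connector. `TransitionInfo` and `parse_transition` are hypothetical names introduced for this example; only the three field names come from the list above.

```python
import time
from typing import NamedTuple, Optional


class TransitionInfo(NamedTuple):
    """Hypothetical container for the parsed transition fields."""
    in_progress: bool
    transition_type: Optional[str]
    elapsed_seconds: Optional[int]


def parse_transition(entry, now=None):
    """Turn a raw field dict into a TransitionInfo, tolerating bad data."""
    entry = entry or {}
    # Anything other than a case-insensitive "true" counts as not in progress.
    flag = str(entry.get("state_transition_in_progress", "")).strip().lower()
    in_progress = flag == "true"
    transition_type = entry.get("transition_type") or None
    elapsed = None
    try:
        start = int(entry["transition_start_time"])
        current = int(time.time() if now is None else now)
        # Clamp to zero in case of clock skew between writer and reader.
        elapsed = max(0, current - start)
    except (KeyError, TypeError, ValueError):
        # Missing or malformed timestamp: report the elapsed time as unknown.
        pass
    return TransitionInfo(in_progress, transition_type, elapsed)
```

Passing `now` explicitly keeps the function deterministic in unit tests, where the entry can be a hand-written dict instead of a mocked connector.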

How to verify it

  • Build and install the updated sonic-utilities package on DUT
  • Check Redis entries: redis-cli -n 6 hgetall "CHASSIS_MODULE_TABLE|DPU0"
  • Run the module startup/shutdown commands
  • Run unit tests

Sample output when "state_transition_in_progress" is already set

An error is reported when a transition is already in progress for the same module.

$ sudo config chassis modules shutdown DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2';sudo reboot -d DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2'
Shutting down chassis module DPU2

  1. "desc"
  2. "NVIDIA XXXXXX DPU"
  3. "slot"
  4. "N/A"
  5. "oper_status"
  6. "Online"
  7. "serial"
  8. "XXXXXXXXXX"
  9. "transition_in_progress"
  10. "True"
  11. "transition_type"
  12. "shutdown"
  13. "transition_start_time"
  14. "1763059401"
True
2025-11-13 18:43:22 - User requested rebooting device dpu2 ...
2025-11-13 18:43:23 - INFO: DPU dpu2 is in 'Online' state before reboot.
2025-11-13 18:43:23 - ERROR: state_transition_in_progress flag is already set for dpu2
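The rejection shown in the log above can be sketched as a small guard that inspects the module's table entry before starting a new transition. This is only an illustration: `check_can_start` is a hypothetical name, the `transition_in_progress` key is taken from the sample output above, and the message text and exit codes (0 to proceed, 1 to reject) are assumptions rather than the actual CLI behavior.

```python
def check_can_start(module, entry):
    """Return (exit_code, message); non-zero when a transition is pending.

    Hypothetical guard: key name follows the sample hgetall output,
    exit codes and messages are illustrative.
    """
    entry = entry or {}
    flag = str(entry.get("transition_in_progress", "")).strip().lower()
    if flag == "true":
        kind = entry.get("transition_type", "unknown")
        return 1, f"ERROR: {kind} transition already in progress for {module}"
    return 0, f"{module}: no transition in progress"
```

Returning the exit code alongside the message lets a calling script decide whether to abort, which is the kind of automation hook the description mentions.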

Copilot AI review requested due to automatic review settings November 1, 2025 00:30
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors module state transition tracking by moving the implementation from STATE_DB to platform-level methods. The change introduces three new methods in ModuleHelper to manage state transitions through the platform API instead of using database entries with timestamps.

  • Adds set_module_state_transition, clear_module_state_transition, and get_module_state_transition methods to ModuleHelper
  • Removes STATE_DB-based transition tracking functions from config/chassis_modules.py
  • Updates shell script to use new platform-level transition flag management

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| utilities_common/module.py | Adds three new methods for managing module state transitions via platform API |
| tests/test_module.py | Adds comprehensive unit tests for the new state transition methods |
| tests/chassis_modules_test.py | Updates tests to use platform-level transition checks instead of STATE_DB queries |
| scripts/reboot_smartswitch_helper | Adds shell functions to set/clear/get state transition flags via Python API |
| config/chassis_modules.py | Removes STATE_DB transition tracking functions and simplifies shutdown/startup logic |

@vvolam (Contributor, Author) commented Nov 4, 2025

@rameshraghupathy @gpunathilell could you please review this latest PR?

@rameshraghupathy (Contributor)

@vvolam I guess you should refer to #4031 in the description?

@vvolam (Contributor, Author) commented Nov 5, 2025

> @vvolam I guess, you should refer to #4031 in the description?

@rameshraghupathy Fixed the PR description. Thank you.
