[action] [PR:25463] config-setup: Prefer minigraph.xml over ZTP when config_db.json is absent #25657
Open
mssonicbld wants to merge 1 commit into sonic-net:202511 from
### Description of PR

**Summary:**

When `config_db.json` is missing but `minigraph.xml` is present, the config-setup boot sequence previously triggered ZTP (if enabled), which generates a minimal config and runs `config reload`. This removes the device management IP, requiring console access to recover.

Additionally, during warm boot the config initialization path had no guard: if `config_db.json` was absent, ZTP could still be triggered even though warm boot must preserve the existing running configuration.

**Root cause:**

1. `do_config_initialization()` had no awareness of `minigraph.xml`. It only checked for ZTP or factory default, even when a valid minigraph was available on disk.
2. `check_system_warm_boot()` only checked STATE_DB, which may not be available early in the boot sequence. The canonical `SONIC_BOOT_TYPE=warm` in `/proc/cmdline` (used by all other boot-type detection in the codebase) was not checked.
3. `boot_config()` did not skip config initialization during warm boot, allowing ZTP to trigger inappropriately.

**Fix (3 changes in `config-setup`):**

1. **`do_config_initialization()`**: Check for `minigraph.xml` at the top of the function, before ZTP/factory-default logic. If minigraph is available, use `reload_minigraph` and return early. This aligns with the pattern already used in `do_config_migration()`.
2. **`check_system_warm_boot()`**: Enhanced to check `/proc/cmdline` for `SONIC_BOOT_TYPE=warm` first (the authoritative source set by warm-reboot scripts), then fall back to STATE_DB for compatibility. This is consistent with `getBootType()` used in `docker_image_ctl.j2`, `syncd_common.sh`, and `watchdog-control.sh`.
3. **`boot_config()`**: Added warm boot guard after config migration. During warm boot, config initialization and ZTP are skipped entirely. Also added `minigraph.xml` guard on the ZTP restart block so ZTP erase/restart is skipped when minigraph was used.
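The warm boot detection described in fix item 2 can be sketched roughly as below. This is an illustration only, not the code from the PR: the function names `get_boot_type_from_cmdline` and `is_warm_boot` are hypothetical, the cmdline path is a parameter so the logic can be exercised without a real warm boot, and the STATE_DB fallback is represented by a plain argument rather than an actual database query.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the real config-setup script may differ.

# Print the value of SONIC_BOOT_TYPE from a kernel command line file.
# Prints nothing if the token is absent or the file is unreadable.
# $1: path to the cmdline file (defaults to /proc/cmdline)
get_boot_type_from_cmdline() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a tokens=()
    local token

    [ -r "$cmdline_file" ] || return 0

    # The kernel command line is a single line of space-separated tokens.
    read -r -a tokens < "$cmdline_file" || true

    for token in "${tokens[@]}"; do
        case "$token" in
            SONIC_BOOT_TYPE=*)
                echo "${token#SONIC_BOOT_TYPE=}"
                return 0
                ;;
        esac
    done
}

# Return 0 (true) if this boot should be treated as a warm boot.
# $1: path to the cmdline file (defaults to /proc/cmdline)
# $2: "true" if the STATE_DB fallback reports warm restart enabled
#     (assumption: a stand-in for the real STATE_DB lookup)
is_warm_boot() {
    local cmdline_file="${1:-/proc/cmdline}"
    local state_db_flag="${2:-false}"

    # Check the kernel command line first; it is available early in boot.
    if [ "$(get_boot_type_from_cmdline "$cmdline_file")" = "warm" ]; then
        return 0
    fi

    # Fall back to the STATE_DB-derived flag for compatibility.
    [ "$state_db_flag" = "true" ]
}
```

Checking the command line before STATE_DB means the result does not depend on the database being up when the script runs, which is the failure mode the root cause analysis calls out.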
The fix keeps the config priority order consistent with `do_config_migration()`: `config_db.json` > `minigraph.xml` > ZTP > factory default.

**Addresses:** ADO 36697420: `[202511.08] Config Reload is Run during warm-boot up`

### Type of change

- [x] Bug fix
- [ ] Configuration change (update something in `files/` or `device/`)

### Back port request

- [ ] 202505
- [x] 202511

### Approach

#### What is the motivation for this PR?

With ZTP enabled on 202511 images, upgrade path tests that delete `config_db.json` trigger ZTP's fallback behavior. ZTP generates a new config and runs `config reload`, wiping the management IP. The device then requires console access to recover. During warm boot this is especially dangerous as it disrupts the warm restart flow.

#### How did you do it?

1. Added `minigraph.xml` existence check in `do_config_initialization()` before ZTP/factory-default logic
2. Enhanced `check_system_warm_boot()` to check `/proc/cmdline` for `SONIC_BOOT_TYPE=warm` (consistent with all other boot-type detection in the codebase)
3. Added warm boot guard in `boot_config()` to skip config initialization and ZTP during warm boot
4. Added `minigraph.xml` guard on ZTP restart block in `boot_config()` to avoid unnecessary ZTP erase when minigraph was applied

#### How did you verify/test it?

Code review and trace of all boot code paths in `config-setup`. The minigraph fix follows the same pattern already used in `do_config_migration()`, which correctly prefers minigraph over factory default. The warm boot detection follows the same `/proc/cmdline` pattern used by `docker_image_ctl.j2`, `syncd_common.sh`, and `watchdog-control.sh`.
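For reviewers tracing the boot paths, the priority order stated in this description (`config_db.json` > `minigraph.xml` > ZTP > factory default) can be modeled with a small standalone sketch. The function name `select_config_source`, its arguments, and the default directory `/etc/sonic` are assumptions made for illustration; the actual script applies the chosen config rather than printing a label.

```shell
#!/bin/bash
# Hypothetical model of the config priority order; not the PR's code.

# Print which config source would be chosen.
# $1: directory holding config_db.json / minigraph.xml
#     (assumption: defaults to /etc/sonic)
# $2: "true" if ZTP is enabled on the image
select_config_source() {
    local config_dir="${1:-/etc/sonic}"
    local ztp_enabled="${2:-false}"

    if [ -r "${config_dir}/config_db.json" ]; then
        echo "config_db"          # existing config always wins
    elif [ -r "${config_dir}/minigraph.xml" ]; then
        echo "minigraph"          # preferred over ZTP when present
    elif [ "$ztp_enabled" = "true" ]; then
        echo "ztp"
    else
        echo "factory_default"
    fi
}
```

Under this model, a device with ZTP enabled and only `minigraph.xml` on disk selects `minigraph`, which is the case the upgrade path tests exercise when they delete `config_db.json`.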
Original PR: #25463
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).