[action] [PR:25463] config-setup: Prefer minigraph.xml over ZTP when config_db.json is absent #25657

Open
mssonicbld wants to merge 1 commit into sonic-net:202511 from mssonicbld:cherry/202511/25463

@mssonicbld
Collaborator

Description of PR

Summary:
When config_db.json is missing but minigraph.xml is present, the config-setup boot sequence previously triggered ZTP (if enabled), which generates a minimal config and runs config reload. The reload removes the device's management IP, so recovery requires console access.

Additionally, the config initialization path had no guard for warm boot: if config_db.json was absent, ZTP could still be triggered even though warm boot must preserve the existing running configuration.

Root cause:

  1. do_config_initialization() had no awareness of minigraph.xml. It only checked for ZTP or factory default, even when a valid minigraph was available on disk.
  2. check_system_warm_boot() only checked STATE_DB, which may not be available early in the boot sequence. The canonical SONIC_BOOT_TYPE=warm in /proc/cmdline (used by all other boot-type detection in the codebase) was not checked.
  3. boot_config() did not skip config initialization during warm boot, allowing ZTP to trigger inappropriately.

Fix (3 changes in config-setup):

  1. do_config_initialization(): Check for minigraph.xml at the top of the function, before ZTP/factory-default logic. If minigraph is available, use reload_minigraph and return early. This aligns with the pattern already used in do_config_migration().

  2. check_system_warm_boot(): Now checks /proc/cmdline for SONIC_BOOT_TYPE=warm first (the authoritative source set by warm-reboot scripts), then falls back to STATE_DB for compatibility. This is consistent with getBootType() used in docker_image_ctl.j2, syncd_common.sh, and watchdog-control.sh.

  3. boot_config(): Added a warm boot guard after config migration: during warm boot, config initialization and ZTP are skipped entirely. Also added a minigraph.xml guard on the ZTP restart block so that ZTP erase/restart is skipped when minigraph was used.

This maintains the config priority order consistent with do_config_migration():
config_db.json > minigraph.xml > ZTP > factory default.
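
The priority order above can be sketched as a small selection function. This is an illustrative model only, not the code in config-setup: the function name `select_config_source`, its arguments, and the printed labels are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of config-setup): prints which config
# source the priority order would pick.
#   $1 - directory that may hold config_db.json and minigraph.xml
#   $2 - "true" if ZTP is enabled, anything else otherwise
select_config_source() {
    local config_dir="$1"
    local ztp_enabled="$2"

    if [ -f "${config_dir}/config_db.json" ]; then
        echo "config_db"
    elif [ -f "${config_dir}/minigraph.xml" ]; then
        echo "minigraph"
    elif [ "${ztp_enabled}" = "true" ]; then
        echo "ztp"
    else
        echo "factory_default"
    fi
}

# Usage: a scratch directory containing only minigraph.xml, with ZTP enabled.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/minigraph.xml"
select_config_source "${demo_dir}" "true"
rm -rf "${demo_dir}"
```

In this model, ZTP is reached only when neither file exists, which is the behavior the fix is meant to restore.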

Addresses: ADO 36697420 — [202511.08] Config Reload is Run during warm-boot up

Type of change

  • [x] Bug fix
  • [ ] Configuration change (update something in files/ or device/)

Back port request

  • [ ] 202505
  • [x] 202511

Approach

What is the motivation for this PR?

With ZTP enabled on 202511 images, upgrade path tests that delete config_db.json trigger ZTP's fallback behavior. ZTP generates a new config and runs config reload, wiping the management IP. The device then requires console access to recover. During warm boot this is especially dangerous because it disrupts the warm restart flow.

How did you do it?

  1. Added minigraph.xml existence check in do_config_initialization() before ZTP/factory-default logic
  2. Enhanced check_system_warm_boot() to check /proc/cmdline for SONIC_BOOT_TYPE=warm (consistent with all other boot-type detection in the codebase)
  3. Added warm boot guard in boot_config() to skip config initialization and ZTP during warm boot
  4. Added minigraph.xml guard on ZTP restart block in boot_config() to avoid unnecessary ZTP erase when minigraph was applied
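
A minimal sketch of the detection order in step 2, under stated assumptions: the function names are hypothetical, the cmdline path is passed in as a parameter so the logic can be exercised off-device, and the STATE_DB key (`WARM_RESTART_ENABLE_TABLE|system`, field `enable`) is assumed rather than taken from this PR. The actual change in config-setup may differ.

```shell
#!/bin/bash
# Hypothetical sketch: succeeds (returns 0) when the kernel command line
# contains SONIC_BOOT_TYPE=warm as a whole token.
#   $1 - path to a cmdline file; defaults to /proc/cmdline
cmdline_says_warm_boot() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a tokens=()
    local token

    [ -r "${cmdline_file}" ] || return 1
    # Split the first line on whitespace without glob expansion.
    read -r -a tokens < "${cmdline_file}" || true
    for token in "${tokens[@]}"; do
        if [ "${token}" = "SONIC_BOOT_TYPE=warm" ]; then
            return 0
        fi
    done
    return 1
}

# Fallback: ask STATE_DB, but only if sonic-db-cli is present.
# The table and field names here are an assumption for illustration.
state_db_says_warm_boot() {
    local flag
    command -v sonic-db-cli >/dev/null 2>&1 || return 1
    flag="$(sonic-db-cli STATE_DB hget "WARM_RESTART_ENABLE_TABLE|system" enable 2>/dev/null)"
    [ "${flag}" = "true" ]
}

# Prints "true" or "false"; the command line is consulted first.
detect_warm_boot() {
    if cmdline_says_warm_boot "$1" || state_db_says_warm_boot; then
        echo "true"
    else
        echo "false"
    fi
}

# Usage: inspect the running system's command line.
detect_warm_boot /proc/cmdline
```

Matching whole tokens avoids a false positive on a value such as `SONIC_BOOT_TYPE=warmish`, which a plain substring search would accept.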

How did you verify/test it?

Code review and trace of all boot code paths in config-setup. The minigraph fix follows the same pattern already used in do_config_migration(), which correctly prefers minigraph over factory default. The warm boot detection follows the same /proc/cmdline pattern used by docker_image_ctl.j2, syncd_common.sh, and watchdog-control.sh.
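
A code-path trace like the one described can be approximated by enumerating the inputs of a simplified decision model. The sketch below is an assumption-laden illustration, not the logic in config-setup: `plan_boot_action` and its action labels are hypothetical, and it only prints a plan without touching any configuration.

```shell
#!/bin/bash
# Hypothetical model of the boot-time decision. Each argument is
# "true" or "false":
#   $1 warm_boot, $2 has_config_db, $3 has_minigraph, $4 ztp_enabled
plan_boot_action() {
    local warm_boot="$1" has_config_db="$2" has_minigraph="$3" ztp_enabled="$4"

    if [ "${has_config_db}" = "true" ]; then
        echo "use_config_db"
    elif [ "${warm_boot}" = "true" ]; then
        # Warm boot guard: never initialize config or start ZTP.
        echo "skip_initialization"
    elif [ "${has_minigraph}" = "true" ]; then
        echo "reload_minigraph"
    elif [ "${ztp_enabled}" = "true" ]; then
        echo "start_ztp"
    else
        echo "factory_default"
    fi
}

# Trace every combination in which config_db.json is absent.
for warm in true false; do
    for minigraph in true false; do
        for ztp in true false; do
            printf 'warm=%s minigraph=%s ztp=%s -> %s\n' \
                "${warm}" "${minigraph}" "${ztp}" \
                "$(plan_boot_action "${warm}" false "${minigraph}" "${ztp}")"
        done
    done
done
```

In this model, `start_ztp` appears only when the boot is not warm and no minigraph is present, which matches the intent stated in the fix.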

@mssonicbld
Collaborator Author

Original PR: #25463

@mssonicbld
Collaborator Author

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
