In case of warmboot and coldboot, restart sw agent and all …#1000
Open
benoit-nexthop wants to merge 1 commit into facebook:main
# Summary
Until now, the config CLI assumed a monolithic `wedge_agent` and only
ever reloaded or restarted that one service. With this change, the CLI
is split-agent aware:
- For a hitless change, it reloads the config in `fboss_sw_agent`.
- For a coldboot, it stops and then starts `fboss_sw_agent` and `fboss_hw_agent@*`.
- For a warmboot, it restarts `fboss_sw_agent` and `fboss_hw_agent@*`.
- In monolithic mode, behavior is unchanged (backward compatible).
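The mapping from commit action to service operations can be sketched as a small planning function. This is only an illustration of the rules listed in the summary, not the code in `ConfigSession.cpp`: the `Action` enum, the `Step` struct, and `planCommit` are hypothetical names, and the coldboot ordering (stop every unit, then start every unit) is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical names for the three commit actions described in the summary.
enum class Action { Hitless, Warmboot, Coldboot };

// One step of the plan, e.g. {"restart", "fboss_sw_agent"}.
struct Step {
  std::string verb;
  std::string unit;
  bool operator==(const Step& other) const {
    return verb == other.verb && unit == other.unit;
  }
};

// Builds the ordered steps for a commit. `hwAgents` lists the concrete
// fboss_hw_agent@N instances; how they are discovered is not shown here.
std::vector<Step> planCommit(
    Action action,
    bool splitMode,
    const std::vector<std::string>& hwAgents) {
  std::vector<std::string> units;
  if (!splitMode) {
    // Monolithic mode only ever touches wedge_agent.
    units.push_back("wedge_agent");
  } else {
    units.push_back("fboss_sw_agent");
    // A hitless reload only involves the SW agent.
    if (action != Action::Hitless) {
      units.insert(units.end(), hwAgents.begin(), hwAgents.end());
    }
  }

  std::vector<Step> plan;
  switch (action) {
    case Action::Hitless:
      for (const auto& unit : units) {
        plan.push_back({"reload", unit});
      }
      break;
    case Action::Warmboot:
      for (const auto& unit : units) {
        plan.push_back({"restart", unit});
      }
      break;
    case Action::Coldboot:
      // Assumed ordering: stop everything first, then start everything.
      for (const auto& unit : units) {
        plan.push_back({"stop", unit});
      }
      for (const auto& unit : units) {
        plan.push_back({"start", unit});
      }
      break;
  }
  return plan;
}
```

Under these assumptions, a split-mode warmboot with a single HW agent yields two `restart` steps, while a monolithic coldboot yields `stop` then `start` for `wedge_agent`.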
## Test Summary
This test suite (`ConfigSessionSystemdTest`) contains 9 tests that
verify the split agent and monolith logic in `ConfigSession`:
### **Split Mode Detection Tests (3 tests)**
- **IsSplitMode_ReturnsTrueWhenSwAgentEnabled** - Verifies
`isSplitMode()` returns `true` when `fboss_sw_agent` service is enabled
- **IsSplitMode_ReturnsFalseWhenSwAgentNotEnabled** - Verifies
`isSplitMode()` returns `false` when `fboss_sw_agent` service is not
enabled
- **IsSplitMode_ReturnsFalseOnException** - Verifies `isSplitMode()`
gracefully handles exceptions and returns `false` (assumes monolithic
mode)
### **Monolithic Mode Restart Tests (2 tests)**
- **RestartService_MonolithicMode_Warmboot** - Verifies warmboot restart
calls `restartService("wedge_agent")` and waits for service to become
active
- **RestartService_MonolithicMode_Coldboot** - Verifies coldboot restart
calls `stopService()` then `startService()` (not `restartService()`) for
`wedge_agent`
### **Split Mode Restart Tests (2 tests)**
- **RestartService_SplitMode_Warmboot_SingleHwAgent** - Verifies that a
warmboot restarts both `fboss_sw_agent` and `fboss_hw_agent@0`
- **RestartService_SplitMode_Coldboot_SingleHwAgent** - Verifies that a
coldboot stops and then starts both agents in the correct sequence
### **Error Handling Tests (2 tests)**
- **RestartService_PropagatesFailure** - Verifies exceptions from
systemd operations are propagated to caller
- **RestartService_ServiceFailsToStart** - Verifies exceptions from
`waitForServiceActive()` timeout are propagated to caller
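The restart and error-handling cases can be illustrated with a hand-written fake in place of the mock. This is a sketch under stated assumptions: the method names come from the test descriptions above, but their signatures, the `FakeSystemd` class, and the `restartMonolithic` helper are hypothetical, and waiting for the service after a coldboot is assumed rather than confirmed by the test list.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Assumed shape of the systemd abstraction; signatures are illustrative.
class SystemdInterface {
 public:
  virtual ~SystemdInterface() = default;
  virtual void restartService(const std::string& unit) = 0;
  virtual void stopService(const std::string& unit) = 0;
  virtual void startService(const std::string& unit) = 0;
  virtual void waitForServiceActive(const std::string& unit) = 0;
};

// Fake that records every call and can be told to fail on one of them.
class FakeSystemd : public SystemdInterface {
 public:
  std::vector<std::string> calls;
  std::string failOn; // e.g. "wait wedge_agent"; empty means never fail

  void restartService(const std::string& u) override { record("restart " + u); }
  void stopService(const std::string& u) override { record("stop " + u); }
  void startService(const std::string& u) override { record("start " + u); }
  void waitForServiceActive(const std::string& u) override { record("wait " + u); }

 private:
  void record(const std::string& call) {
    calls.push_back(call);
    if (call == failOn) {
      throw std::runtime_error(call + " failed");
    }
  }
};

// Monolithic restart: coldboot uses stop + start, warmboot uses restart.
// Exceptions are deliberately not caught, so they reach the caller.
void restartMonolithic(SystemdInterface& systemd, bool coldboot) {
  const std::string unit = "wedge_agent";
  if (coldboot) {
    systemd.stopService(unit);
    systemd.startService(unit);
  } else {
    systemd.restartService(unit);
  }
  systemd.waitForServiceActive(unit);
}
```

Setting `failOn` to `"wait wedge_agent"` simulates a `waitForServiceActive()` timeout; the exception then surfaces from `restartMonolithic`, which is the behavior the error-handling tests describe.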
**Total: 9 tests** covering split mode detection, service restart logic
for both monolithic and split architectures, and error handling.
**Manual Testing**
**Coldboot**
```
[root@gold221 fboss]# fboss2-dev config session diff
--- current live config
+++ session config
@@ -317,7 +317,7 @@
"blockNeighbors": [],
"exactMatchTableConfigs": [],
"l2AgeTimerSeconds": 300,
- "l2LearningMode": 0,
+ "l2LearningMode": 1,
"macAddrsToBlock": [],
"maxRouteCounterIDs": 0,
"metaMacOuis": [],
[root@gold221 fboss]# fboss2-dev config session commit
I0305 08:27:40.499146 3302 SystemdInterface.cpp:80] fboss_sw_agent is now active
I0305 08:27:46.951436 3302 SystemdInterface.cpp:80] fboss_hw_agent@0 is now active
Config session committed successfully as 52a3af89ad16b0990f8449caff8c9d5a61d1b563 and fboss_sw_agent (coldboot), fboss_hw_agent@0 (coldboot) restarted.
[root@gold221 fboss]# systemctl status fboss_sw_agent
● fboss_sw_agent.service - FBOSS SW Agent
Loaded: loaded (/usr/lib/systemd/system/fboss_sw_agent.service; enabled; preset: disabled)
Active: active (running) since Tue 2026-03-03 14:46:37 UTC; 14s ago
Main PID: 5306 (fboss_sw_agent)
Tasks: 66 (limit: 203230)
CGroup: /system.slice/fboss_sw_agent.service
└─5306 /opt/fboss/bin/fboss_sw_agent
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247107 5480 ThriftHandler.cpp:1752] [0x7f19d40054e0] programInternalPhyPorts thrift request succeeded in 0ms. params: id=53,force=false,
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247139 5488 ThriftHandler.cpp:1777] programInternalPhyPorts for not present Transceiver:51 which doesn't exist in SwitchState. Skip re-programming
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247149 5488 ThriftHandler.cpp:1752] [0x7f19a0002430] programInternalPhyPorts thrift request succeeded in 0ms. params: id=51,force=false,
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247269 5488 ThriftHandler.cpp:1752] [0x7f1a18009590] programInternalPhyPorts thrift request received from ::1 (unknown). params: id=57,force=false,
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247310 5488 ThriftHandler.cpp:1777] programInternalPhyPorts for not present Transceiver:57 which doesn't exist in SwitchState. Skip re-programming
Mar 03 14:46:46 gold221 fboss_sw_agent[5306]: V0303 14:46:46.247320 5488 ThriftHandler.cpp:1752] [0x7f1a18009590] programInternalPhyPorts thrift request succeeded in 0ms. params: id=57,force=false,
Mar 03 14:46:49 gold221 fboss_sw_agent[5306]: V0303 14:46:49.210919 5488 ThriftHandler.cpp:1448] [0x7f1a0c009650] getPortStatus thrift request received from ::1 (unknown)
Mar 03 14:46:49 gold221 fboss_sw_agent[5306]: V0303 14:46:49.211175 5488 ThriftHandler.cpp:1448] [0x7f1a0c009650] getPortStatus thrift request succeeded in 0ms
Mar 03 14:46:51 gold221 fboss_sw_agent[5306]: V0303 14:46:51.216371 5488 ThriftHandler.cpp:2477] [0x7f1a04005120] getConfigAppliedInfo thrift request received from ::1 (unknown)
Mar 03 14:46:51 gold221 fboss_sw_agent[5306]: V0303 14:46:51.216389 5488 ThriftHandler.cpp:2477] [0x7f1a04005120] getConfigAppliedInfo thrift request succeeded in 0ms
[root@gold221 fboss]# systemctl status fboss_hw_agent@0
● fboss_hw_agent@0.service - FBOSS HW Agent 0
Loaded: loaded (/usr/lib/systemd/system/fboss_hw_agent@.service; enabled; preset: disabled)
Active: active (running) since Tue 2026-03-03 14:46:37 UTC; 22s ago
Main PID: 5314 (fboss_hw_agent-)
Tasks: 92 (limit: 203230)
CGroup: /system.slice/system-fboss_hw_agent.slice/fboss_hw_agent@0.service
└─5314 /opt/fboss/bin/fboss_hw_agent-sai_impl --switchIndex 0
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: I0303 14:46:42.866061 5371 IpcHealthMonitor.cpp:117] IPC state transition for LinkChangeEventThriftSyncer: DISCONNECTED -> CONNECTING
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: I0303 14:46:42.866102 5372 IpcHealthMonitor.cpp:117] IPC state transition for LinkChangeEventThriftSyncer: CONNECTING -> CONNECTED
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: V0303 14:46:42.866185 5385 SaiSwitch.cpp:3586] Sending link state change notification for port 19 with oper status: DOWN
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: V0303 14:46:42.866295 5385 SaiSwitch.cpp:3586] Sending link state change notification for port 11 with oper status: DOWN
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: V0303 14:46:42.866325 5385 SaiSwitch.cpp:3586] Sending link state change notification for port 9 with oper status: DOWN
Mar 03 14:46:42 gold221 fboss_hw_agent0[5314]: V0303 14:46:42.866349 5385 SaiSwitch.cpp:3586] Sending link state change notification for port 1 with oper status: DOWN
Mar 03 14:46:43 gold221 fboss_hw_agent0[5314]: I0303 14:46:43.254064 5371 IpcHealthMonitor.cpp:117] IPC state transition for TxPktEventThriftSyncer: DISCONNECTED -> CONNECTING
Mar 03 14:46:43 gold221 fboss_hw_agent0[5314]: I0303 14:46:43.254109 5373 IpcHealthMonitor.cpp:117] IPC state transition for TxPktEventThriftSyncer: CONNECTING -> CONNECTED
Mar 03 14:46:43 gold221 fboss_hw_agent0[5314]: I0303 14:46:43.283731 5371 IpcHealthMonitor.cpp:117] IPC state transition for HwSwitchStatsSinkClient: DISCONNECTED -> CONNECTING
Mar 03 14:46:43 gold221 fboss_hw_agent0[5314]: I0303 14:46:43.283776 5382 IpcHealthMonitor.cpp:117] IPC state transition for HwSwitchStatsSinkClient: CONNECTING -> CONNECTED
```
**Warmboot**
```
[root@gold221 fboss]# fboss2-dev config interface eth1/1/1 switchport access vlan 2001
Successfully set access VLAN for interface(s) eth1/1/1 to 2001
[root@gold221 fboss]# fboss2-dev config session commit
I0305 08:29:44.989263 3756 SystemdInterface.cpp:80] fboss_sw_agent is now active
I0305 08:29:44.994704 3756 SystemdInterface.cpp:80] fboss_hw_agent@0 is now active
Config session committed successfully as 9e1f4ca133fba9eb30206b93db11c55fab2fd8b3 and fboss_sw_agent (warmboot), fboss_hw_agent@0 (warmboot) restarted.
[root@gold221 fboss]# systemctl status fboss_hw_agent@0
● fboss_hw_agent@0.service - FBOSS HW Agent 0
Loaded: loaded (/usr/lib/systemd/system/fboss_hw_agent@.service; enabled; preset: disabled)
Active: active (running) since Tue 2026-03-03 15:15:03 UTC; 5s ago
Main PID: 7913 (fboss_hw_agent-)
Tasks: 92 (limit: 203230)
CGroup: /system.slice/system-fboss_hw_agent.slice/fboss_hw_agent@0.service
└─7913 /opt/fboss/bin/fboss_hw_agent-sai_impl --switchIndex 0
Mar 03 15:15:08 gold221 fboss_hw_agent0[7913]: V0303 15:15:08.446688 7942 SaiSwitch.cpp:3586] Sending link state change notification for port 9 with oper status: DOWN
Mar 03 15:15:08 gold221 fboss_hw_agent0[7913]: V0303 15:15:08.446714 7942 SaiSwitch.cpp:3586] Sending link state change notification for port 1 with oper status: DOWN
Mar 03 15:15:08 gold221 fboss_hw_agent0[7913]: I0303 15:15:08.886051 7928 IpcHealthMonitor.cpp:117] IPC state transition for SwitchReachabilityChangeEventThriftSyncer: DISCONNECTED -> CONNECTING
Mar 03 15:15:08 gold221 fboss_hw_agent0[7913]: I0303 15:15:08.886101 7940 IpcHealthMonitor.cpp:117] IPC state transition for SwitchReachabilityChangeEventThriftSyncer: CONNECTING -> CONNECTED
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.080828 7928 IpcHealthMonitor.cpp:117] IPC state transition for HwSwitchStatsSinkClient: DISCONNECTED -> CONNECTING
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.080871 7939 IpcHealthMonitor.cpp:117] IPC state transition for HwSwitchStatsSinkClient: CONNECTING -> CONNECTED
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.111336 7928 IpcHealthMonitor.cpp:117] IPC state transition for FdbEventThriftSyncer: DISCONNECTED -> CONNECTING
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.111369 7937 IpcHealthMonitor.cpp:117] IPC state transition for FdbEventThriftSyncer: CONNECTING -> CONNECTED
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.144835 7928 IpcHealthMonitor.cpp:117] IPC state transition for RxPktEventThriftSyncer: DISCONNECTED -> CONNECTING
Mar 03 15:15:09 gold221 fboss_hw_agent0[7913]: I0303 15:15:09.144866 7938 IpcHealthMonitor.cpp:117] IPC state transition for RxPktEventThriftSyncer: CONNECTING -> CONNECTED
[root@gold221 fboss]# systemctl status fboss_sw_agent
● fboss_sw_agent.service - FBOSS SW Agent
Loaded: loaded (/usr/lib/systemd/system/fboss_sw_agent.service; enabled; preset: disabled)
Active: active (running) since Tue 2026-03-03 15:14:58 UTC; 16s ago
Main PID: 7828 (fboss_sw_agent)
Tasks: 66 (limit: 203230)
CGroup: /system.slice/fboss_sw_agent.service
└─7828 /opt/fboss/bin/fboss_sw_agent
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.260747 8151 ThriftHandler.cpp:1752] [0x7fa6cc003d40] programInternalPhyPorts thrift request received from ::1 (unknown). params: id=42,force=false,
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.260805 8151 ThriftHandler.cpp:1777] programInternalPhyPorts for not present Transceiver:42 which doesn't exist in SwitchState. Skip re-programming
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.260828 8151 ThriftHandler.cpp:1752] [0x7fa6cc003d40] programInternalPhyPorts thrift request succeeded in 0ms. params: id=42,force=false,
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.261004 7867 AclNexthopHandler.cpp:40] aclsChanged: 0
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.261019 7867 CowStorageMgr.h:72] [FSDB] # Pending updates 1
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.261028 7867 SwSwitch.cpp:2089] Update state took 314us
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.261034 7853 CowStorageMgr.h:245] Applied state update Update internal state to publish (processing delay: 14 μs)
Mar 03 15:15:11 gold221 fboss_sw_agent[7828]: V0303 15:15:11.261049 8154 ThriftHandler.cpp:1752] [0x7fa6c4003d40] programInternalPhyPorts thrift request succeeded in 0ms. params: id=30,force=false,
Mar 03 15:15:14 gold221 fboss_sw_agent[7828]: V0303 15:15:14.233359 8154 ThriftHandler.cpp:1448] [0x7fa7480d61c0] getPortStatus thrift request received from ::1 (unknown)
Mar 03 15:15:14 gold221 fboss_sw_agent[7828]: V0303 15:15:14.233726 8154 ThriftHandler.cpp:1448] [0x7fa7480d61c0] getPortStatus thrift request succeeded in 0ms
```
**Reload**
```
[root@gold221 fboss]# fboss2-dev config interface eth1/1/1 description abc
Successfully configured interface(s) eth1/1/1: description="abc"
[root@gold221 fboss]# fboss2-dev config session commit
Config session committed successfully as aa71dcfcf21bfde9bffd6caf15892d517edd39de and config reloaded for fboss_sw_agent.
[root@gold221 fboss]#
```
**Commit details**
- Author: hillol-nexthop <hillol@nexthop.ai>
- Date: Fri Mar 13 02:17:55 2026 +0530
- Branch: `cli-split-agent` (1 commit ahead of `upstream/main`)

**Files changed**
- modified: `cmake/CliFboss2.cmake`
- modified: `cmake/CliFboss2TestConfig.cmake`
- modified: `fboss/cli/fboss2/BUCK`
- modified: `fboss/cli/fboss2/commands/config/session/CmdConfigSessionCommit.cpp`
- modified: `fboss/cli/fboss2/session/ConfigSession.cpp`
- modified: `fboss/cli/fboss2/session/ConfigSession.h`
- new file: `fboss/cli/fboss2/session/SystemdInterface.cpp`
- new file: `fboss/cli/fboss2/session/SystemdInterface.h`
- modified: `fboss/cli/fboss2/test/TestableConfigSession.h`
- new file: `fboss/cli/fboss2/test/config/ConfigSessionSystemdTest.cpp`
- new file: `fboss/cli/fboss2/test/config/MockSystemdInterface.h`
Original change by @hillol-nexthop