Conversation

@qiluo-msft (Collaborator) commented Nov 12, 2025

Why I did it

Rewrite Python version of supervisor-proc-exit-listener in Rust (supervisor-proc-exit-listener-rs)
Add unit tests, achieving 73.60% coverage (184/250 lines covered).
Change the executable name to supervisor-proc-exit-listener-rs.
However, the executable is not yet used in any of the docker containers; a follow-up PR (#23688) will switch each docker container over to it.

Before this PR:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  13062 root      20   0  130.3m  33.4m  19.0m S   0.0   0.2   0:00.23 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name swss
  15163 root      20   0  130.3m  32.1m  17.8m S   0.0   0.2   0:00.17 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name eventd
  13170 root      20   0  130.3m  32.1m  17.8m S   0.0   0.2   0:00.26 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name sysmgr
  20784 root      20   0  130.3m  32.1m  17.8m S   0.0   0.2   0:00.23 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name gnmi
  19288 root      20   0  130.3m  32.1m  17.7m S   0.0   0.2   0:00.17 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name bmp
  21357 root      20   0  129.3m  32.0m  17.8m S   0.0   0.2   0:00.12 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name snmp
  13286 root      20   0  130.3m  32.0m  17.7m S   0.0   0.2   0:00.27 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name teamd
  15647 root      20   0  129.3m  31.9m  17.7m S   0.0   0.2   0:00.17 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name bgp
  17187 root      20   0  130.2m  31.9m  17.6m S   0.0   0.2   0:00.15 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name syncd
   5191 root      20   0  129.3m  31.9m  17.6m S   0.0   0.2   0:00.20 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name database
  17893 root      20   0  129.3m  31.9m  17.6m S   0.0   0.2   0:00.13 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name pmon
  21026 root      20   0  129.3m  31.8m  17.6m S   0.0   0.2   0:00.13 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name lldp
  17195 root      20   0  129.3m  31.8m  17.6m S   0.0   0.2   0:00.12 /usr/bin/python3 /usr/local/bin/supervisor-proc-exit-listener --container-name radv
Memory calc sheet:

RES    SHR    RssAnon   All
33.4   19.0   14.4
32.1   17.8   14.3
32.1   17.8   14.3
32.1   17.8   14.3
32.1   17.7   14.4
32.0   17.8   14.2
32.0   17.7   14.3
31.9   17.7   14.2
31.9   17.6   14.3
31.9   17.6   14.3
31.9   17.6   14.3
31.8   17.6   14.2
31.8   17.6   14.2
Total         185.7     203.49
Avg    17.79

After this PR:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  20051 root      20   0   94.9m   8.4m   7.7m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name bmp
  16240 root      20   0   94.9m   8.4m   7.7m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name eventd
  17235 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.02 /usr/bin/supervisor-proc-exit-listener-rs --container-name syncd
  21093 root      20   0   94.9m   8.3m   7.7m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name gnmi
  16207 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name teamd
  16092 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.02 /usr/bin/supervisor-proc-exit-listener-rs --container-name sysmgr
  16288 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name bgp
  21158 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name lldp
  17281 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name radv
  16093 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.02 /usr/bin/supervisor-proc-exit-listener-rs --container-name swss
  21452 root      20   0   94.9m   8.3m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name snmp
  18123 root      20   0   94.9m   8.2m   7.6m S   0.0   0.1   0:00.01 /usr/bin/supervisor-proc-exit-listener-rs --container-name pmon
   7246 root      20   0   94.9m   8.2m   7.5m S   0.0   0.1   0:00.04 /usr/bin/supervisor-proc-exit-listener-rs --container-name database

Memory calc sheet:

RES    SHR    RssAnon   All
8.4    7.7    0.7
8.4    7.7    0.7
8.3    7.6    0.7
8.3    7.7    0.6
8.3    7.6    0.7
8.3    7.6    0.7
8.3    7.6    0.7
8.3    7.6    0.7
8.3    7.6    0.7
8.3    7.6    0.7
8.3    7.6    0.7
8.2    7.6    0.6
8.2    7.5    0.7
Total         8.9       16.52
Avg    7.62

The total memory reduction is about 203.49 - 16.52 = 186.97 MB

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft marked this pull request as ready for review November 13, 2025 21:27
@qiluo-msft qiluo-msft requested a review from xumia as a code owner November 13, 2025 21:27
Copilot AI review requested due to automatic review settings November 13, 2025 21:27
@qiluo-msft qiluo-msft requested a review from lguohan as a code owner November 13, 2025 21:27

[eventlistener:supervisor-proc-exit-listener]
-command=/usr/local/bin/supervisor-proc-exit-listener --container-name dhcp_relay
+command=/usr/bin/supervisor-proc-exit-listener-rs --container-name dhcp_relay

@saiarcot895 (Contributor) Nov 13, 2025

Why/how does test data need to be updated? #Resolved

Copilot finished reviewing on behalf of qiluo-msft November 13, 2025 21:30

Copilot AI left a comment

Pull Request Overview

This PR introduces a Rust implementation of the Python-based supervisor-proc-exit-listener utility, providing significant memory reduction (~187 MB total across all containers). The new binary is named supervisor-proc-exit-listener-rs and includes comprehensive unit tests achieving 73.60% coverage. The implementation maintains functional parity with the Python version while being performance-optimized.

Key Changes

  • Complete Rust rewrite of the supervisor process exit listener with matching functionality
  • New test suite with unit tests, integration tests, and mock implementations
  • Build infrastructure including Cargo configuration, Debian packaging, and Makefiles

Reviewed Changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 10 comments.

Summary per file:

  • src/sonic-supervisord-utilities-rs/tests/test_listener.rs: Comprehensive unit tests with mocking framework for the listener
  • src/sonic-supervisord-utilities-rs/tests/integration_tests.rs: Integration tests for end-to-end functionality
  • src/sonic-supervisord-utilities-rs/src/proc_exit_listener.rs: Core listener implementation with event handling and alerting
  • src/sonic-supervisord-utilities-rs/src/childutils.rs: Supervisor protocol parsing utilities
  • src/sonic-supervisord-utilities-rs/src/lib.rs: Library entry point and module exports
  • src/sonic-supervisord-utilities-rs/src/bin/supervisor_proc_exit_listener.rs: Binary entry point
  • src/sonic-supervisord-utilities-rs/Cargo.toml: Rust package configuration and dependencies
  • src/sonic-supervisord-utilities-rs/debian/*: Debian packaging configuration files
  • src/sonic-supervisord-utilities-rs/Makefile: Build automation and testing targets
  • rules/sonic-supervisord-utilities-rs.mk: SONiC build system integration
  • rules/docker-config-engine-bookworm.mk: Docker build dependency configuration
  • src/sonic-supervisord-utilities-rs/tests/test_data/config_db.json: Test fixture for ConfigDB data
  • src/sonic-supervisord-utilities-rs/tests/etc/supervisor/critical_processes: Test fixture for critical process configuration
  • src/sonic-supervisord-utilities-rs/tests/etc/supervisor/watchdog_processes: Test fixture for watchdog process configuration
  • src/sonic-supervisord-utilities-rs/tests/dev/stdin: Test fixture for supervisor protocol input


Comment on lines +14 to +20
fn create_test_config_db() -> ConfigDBConnector {
    // Try to create a ConfigDB connector for testing
    // In test environments this might fail, but we'll let the individual tests handle it
    ConfigDBConnector::new(false, None).unwrap_or_else(|_| {
        // If it fails, try with unix socket path
        ConfigDBConnector::new(true, None).expect("Failed to create test ConfigDB connector with any method")
    })

Copilot AI Nov 13, 2025

The create_test_config_db() function uses unwrap_or_else with expect, which will still panic if the unix socket path connection also fails. Consider returning a Result instead of panicking in a test helper function, allowing individual tests to handle unavailable ConfigDB appropriately:

fn create_test_config_db() -> Result<ConfigDBConnector, Box<dyn std::error::Error>> {
    ConfigDBConnector::new(false, None)
        .or_else(|_| ConfigDBConnector::new(true, None))
}
Suggested change:
-fn create_test_config_db() -> ConfigDBConnector {
-    // Try to create a ConfigDB connector for testing
-    // In test environments this might fail, but we'll let the individual tests handle it
-    ConfigDBConnector::new(false, None).unwrap_or_else(|_| {
-        // If it fails, try with unix socket path
-        ConfigDBConnector::new(true, None).expect("Failed to create test ConfigDB connector with any method")
-    })
+fn create_test_config_db() -> Result<ConfigDBConnector, Box<dyn std::error::Error>> {
+    // Try to create a ConfigDB connector for testing
+    // In test environments this might fail, but we'll let the individual tests handle it
+    ConfigDBConnector::new(false, None)
+        .or_else(|_| ConfigDBConnector::new(true, None))
+        .map_err(|e| Box::new(e) as Box<dyn std::error::Error>)

#WontFix
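
For illustration, a minimal sketch of how a test might consume the Result-returning variant above; the test name and the skip-on-unavailable behavior are hypothetical, not part of the PR.

#[test]
fn test_with_optional_config_db() {
    // Skip gracefully when no ConfigDB is reachable instead of panicking the whole test binary.
    let config_db = match create_test_config_db() {
        Ok(db) => db,
        Err(e) => {
            eprintln!("ConfigDB unavailable in this environment ({e}), skipping test");
            return;
        }
    };
    // ... exercise code paths that need `config_db` here ...
    let _ = config_db;
}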

@qiluo-msft (Collaborator, Author)

Panicking in this situation in a test case is okay; the test fails anyway.

Comment on lines 495 to 511
    // Test InjectorPP mocking of nix::sys::signal::kill function
    let mut injector = InjectorPP::new();

    // Mock the kill function to track calls
    injector
        .when_called(injectorpp::func!(fn (nix::sys::signal::kill)(nix::unistd::Pid, nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>))
        .will_execute(injectorpp::fake!(
            func_type: fn(_pid: nix::unistd::Pid, _signal: nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>,
            returns: Ok(())
        ));

    // Test that kill would be called with correct parameters
    let test_pid = nix::unistd::Pid::from_raw(1234);
    let result = nix::sys::signal::kill(test_pid, nix::sys::signal::Signal::SIGTERM);

    assert!(result.is_ok(), "Mocked kill should succeed");
}

Copilot AI Nov 13, 2025

The test test_injectorpp_kill_mocking_basic sets up a mock for nix::sys::signal::kill but doesn't actually verify that the mock was called or test any real functionality. This test only verifies the mock setup works. Consider either:

  1. Adding assertions to verify the mock behavior
  2. Removing this test if it's redundant with other tests that actually exercise the mocked functionality
  3. Adding a comment explaining this is a basic smoke test for the mocking framework #Resolved

fn test_critical_process_with_autorestart_disabled_no_kill() {
// Test that when a critical process exits and auto-restart is disabled, kill() is NOT called
let mut injector = InjectorPP::new();
let mut kill_call_count = 0;

Copilot AI Nov 13, 2025

Unused variable: kill_call_count is declared and initialized to 0 but never incremented or used. This appears to be leftover from an earlier implementation. Remove this unused variable.

Suggested change:
-let mut kill_call_count = 0;

#Resolved

Comment on lines 67 to 89
fn poll(&mut self, events: &mut mio::Events, _timeout: Option<std::time::Duration>) -> std::io::Result<()> {
    // We need to actually add an event to make the main loop detect stdin_ready = true
    // Since mio::Event is not easily constructible, we'll use a different approach

    // Create a temporary pipe just to generate a real mio event
    let (read_fd, write_fd) = nix::unistd::pipe()?;
    let mut source = mio::unix::SourceFd(&read_fd);

    // Write something to the pipe to make it readable
    nix::unistd::write(write_fd, &[1])?;
    nix::unistd::close(write_fd)?;

    // Create a temporary registry and poll to generate a real event
    let mut temp_poll = mio::Poll::new()?;
    temp_poll.registry().register(&mut source, mio::Token(0), mio::Interest::READABLE)?;

    // Poll the temporary poll to get real events
    temp_poll.poll(events, Some(std::time::Duration::from_millis(1)))?;

    // Clean up
    nix::unistd::close(read_fd)?;

    Ok(())

Copilot AI Nov 13, 2025

The MockPoller::poll implementation creates temporary file descriptors in each call but never closes them in the error paths. If temp_poll.registry().register() or temp_poll.poll() fails, the read_fd will leak. Consider wrapping the file descriptor cleanup in a RAII guard or ensuring cleanup happens in all code paths.

// Clean up should happen even on error
let _read_fd_guard = scopeguard::guard(read_fd, |fd| { let _ = nix::unistd::close(fd); });

#Resolved
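
A minimal sketch of the same RAII idea without the scopeguard dependency, assuming the nix version used above (pipe() and close() operating on RawFd); the FdGuard type is illustrative, not part of the PR.

// Illustrative only: close the pipe fds on every exit path, including early `?` returns.
struct FdGuard(std::os::unix::io::RawFd);

impl Drop for FdGuard {
    fn drop(&mut self) {
        // Best-effort close; errors are ignored during cleanup.
        let _ = nix::unistd::close(self.0);
    }
}

let (read_fd, write_fd) = nix::unistd::pipe()?;
let _read_guard = FdGuard(read_fd);
let _write_guard = FdGuard(write_fd);
// ... write to write_fd, register read_fd, and poll as before; no explicit close() calls needed ...

With the guards in place the fds are released even if register() or poll() returns early; closing write_fd at scope end instead of mid-function does not matter, since the byte already written keeps read_fd readable.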

Ok(())
}

fn register(&self, _stdin_fd: std::os::unix::io::RawFd) -> std::io::Result<()> {

Copilot AI Nov 13, 2025

The MockPoller::register method signature doesn't match the Poller trait. The trait expects fn register(&self, stdin_fd: RawFd, token: Token) but this implementation only has _stdin_fd parameter. Add the missing token parameter:

fn register(&self, _stdin_fd: std::os::unix::io::RawFd, _token: mio::Token) -> std::io::Result<()> {
Suggested change:
-fn register(&self, _stdin_fd: std::os::unix::io::RawFd) -> std::io::Result<()> {
+fn register(&self, _stdin_fd: std::os::unix::io::RawFd, _token: mio::Token) -> std::io::Result<()> {

#Resolved

* Rust implementation of SONiC supervisor process exit listener
* Performance-optimized port of Python sonic-supervisord-utilities

-- SONiC Team <[email protected]> Sat, 03 Aug 2025 00:00:00 -0800
\ No newline at end of file

Copilot AI Nov 13, 2025

The date format in the changelog is incorrect. August 3, 2025 is a Sunday, not a Saturday. The day should be Sun instead of Sat.

Suggested change:
- -- SONiC Team <[email protected]> Sat, 03 Aug 2025 00:00:00 -0800
+ -- SONiC Team <[email protected]> Sun, 03 Aug 2025 00:00:00 -0800

#Resolved

Comment on lines 608 to 610
let mut kill_called = false;
let mut kill_pid = nix::unistd::Pid::from_raw(0);
let mut kill_signal = nix::sys::signal::Signal::SIGTERM;

Copilot AI Nov 13, 2025

Unused variables: kill_called, kill_pid, and kill_signal are declared but never used in the test. These appear to be leftover from an earlier implementation. Remove these unused variables:

fn test_critical_process_with_autorestart_enabled_calls_kill() {
    let mut injector = InjectorPP::new();
    
    // Mock kill function to capture calls
    injector
        .when_called(...)
Suggested change:
-let mut kill_called = false;
-let mut kill_pid = nix::unistd::Pid::from_raw(0);
-let mut kill_signal = nix::sys::signal::Signal::SIGTERM;

#Resolved

use nix::sys::signal::{self, Signal};
use nix::unistd::getppid;
use std::collections::HashMap;
use std::collections::HashMap as StdHashMap;

Copilot AI Nov 13, 2025

Duplicate import: HashMap is imported twice on lines 10 and 11. Remove the redundant import on line 11:

use std::collections::HashMap;
Suggested change:
-use std::collections::HashMap as StdHashMap;

#Resolved

Comment on lines 273 to 278
/// Get current monotonic time as seconds since an arbitrary epoch - helper function
pub fn get_current_time() -> f64 {
    static START_TIME: std::sync::OnceLock<Instant> = std::sync::OnceLock::new();
    let start = START_TIME.get_or_init(|| Instant::now());
    start.elapsed().as_secs_f64()
}

Copilot AI Nov 13, 2025

The function comment says "helper function" but this is a public API (pub fn). Consider adding proper documentation with /// doc comments explaining the return value and any important behavior (like the fact that it returns time since an arbitrary epoch, not wall clock time). #Resolved
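
One possible wording for such a doc comment, reusing the body quoted above; the exact phrasing is an editorial suggestion, not part of the PR.

/// Returns the number of seconds elapsed since the first call to this function.
///
/// This is a monotonic duration measured from an arbitrary, process-local epoch
/// (the first invocation), not wall-clock time; it is only meaningful for
/// computing differences between two calls within the same process.
pub fn get_current_time() -> f64 {
    static START_TIME: std::sync::OnceLock<Instant> = std::sync::OnceLock::new();
    let start = START_TIME.get_or_init(|| Instant::now());
    start.elapsed().as_secs_f64()
}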

Comment on lines +494 to +502
if elapsed_secs >= ALERTING_INTERVAL_SECS as f64 {
    let elapsed_mins = (elapsed_secs / 60.0) as u64;
    process_info.insert("last_alerted".to_string(), current_time);
    let current_dead_minutes = process_info.get("dead_minutes").unwrap_or(&0.0);
    let new_dead_minutes = current_dead_minutes + elapsed_mins as f64;
    process_info.insert("dead_minutes".to_string(), new_dead_minutes);

    generate_alerting_message(process_name, "not running", new_dead_minutes as u64, Severity::LOG_ERR);
}

Copilot AI Nov 13, 2025

The elapsed time calculation elapsed_secs >= ALERTING_INTERVAL_SECS as f64 could result in precision issues due to float comparison. Consider using a small epsilon for comparison or ensuring the interval is slightly less than the actual threshold to avoid edge cases where alerts might be skipped due to floating-point precision. #Resolved

@qiluo-msft (Collaborator, Author)

Precision issues do not matter in this use case.

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings November 14, 2025 01:59
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot finished reviewing on behalf of qiluo-msft November 14, 2025 02:00

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 10 comments.



.when_called(injectorpp::func!(fn (sonic_supervisord_utilities_rs::proc_exit_listener::get_current_time)() -> f64))
.will_execute(injectorpp::fake!(
func_type: fn() -> f64,
returns: get_mock_time() // This will be called each time and return progressive time

Copilot AI Nov 14, 2025

The get_mock_time() function is called during test setup (at line 284), but injectorpp mocking evaluates the return value once at setup time, not on each call. This means all calls to get_current_time() will return the same value rather than progressive times. The comment incorrectly states 'This will be called each time and return progressive time'. Either the mocking approach needs to be changed to support progressive time, or the comment should be corrected.

Suggested change:
-returns: get_mock_time() // This will be called each time and return progressive time
+calls: || get_mock_time() // This closure will be called each time and return progressive time

#Resolved

@qiluo-msft (Collaborator, Author)

Invalid comment. The output of cargo test -- --nocapture shows progressive time:

READY
RESULT 2
OKREADY
DEBUG get_mock_time(): call #1, start_time=1609459200, counter=0, result=1609459200
DEBUG get_mock_time(): call #2, start_time=1609459200, counter=1, result=1609462800
RESULT 2
OKREADY
DEBUG get_mock_time(): call #3, start_time=1609459200, counter=2, result=1609466400
DEBUG get_mock_time(): call #4, start_time=1609459200, counter=3, result=1609470000
RESULT 2
OKREADY
DEBUG get_mock_time(): call #5, start_time=1609459200, counter=4, result=1609473600
✓ Main function processed stdin successfully
test tests::test_main_swss_success ... ok

.when_called(injectorpp::func!(fn (nix::sys::signal::kill)(nix::unistd::Pid, nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>))
.will_execute(injectorpp::fake!(
func_type: fn(_pid: nix::unistd::Pid, _signal: nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>,
returns: panic!("kill() should NOT be called for snmp container with auto-restart disabled")

Copilot AI Nov 14, 2025

Using panic! in mock return values is unconventional. If kill() is called, the panic message won't be displayed clearly. Consider using a flag or counter to track whether kill() was called, then assert after the test completes. For example: let kill_called = Arc::new(AtomicBool::new(false)); and check its value after the test.
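
A framework-agnostic sketch of that flag-then-assert pattern (not tied to injectorpp's macro syntax; fake_kill and KILL_CALLED are illustrative names, not from the PR):

use std::sync::atomic::{AtomicBool, Ordering};

static KILL_CALLED: AtomicBool = AtomicBool::new(false);

// Fake used in place of nix::sys::signal::kill: record the call instead of panicking.
fn fake_kill(_pid: nix::unistd::Pid, _signal: nix::sys::signal::Signal) -> Result<(), nix::errno::Errno> {
    KILL_CALLED.store(true, Ordering::SeqCst);
    Ok(())
}

// ... run the listener through the snmp scenario with the fake installed, then assert on the flag,
// which produces a clear failure message:
// assert!(!KILL_CALLED.load(Ordering::SeqCst),
//         "kill() should NOT be called for snmp container with auto-restart disabled");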

.when_called(injectorpp::func!(fn (nix::sys::signal::kill)(nix::unistd::Pid, nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>))
.will_execute(injectorpp::fake!(
func_type: fn(_pid: nix::unistd::Pid, _signal: nix::sys::signal::Signal) -> Result<(), nix::errno::Errno>,
returns: panic!("kill() should not be called when auto-restart is disabled")

Copilot AI Nov 14, 2025

Same issue as in test_main_snmp - using panic! in mock return values is not ideal. Consider using a flag or counter to verify kill() was not called, then assert after the test completes.

Comment on lines +494 to +500
    let elapsed_mins = (elapsed_secs / 60.0) as u64;
    process_info.insert("last_alerted".to_string(), current_time);
    let current_dead_minutes = process_info.get("dead_minutes").unwrap_or(&0.0);
    let new_dead_minutes = current_dead_minutes + elapsed_mins as f64;
    process_info.insert("dead_minutes".to_string(), new_dead_minutes);

    generate_alerting_message(process_name, "not running", new_dead_minutes as u64, Severity::LOG_ERR);

Copilot AI Nov 14, 2025

The dead_minutes calculation is incorrect. It should track the total time the process has been dead, not accumulate elapsed intervals. This line adds the current elapsed_mins to the previous dead_minutes, causing exponential growth. It should be: let new_dead_minutes = (current_time - first_dead_time) / 60.0; where first_dead_time is the time when the process was first added to process_under_alerting.

Suggested change:
-let elapsed_mins = (elapsed_secs / 60.0) as u64;
-process_info.insert("last_alerted".to_string(), current_time);
-let current_dead_minutes = process_info.get("dead_minutes").unwrap_or(&0.0);
-let new_dead_minutes = current_dead_minutes + elapsed_mins as f64;
-process_info.insert("dead_minutes".to_string(), new_dead_minutes);
-generate_alerting_message(process_name, "not running", new_dead_minutes as u64, Severity::LOG_ERR);
+process_info.insert("last_alerted".to_string(), current_time);
+// Calculate dead_minutes based on first_dead_time
+if let Some(&first_dead_time) = process_info.get("first_dead_time") {
+    let new_dead_minutes = (current_time - first_dead_time) / 60.0;
+    process_info.insert("dead_minutes".to_string(), new_dead_minutes);
+    generate_alerting_message(process_name, "not running", new_dead_minutes as u64, Severity::LOG_ERR);
+}

#WontFix

@qiluo-msft (Collaborator, Author)

Not a valid comment. It mimics the behavior of the Python version. The dead_minutes calculation is not super accurate, but good enough.
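
For illustration only (not from the PR), a tiny simulation of the accumulation scheme, assuming a 60-second alerting interval: each pass adds the whole minutes elapsed since the last alert, so dead_minutes grows roughly linearly with how long the process has been down.

// Process dies at t = 0; the alert check fires at these times (seconds since death).
let checks = [61.0_f64, 122.0, 183.0];
let mut last_alerted = 0.0_f64;
let mut dead_minutes = 0.0_f64;
for t in checks {
    let elapsed_mins = ((t - last_alerted) / 60.0) as u64; // 1 on each pass
    dead_minutes += elapsed_mins as f64;
    last_alerted = t;
}
assert_eq!(dead_minutes, 3.0); // ~3 minutes dead after ~183 s: close enough for alerting purposes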

pub use_unix_socket_path: bool,
}

/// Read the critical processes/group names

Copilot AI Nov 14, 2025

The function documentation should describe what the return tuple represents. Consider: /// Returns (group_list, process_list) where group_list contains group names and process_list contains process names from the config file.

Suggested change
/// Read the critical processes/group names
/// Read the critical processes/group names.
///
/// Returns (group_list, process_list) where `group_list` contains group names and `process_list` contains process names from the config file.

ALERTING_INTERVAL_SECS as f64
}

/// Publish events

Copilot AI Nov 14, 2025

Missing documentation for this public function. Should describe what events are published and for what purpose.

Suggested change
/// Publish events
/// Publishes a process exit event to the event publisher.
///
/// This function sends an event with the tag specified by `EVENTS_PUBLISHER_TAG` to the provided
/// `events_handle`. The event includes the process name and container name as parameters.
///
/// # Parameters
/// - `events_handle`: The event publisher handle used to send the event.
/// - `process_name`: The name of the process for which the event is being published.
/// - `container_name`: The name of the container in which the process is running.
///
/// # Events Published
/// Publishes an event tagged with `EVENTS_PUBLISHER_TAG` containing the process and container names.
/// This is typically used to notify other components or systems that a process has exited or requires
/// attention, enabling monitoring or alerting mechanisms to respond accordingly.

Comment on lines +488 to +491
let time1 = get_mock_time(); // Should return start_time (1609459200.0)
let time2 = get_mock_time(); // Should return start_time + 3600.0
let time3 = get_mock_time(); // Should return start_time + 7200.0
let time4 = get_mock_time(); // Should return start_time + 10800.0

Copilot AI Nov 14, 2025

Magic numbers 3600.0, 7200.0, 10800.0 should be defined as constants or computed as multiples of a base interval. For example: const HOUR_SECS: f64 = 3600.0; and then use start_time + HOUR_SECS, start_time + 2.0 * HOUR_SECS, etc.

Ok(is_auto_restart.unwrap().clone())
}

/// Load heartbeat alert intervals from ConfigDB

Copilot AI Nov 14, 2025

Missing documentation for this public function. Should describe what heartbeat alert intervals are loaded and from where.

Suggested change
/// Load heartbeat alert intervals from ConfigDB
/// Loads heartbeat alert intervals for monitored processes from the ConfigDB.
///
/// This function retrieves the `HEARTBEAT` table from the ConfigDB, which contains
/// per-process configuration for heartbeat monitoring. For each process entry,
/// it reads the `alert_interval` field (in milliseconds), converts it to seconds,
/// and stores the value in a global mapping used for alerting logic.
///
/// The function sets a global flag to indicate that heartbeat alert intervals
/// have been initialized. If the `HEARTBEAT` table is empty or missing, no intervals
/// are loaded and the mapping remains unchanged.
///
/// # Arguments
///
/// * `config_db` - A reference to an object implementing `ConfigDBTrait` used to access ConfigDB.
///
/// # Errors
///
/// Returns an error if the `HEARTBEAT` table cannot be retrieved or if the initialization flag cannot be set.

if let Some(&last_alerted) = process_info.get("last_alerted") {
let elapsed_secs = current_time - last_alerted;
if elapsed_secs >= ALERTING_INTERVAL_SECS as f64 {
let elapsed_mins = (elapsed_secs / 60.0) as u64;

Copilot AI Nov 14, 2025

The constant 60.0 (seconds per minute) appears multiple times in the code. Consider defining it as a named constant: const SECS_PER_MINUTE: f64 = 60.0;

/// Get progressive time
/// Each call returns start_time + (call_count * 3600) seconds (1 hour increment)
pub fn get_mock_time() -> f64 {
let start_time = *TIME_MOCK_START.get().unwrap_or(&1609459200.0); // Default: 2021-01-01 00:00:00 UTC

Copilot AI Nov 14, 2025

The magic number 1609459200.0 (Unix timestamp) should be defined as a named constant with a descriptive comment. For example: const DEFAULT_TEST_START_TIME: f64 = 1609459200.0; // 2021-01-01 00:00:00 UTC

@saiarcot895 (Contributor) Nov 14, 2025

Please create the .dep file for this as well. #Resolved

Maintainer: SONiC Team <[email protected]>
Section: net
Priority: optional
Build-Depends: debhelper (>= 12)

@saiarcot895 (Contributor) Nov 14, 2025

Suggested change:
-Build-Depends: debhelper (>= 12)
+Build-Depends: debhelper-compat (= 13)

#Resolved

@saiarcot895 (Contributor) Nov 14, 2025

If doing the debhelper-compat change, please remove this file to avoid conflicts. #Resolved

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
