Commit b514449

[action] [PR:24393] Enhance supervisor-proc-exit-listener script to handle redis-communication failure case gracefully (#1842)
#### Why I did it

redis-server is considered a critical process in the database container. However, supervisor-proc-exit-listener currently breaks if redis-server exits. By design, supervisor-proc-exit-listener should:

1. periodically log that a critical process is not running if container auto_restart is disabled
2. kill the container if container auto_restart is enabled

This PR helps supervisor-proc-exit-listener achieve the designed behavior if redis-server exits.

##### Work item tracking

- Microsoft ADO **(number only)**:

#### How I did it

#### How to verify it

Kill the redis-server process inside the database container and observe the logs:

```
2025 Oct 31 01:25:27.304563 sonic INFO database#supervisord 2025-10-31 01:25:27,303 WARN exited: redis (terminated by SIGKILL; not expected)
2025 Oct 31 01:25:28.309293 sonic WARNING database#supervisor-proc-exit-listener: Unable to retrieve features table from Config DB: Unable to connect to redis - Connection refused(1): Cannot assign requested address
2025 Oct 31 01:26:28.436837 sonic ERR database#supervisor-proc-exit-listener: Process 'redis' is not running in namespace 'host' (1.0 minutes).
2025 Oct 31 01:27:28.559959 sonic ERR database#supervisor-proc-exit-listener: Process 'redis' is not running in namespace 'host' (2.0 minutes).
```

#### Which release branch to backport (provide reason below if selected)

- [ ] 202205
- [ ] 202211
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
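The designed behavior described under "Why I did it" can be sketched as a small decision function. This is only an illustration: `action_on_critical_exit` and `format_not_running_alert` are hypothetical helper names, not functions from the script, and the message format is copied from the log lines shown under "How to verify it".

```python
def action_on_critical_exit(auto_restart_state):
    """Decide what to do when a critical process exits unexpectedly.

    Hypothetical helper: only an explicit "enabled" terminates the
    container; "disabled" or an unknown state (for example "" when
    Config DB is unreachable) falls back to periodic logging.
    """
    if auto_restart_state == "enabled":
        return "terminate"
    return "log"


def format_not_running_alert(process_name, namespace, elapsed_secs):
    """Build an alert in the format seen in the verification logs."""
    # Whole minutes elapsed, kept as a float so 60.13 s renders as "1.0".
    elapsed_mins = float(elapsed_secs) // 60
    return "Process '{}' is not running in namespace '{}' ({} minutes).".format(
        process_name, namespace, elapsed_mins)


if __name__ == "__main__":
    for state in ("enabled", "disabled", ""):
        print(repr(state), "->", action_on_critical_exit(state))
    print(format_not_running_alert("redis", "host", 60.13))
```

Treating every state other than "enabled" as "log" means a redis outage can no longer be mistaken for permission to kill the container.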
1 parent ab01ef5 commit b514449

1 file changed: +13 -13 lines changed

files/scripts/supervisor-proc-exit-listener

Lines changed: 13 additions & 13 deletions
@@ -96,22 +96,22 @@ def get_autorestart_state(container_name, use_unix_socket_path):
     @return: Return the status of auto-restart feature.
     """
     config_db = swsscommon.ConfigDBConnector(use_unix_socket_path=use_unix_socket_path)
-    config_db.connect()
-    features_table = config_db.get_table(FEATURE_TABLE_NAME)
+    try:
+        config_db.connect()
+        features_table = config_db.get_table(FEATURE_TABLE_NAME)
+    except RuntimeError as e:
+        syslog.syslog(syslog.LOG_WARNING, "Unable to retrieve features table from Config DB: {}".format(e))
+        return ""
+
     if not features_table:
-        syslog.syslog(syslog.LOG_ERR, "Unable to retrieve features table from Config DB. Exiting...")
-        sys.exit(2)
+        syslog.syslog(syslog.LOG_WARNING, "Empyt features table")
+        return ""

     if container_name not in features_table:
-        syslog.syslog(syslog.LOG_ERR, "Unable to retrieve feature '{}'. Exiting...".format(container_name))
-        sys.exit(3)
-
-    is_auto_restart = features_table[container_name].get('auto_restart')
-    if not is_auto_restart:
-        syslog.syslog(
-            syslog.LOG_ERR, "Unable to determine auto-restart feature status for '{}'. Exiting...".format(container_name))
-        sys.exit(4)
+        syslog.syslog(syslog.LOG_WARNING, "Unable to retrieve feature '{}'".format(container_name))
+        return ""

+    is_auto_restart = features_table[container_name].get('auto_restart', 'enabled')  # Use default if field not found
     return is_auto_restart

 def publish_events(events_handle, process_name, container_name):
@@ -163,7 +163,7 @@ def main(argv):

                 if (process_name in critical_process_list or group_name in critical_group_list) and expected == 0:
                     is_auto_restart = get_autorestart_state(container_name, use_unix_socket_path)
-                    if is_auto_restart != "disabled":
+                    if is_auto_restart == "enabled":
                         MSG_FORMAT_STR = "Process '{}' exited unexpectedly. Terminating supervisor '{}'"
                         msg = MSG_FORMAT_STR.format(payload_headers['processname'], container_name)
                         syslog.syslog(syslog.LOG_INFO, msg)
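
The fallback logic in the first hunk can be exercised without a running redis instance. The sketch below is a simplified stand-in, not the script itself: `FakeConfigDB` is a hypothetical stub replacing `swsscommon.ConfigDBConnector`, the connector is passed in as an argument instead of being constructed inside the function, warnings go to stderr rather than syslog, and the value of `FEATURE_TABLE_NAME` is assumed.

```python
import sys

FEATURE_TABLE_NAME = "FEATURE"  # assumed value; the diff only shows the constant's name


class FakeConfigDB:
    """Hypothetical test stub standing in for swsscommon.ConfigDBConnector."""

    def __init__(self, table=None, fail=False):
        self._table = table or {}
        self._fail = fail

    def connect(self):
        if self._fail:
            # Mimics the connection failure seen when redis-server is down.
            raise RuntimeError("Unable to connect to redis")

    def get_table(self, name):
        return self._table


def get_autorestart_state(config_db, container_name):
    """Return the auto_restart state, or "" when it cannot be determined."""
    try:
        config_db.connect()
        features_table = config_db.get_table(FEATURE_TABLE_NAME)
    except RuntimeError as e:
        print("Unable to retrieve features table from Config DB: {}".format(e),
              file=sys.stderr)
        return ""

    if not features_table:
        print("Empty features table", file=sys.stderr)
        return ""

    if container_name not in features_table:
        print("Unable to retrieve feature '{}'".format(container_name),
              file=sys.stderr)
        return ""

    # A missing field falls back to 'enabled', as in the patched script.
    return features_table[container_name].get("auto_restart", "enabled")


if __name__ == "__main__":
    print(repr(get_autorestart_state(FakeConfigDB(fail=True), "database")))
    print(repr(get_autorestart_state(FakeConfigDB({"database": {}}), "database")))
```

Because the function now returns an empty string on failure instead of exiting, the caller's comparison matters: under the old `!= "disabled"` check an empty string would have been treated as permission to terminate the supervisor, whereas `== "enabled"` leaves the listener running so it can keep logging.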
