service_checker should consider k8s rollout containers #25660
qiluo-msft wants to merge 4 commits into sonic-net:master
Conversation
/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
Extends ServiceChecker in src/system-health to treat Kubernetes rollout containers as “running” services by deriving the service/container name from Kubernetes labels, and adds unit tests covering k8s-labeled containers and mixed (Docker + k8s) scenarios.
Changes:
- Update `ServiceChecker.get_current_running_containers()` to recognize k8s-managed containers via `io.kubernetes.*` labels and use the label-provided container name.
- Add unit tests for k8s-labeled containers and mixed container environments.
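As a rough illustration of the label-based name derivation described above (a minimal sketch, not the PR's actual code; the helper name `derive_service_name` is hypothetical):

```python
def derive_service_name(container_name, labels):
    """Return the logical service name for a running container.

    For k8s-managed containers (long names such as
    'k8s_acms_acms-pod-test_sonic_12345678_0'), the
    'io.kubernetes.container.name' label carries the original
    service name; plain Docker containers keep their own name.
    """
    labels = labels or {}
    kname = labels.get('io.kubernetes.container.name')
    dtype = labels.get('io.kubernetes.docker.type')
    if dtype == 'container' and kname and kname not in ('<no value>', 'POD'):
        return kname
    return container_name

# Docker-native container keeps its name; k8s rollout container maps via label
assert derive_service_name('snmp', {}) == 'snmp'
assert derive_service_name(
    'k8s_acms_acms-pod-test_sonic_12345678_0',
    {'io.kubernetes.docker.type': 'container',
     'io.kubernetes.container.name': 'acms'}) == 'acms'
```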
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/system-health/health_checker/service_checker.py | Adds logic to map running k8s containers to their service name via Kubernetes labels. |
| src/system-health/tests/test_system_health.py | Adds tests for k8s-labeled containers and mixed Docker/k8s container lists. |
Comments suppressed due to low confidence (1)
src/system-health/tests/test_system_health.py:491
`test_service_checker_mixed_containers` also mocks `_get_container_folder` and `run_command` in a way that doesn't validate how k8s containers are actually executed/inspected. To exercise the intended behavior, it would help to assert that the code uses the `k8s_...` container name/ID for docker operations while still reporting the service as `acms` via the label.
```python
@patch('swsscommon.swsscommon.ConfigDBConnector.connect', MagicMock())
@patch('health_checker.service_checker.ServiceChecker._get_container_folder', MagicMock(return_value=test_path))
@patch('sonic_py_common.multi_asic.is_multi_asic', MagicMock(return_value=False))
@patch('docker.DockerClient')
@patch('health_checker.utils.run_command')
@patch('swsscommon.swsscommon.ConfigDBConnector')
def test_service_checker_mixed_containers(mock_config_db, mock_run, mock_docker_client):
    """Test that service checker handles both regular Docker and Kubernetes containers"""
    mock_db_data = MagicMock()
    mock_get_table = MagicMock()
    mock_db_data.get_table = mock_get_table
    mock_config_db.return_value = mock_db_data
    mock_get_table.return_value = {
        'snmp': {
            'state': 'enabled',
            'has_global_scope': 'True',
            'has_per_asic_scope': 'False',
        },
        'acms': {
            'state': 'enabled',
            'has_global_scope': 'True',
            'has_per_asic_scope': 'False',
        }
    }

    mock_containers = MagicMock()

    # Regular Docker container
    mock_snmp_container = MagicMock()
    mock_snmp_container.name = 'snmp'
    mock_snmp_container.labels = {}

    # Kubernetes container
    mock_acms_container = MagicMock()
    mock_acms_container.name = 'k8s_acms_acms-pod-test_sonic_12345678_0'
    mock_acms_container.labels = {
        'io.kubernetes.pod.namespace': 'sonic',
        'io.kubernetes.docker.type': 'container',
        'io.kubernetes.container.name': 'acms'
    }

    mock_containers.list = MagicMock(return_value=[mock_snmp_container, mock_acms_container])
    mock_docker_client_object = MagicMock()
    mock_docker_client.return_value = mock_docker_client_object
    mock_docker_client_object.containers = mock_containers
    mock_run.return_value = mock_supervisorctl_output

    checker = ServiceChecker()
    config = Config()
    checker.check(config)

    # Verify both types of containers are recognized
    running_containers = checker.get_current_running_containers()
    assert 'snmp' in running_containers
    assert 'acms' in running_containers

    # Verify both are added to critical processes
    assert 'snmp' in checker.container_critical_processes
    assert 'acms' in checker.container_critical_processes
```
/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
src/system-health/health_checker/service_checker.py:166
- The logic assumes that all Kubernetes containers will have namespace set to "sonic". However, if a container has some Kubernetes labels but a different namespace (or no namespace), it will still be processed as a regular Docker container and undergo critical process checking. Consider adding a check to handle containers with partial Kubernetes labels more gracefully, or document the assumption that only containers in the "sonic" namespace are considered Kubernetes-managed.
```python
if ns == "sonic":
    # Kubernetes-managed container - add service name to running containers
    # but skip critical process checking (k8s has its own health mechanisms)
    if dtype == "container" and kname and kname not in ("<no value>", "POD"):
        running_containers.add(kname)
        continue
```
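The namespace assumption flagged above can be sketched in isolation (a hypothetical helper, not the PR's code): only containers in the `sonic` k8s namespace are short-circuited, and anything with partial or foreign-namespace labels falls back to regular Docker handling, and thus to critical-process checking.

```python
def classify_container(name, labels):
    """Classify a container as ('k8s', service_name) or ('docker', name).

    Mirrors the assumption described in the review comment: a container
    is treated as k8s-managed only when all three labels line up
    (namespace 'sonic', docker type 'container', a real container name).
    Partial labels or a different namespace fall through to Docker handling.
    """
    labels = labels or {}
    ns = labels.get('io.kubernetes.pod.namespace')
    dtype = labels.get('io.kubernetes.docker.type')
    kname = labels.get('io.kubernetes.container.name')
    if (ns == 'sonic' and dtype == 'container'
            and kname and kname not in ('<no value>', 'POD')):
        return ('k8s', kname)
    return ('docker', name)

# A container with k8s labels but a non-'sonic' namespace is NOT
# short-circuited: it is handled as a regular Docker container.
assert classify_container(
    'k8s_x_pod_other_1_0',
    {'io.kubernetes.pod.namespace': 'kube-system',
     'io.kubernetes.docker.type': 'container',
     'io.kubernetes.container.name': 'x'})[0] == 'docker'
```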
src/system-health/health_checker/service_checker.py:164
- The check for
kname not in ("<no value>", "POD")is defensive but lacks documentation. Consider adding a comment explaining where the "" string might come from (e.g., specific Kubernetes versions or configurations that might return this placeholder value), or if this was observed in practice, to help future maintainers understand this edge case.
if dtype == "container" and kname and kname not in ("<no value>", "POD"):
src/system-health/health_checker/service_checker.py:156
- While the existing tests will likely still pass, they don't explicitly set the labels attribute on mock containers. Consider updating existing tests (e.g., test_service_checker_single_asic, test_service_checker_multi_asic) to explicitly set `mock_container.labels = {}` for regular Docker containers. This makes the test expectations clearer, follows the pattern established in the new test_service_checker_mixed_containers test, and makes the distinction between k8s and regular Docker containers more explicit.

```python
labels = ctr.labels or {}
```
/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

```python
if ns == "sonic":
    # Kubernetes-managed container - add service name to running containers
    # but skip critical process checking (k8s has its own health mechanisms)
```
Does k8s's health checks verify that required processes are running inside a container (not just the container itself running)?
Why I did it
K8s rollout containers have long names, so we use the container labels to identify the original container/service name, such as "restapi". This PR extends service_checker to accept either image-native containers or k8s rollout containers for the same service.
Work item tracking
How I did it
Check all containers' labels.
How to verify it
Unit test. Manually verified on a SONiC device with a k8s rollout restapi container.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)