Skip to content

Comments

service_checker should consider k8s rollout containers#25660

Open
qiluo-msft wants to merge 4 commits intosonic-net:masterfrom
qiluo-msft:qiluo/k8s_servicechecker
Open

service_checker should consider k8s rollout containers#25660
qiluo-msft wants to merge 4 commits intosonic-net:masterfrom
qiluo-msft:qiluo/k8s_servicechecker

Conversation

@qiluo-msft
Copy link
Collaborator

Why I did it

K8s rollout containers have long names, and we will use container label to identify its original container/service names such as "restapi". This PR will extend service_checker to respect either image native containers or k8s rollout containers for the same service.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Check all containers labels.

How to verify it

Unit test. Manually verified on a sonic device with k8s rollout restapi container.

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Copilot AI review requested due to automatic review settings February 25, 2026 01:39
@qiluo-msft qiluo-msft requested a review from lguohan as a code owner February 25, 2026 01:39
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends ServiceChecker in src/system-health to treat Kubernetes rollout containers as “running” services by deriving the service/container name from Kubernetes labels, and adds unit tests covering k8s-labeled containers and mixed (Docker + k8s) scenarios.

Changes:

  • Update ServiceChecker.get_current_running_containers() to recognize k8s-managed containers via io.kubernetes.* labels and use the label container name.
  • Add unit tests for k8s-labeled containers and mixed container environments.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/system-health/health_checker/service_checker.py Adds logic to map running k8s containers to their service name via Kubernetes labels.
src/system-health/tests/test_system_health.py Adds tests for k8s-labeled containers and mixed Docker/k8s container lists.
Comments suppressed due to low confidence (1)

src/system-health/tests/test_system_health.py:491

  • test_service_checker_mixed_containers also mocks _get_container_folder and run_command in a way that doesn’t validate how k8s containers are actually executed/inspected. To exercise the intended behavior, it would help to assert the code uses the k8s_... container name/ID for docker operations while still reporting the service as acms via the label.
@patch('swsscommon.swsscommon.ConfigDBConnector.connect', MagicMock())
@patch('health_checker.service_checker.ServiceChecker._get_container_folder', MagicMock(return_value=test_path))
@patch('sonic_py_common.multi_asic.is_multi_asic', MagicMock(return_value=False))
@patch('docker.DockerClient')
@patch('health_checker.utils.run_command')
@patch('swsscommon.swsscommon.ConfigDBConnector')
def test_service_checker_mixed_containers(mock_config_db, mock_run, mock_docker_client):
    """Test that service checker handles both regular Docker and Kubernetes containers"""
    mock_db_data = MagicMock()
    mock_get_table = MagicMock()
    mock_db_data.get_table = mock_get_table
    mock_config_db.return_value = mock_db_data
    mock_get_table.return_value = {
        'snmp': {
            'state': 'enabled',
            'has_global_scope': 'True',
            'has_per_asic_scope': 'False',
        },
        'acms': {
            'state': 'enabled',
            'has_global_scope': 'True',
            'has_per_asic_scope': 'False',
        }
    }
    
    mock_containers = MagicMock()
    
    # Regular Docker container
    mock_snmp_container = MagicMock()
    mock_snmp_container.name = 'snmp'
    mock_snmp_container.labels = {}
    
    # Kubernetes container
    mock_acms_container = MagicMock()
    mock_acms_container.name = 'k8s_acms_acms-pod-test_sonic_12345678_0'
    mock_acms_container.labels = {
        'io.kubernetes.pod.namespace': 'sonic',
        'io.kubernetes.docker.type': 'container',
        'io.kubernetes.container.name': 'acms'
    }
    
    mock_containers.list = MagicMock(return_value=[mock_snmp_container, mock_acms_container])
    mock_docker_client_object = MagicMock()
    mock_docker_client.return_value = mock_docker_client_object
    mock_docker_client_object.containers = mock_containers
    
    mock_run.return_value = mock_supervisorctl_output
    
    checker = ServiceChecker()
    config = Config()
    checker.check(config)
    
    # Verify both types of containers are recognized
    running_containers = checker.get_current_running_containers()
    assert 'snmp' in running_containers
    assert 'acms' in running_containers
    
    # Verify both are added to critical processes
    assert 'snmp' in checker.container_critical_processes
    assert 'acms' in checker.container_critical_processes

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings February 25, 2026 06:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (3)

src/system-health/health_checker/service_checker.py:166

  • The logic assumes that all Kubernetes containers will have namespace set to "sonic". However, if a container has some Kubernetes labels but a different namespace (or no namespace), it will still be processed as a regular Docker container and undergo critical process checking. Consider adding a check to handle containers with partial Kubernetes labels more gracefully, or document the assumption that only containers in the "sonic" namespace are considered Kubernetes-managed.
                if ns == "sonic":
                    # Kubernetes-managed container - add service name to running containers
                    # but skip critical process checking (k8s has its own health mechanisms)
                    if dtype == "container" and kname and kname not in ("<no value>", "POD"):
                        running_containers.add(kname)
                    continue

src/system-health/health_checker/service_checker.py:164

  • The check for kname not in ("<no value>", "POD") is defensive but lacks documentation. Consider adding a comment explaining where the "" string might come from (e.g., specific Kubernetes versions or configurations that might return this placeholder value), or if this was observed in practice, to help future maintainers understand this edge case.
                    if dtype == "container" and kname and kname not in ("<no value>", "POD"):

src/system-health/health_checker/service_checker.py:156

  • While the existing tests will likely still pass, they don't explicitly set the labels attribute on mock containers. Consider updating existing tests (e.g., test_service_checker_single_asic, test_service_checker_multi_asic) to explicitly set mock_container.labels = {} for regular Docker containers. This makes the test expectations clearer and follows the pattern established in the new test_service_checker_mixed_containers test, improving test maintainability and making the distinction between k8s and regular Docker containers more explicit.
                labels = ctr.labels or {}

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).


if ns == "sonic":
# Kubernetes-managed container - add service name to running containers
# but skip critical process checking (k8s has its own health mechanisms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does k8s's health checks verify that required processes are running inside a container (not just the container itself running)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants