Rollup kubectl commands to a common library by addyess · Pull Request #31 · canonical/kube-galaxy-test

addyess · 2026-03-05T15:10:20Z

This pull request refactors the codebase to centralize and standardize Kubernetes cluster interactions through the new pkg.utils.client module. The changes replace direct subprocess and shell command invocations with higher-level utility functions, improving error handling, code readability, and maintainability. The refactor touches multiple command handlers, component hooks, testing utilities, and log collection routines, ensuring consistent cluster connectivity checks and resource management.

Cluster Interaction Refactor:

Replaced direct kubectl subprocess calls with utility functions like get_context, get_nodes, wait_for_nodes, and wait_for_pods in cmd/status.py, improving error handling and code clarity. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbR7-R10), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL42-R57), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL66-L113))
Updated cmd/test.py to use verify_connectivity and get_context for cluster checks, and to catch ClusterError for robust error reporting. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080R11-R12), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L25-R29), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L60-R46))
Refactored pkg/components/_base.py to use apply_manifest for manifest application, replacing manual shell invocation. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7R21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7L246-R247))
Modified pkg/components/kubeadm.py to use verify_connectivity, wait_for_nodes, and get_api_server_status for cluster verification. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185R17-R21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185L187-R194))

Testing and Namespace Management:

Updated pkg/testing/spread.py to use verify_connectivity, create_namespace, and delete_namespace for test setup and teardown, replacing raw shell commands and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dR21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL114-R115), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL151-R158), [[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL182-R175))

Log Collection Improvements:

Refactored pkg/utils/logs.py to use utility functions for collecting cluster info, node data, pod logs, and events, replacing direct shell calls and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL3-L9), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL59-R69), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL72-R87), [[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL103-R122), [[5]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL145-R134), [[6]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL163-R150))

Testing Updates:

Updated unit tests to mock and verify calls to the new utility functions instead of shell invocations, ensuring test coverage for the refactored logic. ([tests/unit/components/test_container_manifest.pyL171-R183](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-dad66a03932e1d299909c2f5a1a18641d9742d878cdae0408e53d09aadfc5b2cL171-R183))

Copilot

Pull request overview

This PR consolidates all inline kubectl subprocess calls scattered across cmd/, pkg/components/, and pkg/utils/ into a centralized client.py library, replacing raw run(["kubectl", ...]) and subprocess.run(["kubectl", ...]) calls with named, typed wrappers that uniformly raise ClusterError on failure.

Changes:

New src/kube_galaxy/pkg/utils/client.py: 412-line library encapsulating all kubectl operations (connectivity, nodes, pods, logs, events, namespaces, manifests).
Refactor logs.py, spread.py, _base.py, kubeadm.py, cmd/test.py, cmd/status.py: Replace direct run(["kubectl", ...]) / subprocess.run(...) calls with the new client library functions.
Update tests/unit/components/test_container_manifest.py: Adjust the bootstrap_hook test to patch apply_manifest instead of run.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`src/kube_galaxy/pkg/utils/client.py`	New centralized kubectl client library
`src/kube_galaxy/pkg/utils/logs.py`	Use client library for all kubectl calls
`src/kube_galaxy/pkg/testing/spread.py`	Delegate namespace management to client library
`src/kube_galaxy/pkg/components/_base.py`	Use `apply_manifest()` in `bootstrap_hook`
`src/kube_galaxy/pkg/components/kubeadm.py`	Use client library in `verify_hook`
`src/kube_galaxy/cmd/test.py`	Use client library for connectivity/context checks
`src/kube_galaxy/cmd/status.py`	Use client library for node/context display
`tests/unit/components/test_container_manifest.py`	Updated test to patch `apply_manifest`

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/kube_galaxy/cmd/status.py

Copilot · 2026-03-05T15:17:08Z

src/kube_galaxy/pkg/utils/client.py

+def get_context_info() -> dict[str, str]:
+    """
+    Get current context information.
+
+    Returns:
+        Dict with 'context' key containing current context name
+
+    Raises:
+        ClusterError: If context info cannot be retrieved
+    """
+    try:
+        context = get_context()
+        return {"context": context}
+    except ClusterError:
+        raise


The get_context_info() function at lines 159–173 is a thin, unused wrapper around get_context() that adds no value. It is not imported or called anywhere in the codebase, and its try/except/raise is a no-op. This dead code should be removed.

Copilot · 2026-03-05T15:17:08Z

src/kube_galaxy/pkg/testing/spread.py

@@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None:
        ClusterError: If namespace deletion fails
    """
    try:
-        info(f"  Cleaning up namespace: {namespace}")
-
-        # Delete namespace
-        run(
-            ["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"],
-            check=True,
-        )
-
-        success(f"Namespace deleted: {namespace}")
-
-    except ShellError as exc:
-        # Don't fail if namespace doesn't exist
-        if "not found" in str(exc):
-            warning(f"  Namespace {namespace} not found (may already be deleted)")
-        else:
-            raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc
+        delete_namespace(namespace, timeout)
+    except ClusterError:
+        raise


The _create_test_namespace() and _cleanup_test_namespace() functions both now consist solely of a try: <delegate>() / except ClusterError: raise pattern. This is completely transparent and adds no value — it is equivalent to just calling create_namespace() or delete_namespace() directly without the try/except wrapper. These wrappers can be simplified to a direct call or eliminated entirely.

Copilot · 2026-03-05T15:17:09Z

src/kube_galaxy/pkg/components/_base.py

                    )
-                run(["kubectl", "apply", "-f", str(self.manifest_path)], check=True)
-                info(f"Applied manifest for {comp_name}")
+                apply_manifest(self.manifest_path)


The bootstrap_hook() in _base.py now propagates ClusterError from apply_manifest() when the manifest installation method is CONTAINER_MANIFEST. The rest of the component lifecycle hooks in this file (and throughout the codebase) only raise ComponentError. This breaks the expected error contract for the hook interface. The apply_manifest() call should either catch ClusterError and re-raise it as ComponentError, or the docstring for bootstrap_hook should be updated to document that it may also raise ClusterError.

src/kube_galaxy/pkg/utils/client.py

Copilot · 2026-03-05T15:17:09Z

src/kube_galaxy/cmd/status.py

        pass


 def _verify_cluster_health(timeout: int) -> None:


The except (ClusterError, Exception) pattern is redundant: ClusterError is a subclass of Exception, so listing it separately has no effect. The catch clause is equivalent to except Exception. You should either use just except Exception (to suppress all errors, as the old code did), or keep except ClusterError only if the intent is to only handle cluster-related errors — but currently it silently swallows all other exceptions too because of Exception. The same issue is repeated on line 70.

src/kube_galaxy/cmd/status.py

Copilot · 2026-03-05T15:17:09Z

src/kube_galaxy/pkg/utils/client.py

+def get_nodes(wide: bool = False, namespace: str = "") -> str:
+    """
+    Get nodes information.
+
+    Args:
+        wide: Return wide output (includes internal IP, kernel version, etc.)
+        namespace: Namespace filter (unused for nodes, included for API consistency)
+
+    Returns:
+        Node information as string
+
+    Raises:
+        ClusterError: If node info cannot be retrieved
+    """
+    try:
+        cmd = ["kubectl", "get", "nodes"]
+        if wide:
+            cmd.append("-o")
+            cmd.append("wide")
+        result = run(cmd, check=True, capture_output=True, text=True)
+        return result.stdout
+    except ShellError as exc:
+        raise ClusterError(f"Failed to retrieve nodes: {exc}") from exc


The namespace parameter in get_nodes() is explicitly documented as "unused for nodes, included for API consistency," but accepting a parameter that is never used is a misleading API design that could confuse callers into thinking it has an effect. If the intention is future use or API consistency, this should at least be noted in the docstring or the parameter should be removed until it is actually used.

Copilot · 2026-03-05T15:17:10Z

src/kube_galaxy/pkg/utils/client.py

+
+    Raises:
+        ClusterError: If logs cannot be retrieved (other than missing logs)
+    """
+    try:
+        result = run(
+            ["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],
+            check=False,
+            capture_output=True,
+            text=True,
+        )
+        # Non-zero exit is OK if pod has no logs; return empty
+        return result.stdout
+    except ShellError as exc:
+        raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc


The except ShellError as exc block in get_pod_logs (lines 332–333) is unreachable dead code. run() only raises ShellError when check=True, but this call uses check=False. The function will always return result.stdout regardless of the kubectl exit code, and the exception handler can never be triggered. Either set check=True and handle failures meaningfully, or remove the dead except block and the ShellError import if not used elsewhere in the file.

Suggested change

Raises:

ClusterError: If logs cannot be retrieved (other than missing logs)

"""

try:

result = run(

["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],

check=False,

capture_output=True,

text=True,

)

# Non-zero exit is OK if pod has no logs; return empty

return result.stdout

except ShellError as exc:

raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc

"""

result = run(

["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],

check=False,

capture_output=True,

text=True,

)

# Non-zero exit is OK if pod has no logs; return empty

return result.stdout

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/kube_galaxy/pkg/testing/spread.py:130

In _verify_test_prerequisites, the except ShellError handler no longer catches failures from verify_connectivity(), which now raises ClusterError instead of ShellError. If the cluster is unreachable, the ClusterError propagates uncaught past this handler, bypasses the "Test prerequisites not met" error message, and gets double-wrapped by run_spread_tests's except Exception as "Test execution failed: Failed to connect to cluster: ...".

To fix this, either add ClusterError to the except clause (except (ShellError, ClusterError) as exc), or let the ClusterError propagate naturally and remove the except ShellError wrapper (since connectivity errors are already descriptive).

        verify_connectivity()

        # Check for spread
        info("Verifying spread test framework...")
        result = run(["which", "spread"], check=True, capture_output=True)
        spread_path = result.stdout.strip()
        success(f"Found spread at {spread_path}")

        # Check for lxclient (required by spread)
        info("Verifying lxclient (required by spread)...")
        result = run(["which", "lxc"], check=True, capture_output=True)
        lxclient_path = result.stdout.strip()
        success(f"Found lxclient at {lxclient_path}")

    except ShellError as exc:
        raise ClusterError("Test prerequisites not met") from exc

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-05T15:40:21Z

src/kube_galaxy/pkg/utils/logs.py

+            except ClusterError:
                # Pod might not have logs, skip
                pass


The except ClusterError handler at this line is dead code. get_pod_logs() is implemented with check=False and never raises any exception — it always returns a string. The comment "Pod might not have logs, skip" is also misleading, since that case is already handled inside get_pod_logs itself by returning an empty string. This try/except block can be removed entirely.

Copilot · 2026-03-05T15:40:21Z

src/kube_galaxy/pkg/testing/spread.py

@@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None:
        ClusterError: If namespace deletion fails
    """
    try:
-        info(f"  Cleaning up namespace: {namespace}")
-
-        # Delete namespace
-        run(
-            ["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"],
-            check=True,
-        )
-
-        success(f"Namespace deleted: {namespace}")
-
-    except ShellError as exc:
-        # Don't fail if namespace doesn't exist
-        if "not found" in str(exc):
-            warning(f"  Namespace {namespace} not found (may already be deleted)")
-        else:
-            raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc
+        delete_namespace(namespace, timeout)
+    except ClusterError:
+        raise


The try/except ClusterError: raise pattern in _create_test_namespace and _cleanup_test_namespace is redundant — catching an exception only to immediately re-raise it is a no-op. These try/except blocks can be removed entirely, letting the ClusterError propagate naturally to callers.

src/kube_galaxy/pkg/utils/client.py

src/kube_galaxy/cmd/status.py

* Bootstrap project * Test `single-node-no-cni` manifest in PR (#1) Scaffold out the project to create kubeadm clusters on PRs and verify their installation. * Missing custom installation of kubernetes container services * Missing implementation of spread tests * Reduce sudo usage and exec calls (#20) * Install cni-plugins before kubelet starts (#21) * Understand why gh runners don't need cni-plugins * Create component for cni-plugins * Remove early debug * handle all installation methods * Address review comments * Import images via pull or import (#22) * Begin to implement image pull * Adjust hooks to not need arch passed * Bind kubeadm bootstrap to pre-pulled images defined by the manifest * Address review comments * Dynamic hook execution * Update src/kube_galaxy/pkg/components/kubeadm.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Address review comments * Golf --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Remove unused fields from manifest and components (#23) * Tidy up more of the components (#24) * Tidy up more of the components * Review comment * Parallelize image pulls and install (#25) * First shot deploying with a calico CNI (#26) * First shots deploying with a calico CNI * Validate manifest applications * Validate kube-system is deployed * Update tests/unit/components/test_container_manifest.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply review comments --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Prepare for testing with spread (#28) * Prepare for testing with spread * Ready CNCF validation * Prepared a Sonobouy component * Kubeadm bootstraps with no node taints * Review comments * Single out the kube-galaxy tasks from each component's spread suite. * Update implementation to match documentation (#30) * Rollup kubectl commands to a common library (#31) * Rollup kubectl commands to a common library * Apply review comments * Additional unit tests * Apply review comments --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 15:10

Copilot started reviewing on behalf of addyess March 5, 2026 15:10 View session

Rollup kubectl commands to a common library

628acb2

addyess force-pushed the chore/rollup-kubectl-to-py-client branch from e9cfe31 to 628acb2 Compare March 5, 2026 15:17

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Apply review comments

6fd1650

addyess force-pushed the chore/rollup-kubectl-to-py-client branch from 843bbe0 to 6fd1650 Compare March 5, 2026 15:29

Additional unit tests

5b3dde1

addyess requested a review from Copilot March 5, 2026 15:34

Copilot started reviewing on behalf of addyess March 5, 2026 15:35 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Apply review comments

4b55b22

addyess force-pushed the chore/rollup-kubectl-to-py-client branch from 85f5466 to 4b55b22 Compare March 5, 2026 16:01

addyess merged commit 029202e into project-bootstrap Mar 5, 2026
7 checks passed

addyess deleted the chore/rollup-kubectl-to-py-client branch March 5, 2026 16:30

Conversation

addyess commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

addyess commented Mar 5, 2026 •

edited

Loading