Skip to content

Rollup kubectl commands to a common library#31

Merged
addyess merged 4 commits intoproject-bootstrapfrom
chore/rollup-kubectl-to-py-client
Mar 5, 2026
Merged

Rollup kubectl commands to a common library#31
addyess merged 4 commits intoproject-bootstrapfrom
chore/rollup-kubectl-to-py-client

Conversation

@addyess
Copy link
Contributor

@addyess addyess commented Mar 5, 2026

This pull request refactors the codebase to centralize and standardize Kubernetes cluster interactions through the new pkg.utils.client module. The changes replace direct subprocess and shell command invocations with higher-level utility functions, improving error handling, code readability, and maintainability. The refactor touches multiple command handlers, component hooks, testing utilities, and log collection routines, ensuring consistent cluster connectivity checks and resource management.

Cluster Interaction Refactor:

  • Replaced direct kubectl subprocess calls with utility functions like get_context, get_nodes, wait_for_nodes, and wait_for_pods in cmd/status.py, improving error handling and code clarity. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbR7-R10), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL42-R57), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL66-L113))
  • Updated cmd/test.py to use verify_connectivity and get_context for cluster checks, and to catch ClusterError for robust error reporting. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080R11-R12), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L25-R29), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L60-R46))
  • Refactored pkg/components/_base.py to use apply_manifest for manifest application, replacing manual shell invocation. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7R21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7L246-R247))
  • Modified pkg/components/kubeadm.py to use verify_connectivity, wait_for_nodes, and get_api_server_status for cluster verification. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185R17-R21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185L187-R194))

Testing and Namespace Management:

  • Updated pkg/testing/spread.py to use verify_connectivity, create_namespace, and delete_namespace for test setup and teardown, replacing raw shell commands and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dR21), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL114-R115), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL151-R158), [[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL182-R175))

Log Collection Improvements:

  • Refactored pkg/utils/logs.py to use utility functions for collecting cluster info, node data, pod logs, and events, replacing direct shell calls and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL3-L9), [[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL59-R69), [[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL72-R87), [[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL103-R122), [[5]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL145-R134), [[6]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL163-R150))

Testing Updates:

  • Updated unit tests to mock and verify calls to the new utility functions instead of shell invocations, ensuring test coverage for the refactored logic. ([tests/unit/components/test_container_manifest.pyL171-R183](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-dad66a03932e1d299909c2f5a1a18641d9742d878cdae0408e53d09aadfc5b2cL171-R183))

Copilot AI review requested due to automatic review settings March 5, 2026 15:10
@addyess addyess force-pushed the chore/rollup-kubectl-to-py-client branch from e9cfe31 to 628acb2 Compare March 5, 2026 15:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates all inline kubectl subprocess calls scattered across cmd/, pkg/components/, and pkg/utils/ into a centralized client.py library, replacing raw run(["kubectl", ...]) and subprocess.run(["kubectl", ...]) calls with named, typed wrappers that uniformly raise ClusterError on failure.

Changes:

  • New src/kube_galaxy/pkg/utils/client.py: 412-line library encapsulating all kubectl operations (connectivity, nodes, pods, logs, events, namespaces, manifests).
  • Refactor logs.py, spread.py, _base.py, kubeadm.py, cmd/test.py, cmd/status.py: Replace direct run(["kubectl", ...]) / subprocess.run(...) calls with the new client library functions.
  • Update tests/unit/components/test_container_manifest.py: Adjust the bootstrap_hook test to patch apply_manifest instead of run.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/kube_galaxy/pkg/utils/client.py New centralized kubectl client library
src/kube_galaxy/pkg/utils/logs.py Use client library for all kubectl calls
src/kube_galaxy/pkg/testing/spread.py Delegate namespace management to client library
src/kube_galaxy/pkg/components/_base.py Use apply_manifest() in bootstrap_hook
src/kube_galaxy/pkg/components/kubeadm.py Use client library in verify_hook
src/kube_galaxy/cmd/test.py Use client library for connectivity/context checks
src/kube_galaxy/cmd/status.py Use client library for node/context display
tests/unit/components/test_container_manifest.py Updated test to patch apply_manifest

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +159 to +173
def get_context_info() -> dict[str, str]:
"""
Get current context information.

Returns:
Dict with 'context' key containing current context name

Raises:
ClusterError: If context info cannot be retrieved
"""
try:
context = get_context()
return {"context": context}
except ClusterError:
raise
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_context_info() function at lines 159–173 is a thin, unused wrapper around get_context() that adds no value. It is not imported or called anywhere in the codebase, and its try/except/raise is a no-op. This dead code should be removed.

Copilot uses AI. Check for mistakes.
Comment on lines +157 to +175
@@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None:
ClusterError: If namespace deletion fails
"""
try:
info(f" Cleaning up namespace: {namespace}")

# Delete namespace
run(
["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"],
check=True,
)

success(f"Namespace deleted: {namespace}")

except ShellError as exc:
# Don't fail if namespace doesn't exist
if "not found" in str(exc):
warning(f" Namespace {namespace} not found (may already be deleted)")
else:
raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc
delete_namespace(namespace, timeout)
except ClusterError:
raise
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _create_test_namespace() and _cleanup_test_namespace() functions both now consist solely of a try: <delegate>() / except ClusterError: raise pattern. This is completely transparent and adds no value — it is equivalent to just calling create_namespace() or delete_namespace() directly without the try/except wrapper. These wrappers can be simplified to a direct call or eliminated entirely.

Copilot uses AI. Check for mistakes.
)
run(["kubectl", "apply", "-f", str(self.manifest_path)], check=True)
info(f"Applied manifest for {comp_name}")
apply_manifest(self.manifest_path)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bootstrap_hook() in _base.py now propagates ClusterError from apply_manifest() when the manifest installation method is CONTAINER_MANIFEST. The rest of the component lifecycle hooks in this file (and throughout the codebase) only raise ComponentError. This breaks the expected error contract for the hook interface. The apply_manifest() call should either catch ClusterError and re-raise it as ComponentError, or the docstring for bootstrap_hook should be updated to document that it may also raise ClusterError.

Copilot uses AI. Check for mistakes.
pass


def _verify_cluster_health(timeout: int) -> None:
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except (ClusterError, Exception) pattern is redundant: ClusterError is a subclass of Exception, so listing it separately has no effect. The catch clause is equivalent to except Exception. You should either use just except Exception (to suppress all errors, as the old code did), or keep except ClusterError only if the intent is to only handle cluster-related errors — but currently it silently swallows all other exceptions too because of Exception. The same issue is repeated on line 70.

Copilot uses AI. Check for mistakes.
Comment on lines +176 to +198
def get_nodes(wide: bool = False, namespace: str = "") -> str:
"""
Get nodes information.

Args:
wide: Return wide output (includes internal IP, kernel version, etc.)
namespace: Namespace filter (unused for nodes, included for API consistency)

Returns:
Node information as string

Raises:
ClusterError: If node info cannot be retrieved
"""
try:
cmd = ["kubectl", "get", "nodes"]
if wide:
cmd.append("-o")
cmd.append("wide")
result = run(cmd, check=True, capture_output=True, text=True)
return result.stdout
except ShellError as exc:
raise ClusterError(f"Failed to retrieve nodes: {exc}") from exc
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace parameter in get_nodes() is explicitly documented as "unused for nodes, included for API consistency," but accepting a parameter that is never used is a misleading API design that could confuse callers into thinking it has an effect. If the intention is future use or API consistency, this should at least be noted in the docstring or the parameter should be removed until it is actually used.

Copilot uses AI. Check for mistakes.
Comment on lines +319 to +333

Raises:
ClusterError: If logs cannot be retrieved (other than missing logs)
"""
try:
result = run(
["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],
check=False,
capture_output=True,
text=True,
)
# Non-zero exit is OK if pod has no logs; return empty
return result.stdout
except ShellError as exc:
raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except ShellError as exc block in get_pod_logs (lines 332–333) is unreachable dead code. run() only raises ShellError when check=True, but this call uses check=False. The function will always return result.stdout regardless of the kubectl exit code, and the exception handler can never be triggered. Either set check=True and handle failures meaningfully, or remove the dead except block and the ShellError import if not used elsewhere in the file.

Suggested change
Raises:
ClusterError: If logs cannot be retrieved (other than missing logs)
"""
try:
result = run(
["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],
check=False,
capture_output=True,
text=True,
)
# Non-zero exit is OK if pod has no logs; return empty
return result.stdout
except ShellError as exc:
raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc
"""
result = run(
["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"],
check=False,
capture_output=True,
text=True,
)
# Non-zero exit is OK if pod has no logs; return empty
return result.stdout

Copilot uses AI. Check for mistakes.
@addyess addyess force-pushed the chore/rollup-kubectl-to-py-client branch from 843bbe0 to 6fd1650 Compare March 5, 2026 15:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/kube_galaxy/pkg/testing/spread.py:130

  • In _verify_test_prerequisites, the except ShellError handler no longer catches failures from verify_connectivity(), which now raises ClusterError instead of ShellError. If the cluster is unreachable, the ClusterError propagates uncaught past this handler, bypasses the "Test prerequisites not met" error message, and gets double-wrapped by run_spread_tests's except Exception as "Test execution failed: Failed to connect to cluster: ...".

To fix this, either add ClusterError to the except clause (except (ShellError, ClusterError) as exc), or let the ClusterError propagate naturally and remove the except ShellError wrapper (since connectivity errors are already descriptive).

        verify_connectivity()

        # Check for spread
        info("Verifying spread test framework...")
        result = run(["which", "spread"], check=True, capture_output=True)
        spread_path = result.stdout.strip()
        success(f"Found spread at {spread_path}")

        # Check for lxclient (required by spread)
        info("Verifying lxclient (required by spread)...")
        result = run(["which", "lxc"], check=True, capture_output=True)
        lxclient_path = result.stdout.strip()
        success(f"Found lxclient at {lxclient_path}")

    except ShellError as exc:
        raise ClusterError("Test prerequisites not met") from exc

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 116 to 118
except ClusterError:
# Pod might not have logs, skip
pass
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except ClusterError handler at this line is dead code. get_pod_logs() is implemented with check=False and never raises any exception — it always returns a string. The comment "Pod might not have logs, skip" is also misleading, since that case is already handled inside get_pod_logs itself by returning an empty string. This try/except block can be removed entirely.

Copilot uses AI. Check for mistakes.
Comment on lines +157 to +175
@@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None:
ClusterError: If namespace deletion fails
"""
try:
info(f" Cleaning up namespace: {namespace}")

# Delete namespace
run(
["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"],
check=True,
)

success(f"Namespace deleted: {namespace}")

except ShellError as exc:
# Don't fail if namespace doesn't exist
if "not found" in str(exc):
warning(f" Namespace {namespace} not found (may already be deleted)")
else:
raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc
delete_namespace(namespace, timeout)
except ClusterError:
raise
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/except ClusterError: raise pattern in _create_test_namespace and _cleanup_test_namespace is redundant — catching an exception only to immediately re-raise it is a no-op. These try/except blocks can be removed entirely, letting the ClusterError propagate naturally to callers.

Copilot uses AI. Check for mistakes.
@addyess addyess force-pushed the chore/rollup-kubectl-to-py-client branch from 85f5466 to 4b55b22 Compare March 5, 2026 16:01
@addyess addyess merged commit 029202e into project-bootstrap Mar 5, 2026
7 checks passed
@addyess addyess deleted the chore/rollup-kubectl-to-py-client branch March 5, 2026 16:30
addyess added a commit that referenced this pull request Mar 5, 2026
* Bootstrap project

* Test `single-node-no-cni` manifest in PR (#1)

Scaffold out the project to create kubeadm clusters on PRs and verify their installation. 
* Missing custom installation of kubernetes container services
* Missing implementation of spread tests

* Reduce sudo usage and exec calls (#20)

* Install cni-plugins before kubelet starts (#21)

* Understand why gh runners don't need cni-plugins

* Create component for cni-plugins

* Remove early debug

* handle all installation methods

* Address review comments

* Import images via pull or import (#22)

* Begin to implement image pull

* Adjust hooks to not need arch passed

* Bind kubeadm bootstrap to pre-pulled images defined by the manifest

* Address review comments

* Dynamic hook execution

* Update src/kube_galaxy/pkg/components/kubeadm.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

* Golf

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove unused fields from manifest and components (#23)

* Tidy up more of the components (#24)

* Tidy up more of the components

* Review comment

* Parallelize image pulls and install (#25)

* First shot deploying with a calico CNI (#26)

* First shots deploying with a calico CNI

* Validate manifest applications

* Validate kube-system is deployed

* Update tests/unit/components/test_container_manifest.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply review comments

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Prepare for testing with spread (#28)

* Prepare for testing with spread

* Ready CNCF validation

* Prepared a Sonobouy component

* Kubeadm bootstraps with no node taints

* Review comments

* Single out the kube-galaxy tasks from each component's spread suite.

* Update implementation to match documentation (#30)

* Rollup kubectl commands to a common library (#31)

* Rollup kubectl commands to a common library

* Apply review comments

* Additional unit tests

* Apply review comments

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants