Rollup kubectl commands to a common library#31
Conversation
e9cfe31 to
628acb2
Compare
There was a problem hiding this comment.
Pull request overview
This PR consolidates all inline kubectl subprocess calls scattered across cmd/, pkg/components/, and pkg/utils/ into a centralized client.py library, replacing raw run(["kubectl", ...]) and subprocess.run(["kubectl", ...]) calls with named, typed wrappers that uniformly raise ClusterError on failure.
Changes:
- New
src/kube_galaxy/pkg/utils/client.py: 412-line library encapsulating all kubectl operations (connectivity, nodes, pods, logs, events, namespaces, manifests). - Refactor
logs.py,spread.py,_base.py,kubeadm.py,cmd/test.py,cmd/status.py: Replace directrun(["kubectl", ...])/subprocess.run(...)calls with the new client library functions. - Update
tests/unit/components/test_container_manifest.py: Adjust thebootstrap_hooktest to patchapply_manifestinstead ofrun.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
src/kube_galaxy/pkg/utils/client.py |
New centralized kubectl client library |
src/kube_galaxy/pkg/utils/logs.py |
Use client library for all kubectl calls |
src/kube_galaxy/pkg/testing/spread.py |
Delegate namespace management to client library |
src/kube_galaxy/pkg/components/_base.py |
Use apply_manifest() in bootstrap_hook |
src/kube_galaxy/pkg/components/kubeadm.py |
Use client library in verify_hook |
src/kube_galaxy/cmd/test.py |
Use client library for connectivity/context checks |
src/kube_galaxy/cmd/status.py |
Use client library for node/context display |
tests/unit/components/test_container_manifest.py |
Updated test to patch apply_manifest |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
src/kube_galaxy/pkg/utils/client.py
Outdated
| def get_context_info() -> dict[str, str]: | ||
| """ | ||
| Get current context information. | ||
|
|
||
| Returns: | ||
| Dict with 'context' key containing current context name | ||
|
|
||
| Raises: | ||
| ClusterError: If context info cannot be retrieved | ||
| """ | ||
| try: | ||
| context = get_context() | ||
| return {"context": context} | ||
| except ClusterError: | ||
| raise |
There was a problem hiding this comment.
The get_context_info() function at lines 159–173 is a thin, unused wrapper around get_context() that adds no value. It is not imported or called anywhere in the codebase, and its try/except/raise is a no-op. This dead code should be removed.
| @@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None: | |||
| ClusterError: If namespace deletion fails | |||
| """ | |||
| try: | |||
| info(f" Cleaning up namespace: {namespace}") | |||
|
|
|||
| # Delete namespace | |||
| run( | |||
| ["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"], | |||
| check=True, | |||
| ) | |||
|
|
|||
| success(f"Namespace deleted: {namespace}") | |||
|
|
|||
| except ShellError as exc: | |||
| # Don't fail if namespace doesn't exist | |||
| if "not found" in str(exc): | |||
| warning(f" Namespace {namespace} not found (may already be deleted)") | |||
| else: | |||
| raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc | |||
| delete_namespace(namespace, timeout) | |||
| except ClusterError: | |||
| raise | |||
There was a problem hiding this comment.
The _create_test_namespace() and _cleanup_test_namespace() functions both now consist solely of a try: <delegate>() / except ClusterError: raise pattern. This is completely transparent and adds no value — it is equivalent to just calling create_namespace() or delete_namespace() directly without the try/except wrapper. These wrappers can be simplified to a direct call or eliminated entirely.
| ) | ||
| run(["kubectl", "apply", "-f", str(self.manifest_path)], check=True) | ||
| info(f"Applied manifest for {comp_name}") | ||
| apply_manifest(self.manifest_path) |
There was a problem hiding this comment.
The bootstrap_hook() in _base.py now propagates ClusterError from apply_manifest() when the manifest installation method is CONTAINER_MANIFEST. The rest of the component lifecycle hooks in this file (and throughout the codebase) only raise ComponentError. This breaks the expected error contract for the hook interface. The apply_manifest() call should either catch ClusterError and re-raise it as ComponentError, or the docstring for bootstrap_hook should be updated to document that it may also raise ClusterError.
| pass | ||
|
|
||
|
|
||
| def _verify_cluster_health(timeout: int) -> None: |
There was a problem hiding this comment.
The except (ClusterError, Exception) pattern is redundant: ClusterError is a subclass of Exception, so listing it separately has no effect. The catch clause is equivalent to except Exception. You should either use just except Exception (to suppress all errors, as the old code did), or keep except ClusterError only if the intent is to only handle cluster-related errors — but currently it silently swallows all other exceptions too because of Exception. The same issue is repeated on line 70.
src/kube_galaxy/pkg/utils/client.py
Outdated
| def get_nodes(wide: bool = False, namespace: str = "") -> str: | ||
| """ | ||
| Get nodes information. | ||
|
|
||
| Args: | ||
| wide: Return wide output (includes internal IP, kernel version, etc.) | ||
| namespace: Namespace filter (unused for nodes, included for API consistency) | ||
|
|
||
| Returns: | ||
| Node information as string | ||
|
|
||
| Raises: | ||
| ClusterError: If node info cannot be retrieved | ||
| """ | ||
| try: | ||
| cmd = ["kubectl", "get", "nodes"] | ||
| if wide: | ||
| cmd.append("-o") | ||
| cmd.append("wide") | ||
| result = run(cmd, check=True, capture_output=True, text=True) | ||
| return result.stdout | ||
| except ShellError as exc: | ||
| raise ClusterError(f"Failed to retrieve nodes: {exc}") from exc |
There was a problem hiding this comment.
The namespace parameter in get_nodes() is explicitly documented as "unused for nodes, included for API consistency," but accepting a parameter that is never used is a misleading API design that could confuse callers into thinking it has an effect. If the intention is future use or API consistency, this should at least be noted in the docstring or the parameter should be removed until it is actually used.
src/kube_galaxy/pkg/utils/client.py
Outdated
|
|
||
| Raises: | ||
| ClusterError: If logs cannot be retrieved (other than missing logs) | ||
| """ | ||
| try: | ||
| result = run( | ||
| ["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"], | ||
| check=False, | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| # Non-zero exit is OK if pod has no logs; return empty | ||
| return result.stdout | ||
| except ShellError as exc: | ||
| raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc |
There was a problem hiding this comment.
The except ShellError as exc block in get_pod_logs (lines 332–333) is unreachable dead code. run() only raises ShellError when check=True, but this call uses check=False. The function will always return result.stdout regardless of the kubectl exit code, and the exception handler can never be triggered. Either set check=True and handle failures meaningfully, or remove the dead except block and the ShellError import if not used elsewhere in the file.
| Raises: | |
| ClusterError: If logs cannot be retrieved (other than missing logs) | |
| """ | |
| try: | |
| result = run( | |
| ["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"], | |
| check=False, | |
| capture_output=True, | |
| text=True, | |
| ) | |
| # Non-zero exit is OK if pod has no logs; return empty | |
| return result.stdout | |
| except ShellError as exc: | |
| raise ClusterError(f"Failed to retrieve logs for {namespace}/{pod_name}: {exc}") from exc | |
| """ | |
| result = run( | |
| ["kubectl", "logs", "-n", namespace, pod_name, f"--tail={tail}"], | |
| check=False, | |
| capture_output=True, | |
| text=True, | |
| ) | |
| # Non-zero exit is OK if pod has no logs; return empty | |
| return result.stdout |
843bbe0 to
6fd1650
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/kube_galaxy/pkg/testing/spread.py:130
- In
_verify_test_prerequisites, theexcept ShellErrorhandler no longer catches failures fromverify_connectivity(), which now raisesClusterErrorinstead ofShellError. If the cluster is unreachable, theClusterErrorpropagates uncaught past this handler, bypasses the "Test prerequisites not met" error message, and gets double-wrapped byrun_spread_tests'sexcept Exceptionas "Test execution failed: Failed to connect to cluster: ...".
To fix this, either add ClusterError to the except clause (except (ShellError, ClusterError) as exc), or let the ClusterError propagate naturally and remove the except ShellError wrapper (since connectivity errors are already descriptive).
verify_connectivity()
# Check for spread
info("Verifying spread test framework...")
result = run(["which", "spread"], check=True, capture_output=True)
spread_path = result.stdout.strip()
success(f"Found spread at {spread_path}")
# Check for lxclient (required by spread)
info("Verifying lxclient (required by spread)...")
result = run(["which", "lxc"], check=True, capture_output=True)
lxclient_path = result.stdout.strip()
success(f"Found lxclient at {lxclient_path}")
except ShellError as exc:
raise ClusterError("Test prerequisites not met") from exc
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
src/kube_galaxy/pkg/utils/logs.py
Outdated
| except ClusterError: | ||
| # Pod might not have logs, skip | ||
| pass |
There was a problem hiding this comment.
The except ClusterError handler at this line is dead code. get_pod_logs() is implemented with check=False and never raises any exception — it always returns a string. The comment "Pod might not have logs, skip" is also misleading, since that case is already handled inside get_pod_logs itself by returning an empty string. This try/except block can be removed entirely.
| @@ -179,22 +170,9 @@ def _cleanup_test_namespace(namespace: str, timeout: int = 60) -> None: | |||
| ClusterError: If namespace deletion fails | |||
| """ | |||
| try: | |||
| info(f" Cleaning up namespace: {namespace}") | |||
|
|
|||
| # Delete namespace | |||
| run( | |||
| ["kubectl", "delete", "namespace", namespace, "--timeout", f"{timeout}s"], | |||
| check=True, | |||
| ) | |||
|
|
|||
| success(f"Namespace deleted: {namespace}") | |||
|
|
|||
| except ShellError as exc: | |||
| # Don't fail if namespace doesn't exist | |||
| if "not found" in str(exc): | |||
| warning(f" Namespace {namespace} not found (may already be deleted)") | |||
| else: | |||
| raise ClusterError(f"Failed to delete namespace {namespace}: {exc}") from exc | |||
| delete_namespace(namespace, timeout) | |||
| except ClusterError: | |||
| raise | |||
There was a problem hiding this comment.
The try/except ClusterError: raise pattern in _create_test_namespace and _cleanup_test_namespace is redundant — catching an exception only to immediately re-raise it is a no-op. These try/except blocks can be removed entirely, letting the ClusterError propagate naturally to callers.
85f5466 to
4b55b22
Compare
* Bootstrap project * Test `single-node-no-cni` manifest in PR (#1) Scaffold out the project to create kubeadm clusters on PRs and verify their installation. * Missing custom installation of kubernetes container services * Missing implementation of spread tests * Reduce sudo usage and exec calls (#20) * Install cni-plugins before kubelet starts (#21) * Understand why gh runners don't need cni-plugins * Create component for cni-plugins * Remove early debug * handle all installation methods * Address review comments * Import images via pull or import (#22) * Begin to implement image pull * Adjust hooks to not need arch passed * Bind kubeadm bootstrap to pre-pulled images defined by the manifest * Address review comments * Dynamic hook execution * Update src/kube_galaxy/pkg/components/kubeadm.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Address review comments * Golf --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Remove unused fields from manifest and components (#23) * Tidy up more of the components (#24) * Tidy up more of the components * Review comment * Parallelize image pulls and install (#25) * First shot deploying with a calico CNI (#26) * First shots deploying with a calico CNI * Validate manifest applications * Validate kube-system is deployed * Update tests/unit/components/test_container_manifest.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply review comments --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Prepare for testing with spread (#28) * Prepare for testing with spread * Ready CNCF validation * Prepared a Sonobouy component * Kubeadm bootstraps with no node taints * Review comments * Single out the kube-galaxy tasks from each component's spread suite. * Update implementation to match documentation (#30) * Rollup kubectl commands to a common library (#31) * Rollup kubectl commands to a common library * Apply review comments * Additional unit tests * Apply review comments --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request refactors the codebase to centralize and standardize Kubernetes cluster interactions through the new
pkg.utils.clientmodule. The changes replace direct subprocess and shell command invocations with higher-level utility functions, improving error handling, code readability, and maintainability. The refactor touches multiple command handlers, component hooks, testing utilities, and log collection routines, ensuring consistent cluster connectivity checks and resource management.Cluster Interaction Refactor:
kubectlsubprocess calls with utility functions likeget_context,get_nodes,wait_for_nodes, andwait_for_podsincmd/status.py, improving error handling and code clarity. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbR7-R10),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL42-R57),[[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-2161de7d44dc4ad3cb56311fb2b22fe13b5a4c04b85ecfe952b61d9edbad10dbL66-L113))cmd/test.pyto useverify_connectivityandget_contextfor cluster checks, and to catchClusterErrorfor robust error reporting. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080R11-R12),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L25-R29),[[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-a58e62f3bb64636bf2280866ad40ce26ac38a9857f4b30bb5253ff7477838080L60-R46))pkg/components/_base.pyto useapply_manifestfor manifest application, replacing manual shell invocation. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7R21),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-9284e1434d18989a884f9426dcc87d55b14738c34b8687d5dc275603915530e7L246-R247))pkg/components/kubeadm.pyto useverify_connectivity,wait_for_nodes, andget_api_server_statusfor cluster verification. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185R17-R21),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-7ba701716aa2bd4629dd7ccebe02c04ae5ec63da22cfb6fc2d36d40a01d84185L187-R194))Testing and Namespace Management:
pkg/testing/spread.pyto useverify_connectivity,create_namespace, anddelete_namespacefor test setup and teardown, replacing raw shell commands and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dR21),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL114-R115),[[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL151-R158),[[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-4413028033c6836f57da086070dc1e39412584e680a9b75c097c2436b301533dL182-R175))Log Collection Improvements:
pkg/utils/logs.pyto use utility functions for collecting cluster info, node data, pod logs, and events, replacing direct shell calls and improving error handling. ([[1]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL3-L9),[[2]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL59-R69),[[3]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL72-R87),[[4]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL103-R122),[[5]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL145-R134),[[6]](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-8b72a84731539cf9275f99127dc6c01f6a1873fb8c25245cad82ecba9109ec1eL163-R150))Testing Updates:
[tests/unit/components/test_container_manifest.pyL171-R183](https://github.com/canonical/kube-galaxy-test/pull/31/files#diff-dad66a03932e1d299909c2f5a1a18641d9742d878cdae0408e53d09aadfc5b2cL171-R183))