Migrate kubectl_apply_manifest module to helm #5282

Open
agrawalkhushi18 wants to merge 15 commits into GoogleCloudPlatform:develop from agrawalkhushi18:apply_manifest_migration

@agrawalkhushi18 agrawalkhushi18 commented Feb 25, 2026

This PR migrates the kubectl_apply_manifest module from the gavinbunney/kubectl provider to the official hashicorp/helm provider.

Key Changes

1. Re-implementation of kubectl_apply_manifests

  • Replaced the underlying module logic to use hashicorp/helm via a local raw-config-chart.

2. Enhanced Manifest Processing in locals

  • Relocated and consolidated all file-reading, templating, and directory-scanning logic from the submodule into the main module's locals block.
  • The Helm provider requires pure YAML string inputs. The logic pre-processes all input types (URLs, .tftpl templates, raw files, and directories) into a single rendered string before passing it to Helm.
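
The pre-processing described above could be sketched roughly as follows. This is a minimal illustration rather than the module's actual code: the variables `manifest_source` and `template_vars`, the local names, and the `*.yaml` glob are all assumptions, and URL sources (which would need something like the `http` data source) are left out for brevity.

```hcl
# Hypothetical input: a path to a .tftpl template, a raw YAML file,
# or a directory (written with a trailing slash) of YAML files.
variable "manifest_source" {
  type = string
}

# Hypothetical variables passed to templatefile() for .tftpl sources.
variable "template_vars" {
  type    = map(any)
  default = {}
}

locals {
  # endswith() requires Terraform 1.3 or later.
  is_dir      = endswith(var.manifest_source, "/")
  is_template = endswith(var.manifest_source, ".tftpl")

  # Directory case: read every *.yaml file and join the results into
  # one multi-document YAML string.
  dir_content = local.is_dir ? join("\n---\n", [
    for f in fileset(var.manifest_source, "*.yaml") :
    file("${var.manifest_source}${f}")
  ]) : ""

  # Single-file case: render templates, read raw files verbatim.
  file_content = local.is_dir ? "" : (
    local.is_template
    ? templatefile(var.manifest_source, var.template_vars)
    : file(var.manifest_source)
  )

  # The single rendered string that would be handed to Helm.
  content = local.is_dir ? local.dir_content : local.file_content
}
```

Restricting the directory scan to a glob such as `*.yaml` is one way to keep non-manifest files out of the rendered string.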

3. Server-Side Apply (SSA) Deprecation

  • Removed server_side_apply support as it is no longer necessary. A few blueprints have been modified accordingly.
  • Helm stores release state in gzip-compressed Secrets (each limited to roughly 1 MiB), which sidesteps the 256KB limit on the kubectl.kubernetes.io/last-applied-configuration annotation that necessitated SSA for large CRDs.
  • Updated the kubectl-apply README.md to reflect the helm provider settings.

4. Release Naming Stability

  • Introduced a random_id resource to generate a unique 4-byte suffix for each Helm release (e.g., manifest-apply-ceab0dfc-0).
  • Added hashicorp/random to versions.tf to support conflict-free release naming.
  • This prevents release-name collisions when multiple modules (e.g., gke-cluster, gke-node-pool, and user workloads) instantiate kubectl-apply simultaneously within the same blueprint. The 4-byte length (8 hex characters) was chosen to make collisions very unlikely while staying comfortably within Helm's 53-character release name limit.
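
As a rough sketch of the naming scheme, assuming a hypothetical `manifest_count` variable and a `release_names` local (only `random_id.release_suffix` and the `>= 2.1` constraint come from this PR):

```hcl
terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = ">= 2.1"
    }
  }
}

# Hypothetical: number of manifests applied by one module instance.
variable "manifest_count" {
  type    = number
  default = 2
}

resource "random_id" "release_suffix" {
  byte_length = 4 # rendered as 8 hex characters, e.g. "ceab0dfc"
}

locals {
  # One release name per manifest: manifest-apply-<suffix>-<index>
  release_names = [
    for idx in range(var.manifest_count) :
    "manifest-apply-${random_id.release_suffix.hex}-${idx}"
  ]
}
```

A name such as manifest-apply-ceab0dfc-0 is 25 characters long, which leaves plenty of headroom under the 53-character limit.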

5. Atomic Operations

  • Enabled atomic = true by default on the Helm release.
  • Ensures that failed manifest applications (e.g., a typo or quota error) automatically trigger a rollback, preventing the cluster from being left in a "half-applied" zombie state.
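
A minimal sketch of what an atomic release might look like with the hashicorp/helm provider. The resource name, the `rendered_manifest` variable, the hard-coded release name and namespace, and the `manifests` values key are illustrative assumptions, not the module's actual interface.

```hcl
# Hypothetical: the pre-rendered YAML string produced in locals.
variable "rendered_manifest" {
  type = string
}

resource "helm_release" "apply_manifest" {
  name      = "manifest-apply-ceab0dfc-0"
  chart     = "${path.module}/raw-config-chart" # local chart directory
  namespace = "default"

  # With atomic enabled, a failed upgrade is rolled back automatically.
  atomic = true
  wait   = true

  # The values key "manifests" is an assumption about the chart's schema.
  values = [yamlencode({ manifests = [var.rendered_manifest] })]
}
```

Note that on a failed first install Helm's atomic mode removes the release rather than rolling back to a previous revision, since no earlier revision exists.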

NOTE:

  • The extra step that re-applied the nccl-config and nri-device-injector manifests has been removed from the test file. gpu_direct.tf now fetches the latest manifests instead of pinning an older commit hash, which was causing the discrepancy in the nccl tests.
  • The changes to the slinky module's outputs.tf resolve a deployment race condition in which gke-nodeset Custom Resources were applied before the required slinky Helm CRDs had finished installing. Anchoring explicit depends_on constraints to the slinky module outputs forces the Terraform DAG to serialize both the creation and destruction phases of the deployment correctly.
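
The output-anchoring idea can be illustrated with a hedged sketch. All names here (`slurm_namespace`, `slinky_operator`, the chart path) are hypothetical and do not reflect the slinky module's real contents.

```hcl
# Hypothetical namespace input.
variable "slurm_namespace" {
  type    = string
  default = "slurm"
}

# Hypothetical release that installs the CRDs.
resource "helm_release" "slinky_operator" {
  name      = "slinky-operator"
  chart     = "${path.module}/charts/slinky" # hypothetical path
  namespace = var.slurm_namespace
}

# The value does not reference the release, so without depends_on
# Terraform would not order consumers of this output after it.
output "slurm_namespace" {
  value      = var.slurm_namespace
  depends_on = [helm_release.slinky_operator]
}
```

Any module that reads this output is then created after the release and destroyed before it, which is the serialization the note describes.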

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@gemini-code-assist

Summary of Changes

Hello @agrawalkhushi18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Kubernetes manifest application process by migrating from a custom kubectl provider to the official hashicorp/helm provider. This change enhances the reliability and stability of manifest deployments, simplifies the handling of various manifest sources, and introduces robust atomic operations with automatic rollbacks, leading to a more predictable and resilient cluster management experience.

Highlights

  • Re-implementation of kubectl_apply_manifests: The underlying module logic was replaced to use the hashicorp/helm provider via a local raw-config-chart, and the kubectl submodule and gavinbunney/kubectl provider dependency were entirely removed.
  • Enhanced Manifest Processing in locals: All file-reading, templating, and directory-scanning logic was relocated and consolidated into the main module's locals block. This pre-processes all input types (URLs, .tftpl templates, raw files, and directories) into a single rendered YAML string before passing it to Helm.
  • Server-Side Apply (SSA) Deprecation: Server-side apply support was removed as it is no longer necessary. Helm stores release state in GZIP-compressed Secrets, effectively bypassing the 256KB annotation constraint that previously necessitated SSA for large CRDs.
  • Release Naming Stability: A random_id resource was introduced to generate a unique 4-byte suffix for each Helm release, preventing namespace collisions when multiple modules instantiate kubectl-apply simultaneously within the same blueprint.
  • Atomic Operations: atomic = true was enabled by default on the Helm release, ensuring that failed manifest applications automatically trigger a rollback, preventing the cluster from being left in a 'half-applied' zombie state.
Changelog
  • community/examples/xpk-n2-filestore/xpk-n2-filestore.yaml
    • Removed the 'server_side_apply: true' flag from multiple manifest sources.
  • examples/gke-h4d/gke-h4d.yaml
    • Removed the 'server_side_apply: true' flag from the MPI operator manifest.
  • modules/management/kubectl-apply/README.md
    • Added the 'random' provider to the requirements and providers sections.
    • Updated the source path for the 'kubectl_apply_manifests' module from './kubectl' to './helm_install'.
    • Added 'random_id.release_suffix' to the resources table.
  • modules/management/kubectl-apply/main.tf
    • Refactored the 'processed_apply_manifests_map' local to consolidate various manifest sources into a single 'content' field.
    • Removed 'source', 'template_vars', and 'server_side_apply' from the 'processed_apply_manifests_map' output.
    • Introduced a 'random_id' resource to generate a unique suffix for Helm release names.
    • Changed the 'kubectl_apply_manifests' module's source from './kubectl' to './helm_install'.
    • Configured the 'kubectl_apply_manifests' module to use Helm release parameters, including 'release_name', 'chart_name', 'chart_version', 'namespace', 'atomic', 'wait', and 'values_yaml'.
  • modules/management/kubectl-apply/variables.tf
    • Removed the 'server_side_apply' attribute from the 'apply_manifests' input variable type definition.
  • modules/management/kubectl-apply/versions.tf
    • Added the 'hashicorp/random' provider with version '>= 2.1' to the required providers.
@agrawalkhushi18 agrawalkhushi18 added the release-module-improvements label (Added to release notes under the "Module Improvements" heading) Feb 25, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the kubectl-apply module to use the hashicorp/helm provider instead of the gavinbunney/kubectl provider for applying manifests. This is a significant improvement that addresses limitations like the annotation size for large CRDs by leveraging Helm's release state storage. The introduction of unique release names using random_id and enabling atomic operations by default also enhances the robustness and reliability of applying manifests.

The review has identified a couple of areas for improvement. The logic for scanning directories of manifests is a bit too broad and could lead to errors if non-manifest files are present. Additionally, the namespace for Helm releases is hardcoded, which reduces the flexibility of the module compared to the previous implementation.

It's also worth noting that while the PR description states the gavinbunney/kubectl provider is removed entirely, it appears to still be in use by the install_asapd_lite part of this module. To fully complete the migration, this could also be updated to use the new Helm-based approach in a future change.

@agrawalkhushi18 agrawalkhushi18 added the release-breaking-changes label (Prevents "smooth" re-deploy across versions) Feb 25, 2026
@agrawalkhushi18 agrawalkhushi18 force-pushed the apply_manifest_migration branch from 5ae1f78 to ba70406 Compare March 18, 2026 03:44
@agrawalkhushi18 agrawalkhushi18 marked this pull request as ready for review March 20, 2026 06:50
@agrawalkhushi18 agrawalkhushi18 requested review from a team and samskillman as code owners March 20, 2026 06:50

@shubpal07 shubpal07 left a comment

Thanks for putting this together! Migrating from raw kubectl apply to Helm is a great architectural move that will really improve state management and reliability for these manifests. Left a few inline comments.


@SwarnaBharathiMantena SwarnaBharathiMantena left a comment

Is the kubectl sub-folder going to be cleaned out or is there some other existing dependency on it?

@agrawalkhushi18

> Is the kubectl sub-folder going to be cleaned out or is there some other existing dependency on it?

It can be cleaned out once the migration has proven robust, after all the kubectl_apply modules have been transitioned to the helm provider. We can monitor the tests for one month and then deprecate it.

@agrawalkhushi18 agrawalkhushi18 force-pushed the apply_manifest_migration branch from 77ecd95 to 7a6e360 Compare March 30, 2026 06:49
@agrawalkhushi18 agrawalkhushi18 changed the title Migrate kubectl_apply_manifest to helm Migrate kubectl_apply_manifest module to helm Mar 30, 2026