
feat: add support for gke clusterclass #1442


Merged

Conversation

salasberryfin
Contributor

@salasberryfin salasberryfin commented Mar 11, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds support for provisioning managed (GKE) clusters using ClusterClass, implementing the template types required to satisfy the CAPI contract.

New templates for GKE cluster provisioning via ClusterClass are added to ./templates:

  • Standard GKE cluster: ./templates/cluster-template-gke-clusterclass.yaml and ./templates/cluster-template-gke-topology.yaml
  • Autopilot cluster: ./templates/cluster-template-gke-autopilot-clusterclass.yaml and ./templates/cluster-template-gke-autopilot-topology

I've tried both sets of templates with the current CAPI and CAPG, and both clusters are provisioned successfully.

The CAPI contract for ClusterClass requires a bootstrap object to be referenced in the infrastructure definition. This is a problem for managed clusters, which do not need cluster bootstrapping. Other providers have worked around this requirement in different ways: CAPZ references Kubeadm (which then creates a dependency on that provider), and CAPA implements its own bootstrap provider for EKS. I think the simplest solution is a "semi-dummy" GKEConfig controller that reports the Ready status to core CAPI when GKEMachinePools are available. This controller applies no changes; it only reports the readiness of the cluster's infrastructure. With GKE Autopilot there is no need to reference a bootstrap object, because the underlying infrastructure is completely managed by GCP, as can be seen in the new templates.
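To illustrate the "semi-dummy" idea, here is a minimal, dependency-free sketch of readiness mirroring. It is not the controller added in this PR: `PoolStatus`, `ConfigStatus`, and `bootstrapStatusFor` are hypothetical names invented for the example, and the real controller's types and its mapping between configs and machine pools may well differ.

```go
package main

import "fmt"

// PoolStatus is a hypothetical, simplified view of a managed machine pool.
type PoolStatus struct {
	Name  string
	Ready bool
}

// ConfigStatus is a hypothetical stand-in for the status that a bootstrap
// config object exposes to core CAPI.
type ConfigStatus struct {
	Ready   bool
	Message string
}

// bootstrapStatusFor derives the bootstrap status from the pool alone.
// It performs no bootstrapping and changes nothing: it only mirrors readiness.
func bootstrapStatusFor(pool *PoolStatus) ConfigStatus {
	switch {
	case pool == nil:
		return ConfigStatus{Message: "machine pool not found yet"}
	case !pool.Ready:
		return ConfigStatus{Message: fmt.Sprintf("waiting for machine pool %q", pool.Name)}
	default:
		return ConfigStatus{Ready: true, Message: fmt.Sprintf("machine pool %q is available", pool.Name)}
	}
}

func main() {
	pool := &PoolStatus{Name: "pool-a"}
	fmt.Printf("%+v\n", bootstrapStatusFor(pool)) // pool not ready yet

	pool.Ready = true
	fmt.Printf("%+v\n", bootstrapStatusFor(pool)) // pool available
}
```

In a real reconciler, a function like this would presumably sit behind the usual fetch-and-patch loop; the point of the sketch is only that the status is computed from the pool and nothing else.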

Pending: adding e2e tests

Existing GKE E2E tests are not running because of issues with GCP permissions, which we're trying to get fixed here. There is a separate PR to enable these tests on CI. Since GKE is still an experimental feature of CAPG, I'd suggest we get this feature added and, in parallel, work on fixing the E2E permissions. This PR adds many lines of new code, so keeping it up to date for a long time may become quite complex.

Which issue(s) this PR fixes:
Fixes #1387

Special notes for your reviewer:

These changes affect the experimental GKE feature, which hopefully helps with publishing changes in the API.

NOTE: After bumping to CAPI v1.10, this PR required a number of changes. Since some of these had already been applied in the existing GKE (non-ClusterClass) logic, there was some code overlap, and I've switched some methods to functions so that they can be reused across resources (e.g. GCPManagedMachinePool and GCPManagedMachinePoolTemplate have equivalent ValidateUpdate logic, and it would not make sense to maintain separate code for each). Some of the changes also switch to validating fields in the specification individually, rather than validating multiple fields in bulk. The code is more verbose, but I think it is clearer overall and more reusable and scalable down the line.
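As a rough sketch of that refactoring pattern (not the code in this PR), the example below moves update validation into a free function that checks one field at a time and is shared by two resource kinds. All names here (`PoolSpec`, `Pool`, `PoolTemplate`, `immutable`, `validateSpecUpdate`) and the choice of which fields are treated as immutable are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// PoolSpec is a hypothetical, trimmed-down spec shared by both resource kinds.
type PoolSpec struct {
	NodePoolName string
	MachineType  string
	DiskSizeGB   int64
}

// immutable validates a single field: it returns an error if the value changed.
func immutable[T comparable](field string, oldVal, newVal T) error {
	if oldVal != newVal {
		return fmt.Errorf("%s: field is immutable", field)
	}
	return nil
}

// validateSpecUpdate is a plain function rather than a method, so any resource
// embedding a PoolSpec can reuse it. Each field is checked individually and
// the errors are aggregated; errors.Join returns nil when every check passes.
func validateSpecUpdate(oldSpec, newSpec PoolSpec) error {
	return errors.Join(
		immutable("nodePoolName", oldSpec.NodePoolName, newSpec.NodePoolName),
		immutable("machineType", oldSpec.MachineType, newSpec.MachineType),
		immutable("diskSizeGB", oldSpec.DiskSizeGB, newSpec.DiskSizeGB),
	)
}

// Pool and PoolTemplate are hypothetical stand-ins for a resource and its
// ClusterClass template counterpart.
type Pool struct{ Spec PoolSpec }

type PoolTemplate struct{ TemplateSpec PoolSpec }

// Both ValidateUpdate methods delegate to the same shared function.
func (p Pool) ValidateUpdate(old Pool) error {
	return validateSpecUpdate(old.Spec, p.Spec)
}

func (t PoolTemplate) ValidateUpdate(old PoolTemplate) error {
	return validateSpecUpdate(old.TemplateSpec, t.TemplateSpec)
}

func main() {
	oldSpec := PoolSpec{NodePoolName: "pool-a", MachineType: "e2-medium", DiskSizeGB: 100}
	updated := oldSpec
	updated.MachineType = "e2-standard-4"
	updated.DiskSizeGB = 200

	// Two fields changed, so two per-field errors are reported together.
	fmt.Println(Pool{Spec: updated}.ValidateUpdate(Pool{Spec: oldSpec}))
	// Nothing changed, so the template update is accepted.
	fmt.Println(PoolTemplate{TemplateSpec: oldSpec}.ValidateUpdate(PoolTemplate{TemplateSpec: oldSpec}))
}
```

Per-field checks cost a few more lines than one bulk comparison, but each error names the exact field that was rejected, and adding a new field means adding a single line.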

Follow-ups:

  • Add documentation to book.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Experimental feature GKE now supports provisioning via ClusterClass

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 11, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 11, 2025

netlify bot commented Mar 11, 2025

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name Link
🔨 Latest commit 013bc4c
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-cluster-api-gcp/deploys/688a30ed84daad0008220619
😎 Deploy Preview https://deploy-preview-1442--kubernetes-sigs-cluster-api-gcp.netlify.app

@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch 3 times, most recently from 29808fb to 93c4e3d Compare March 13, 2025 10:02
@salasberryfin salasberryfin changed the title WIP: feat: add support for gke clusterclass feat: add support for gke clusterclass Mar 13, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 13, 2025
@salasberryfin salasberryfin marked this pull request as ready for review March 13, 2025 10:06
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
@k8s-ci-robot k8s-ci-robot requested a review from damdo March 13, 2025 10:06
@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch 6 times, most recently from 833f5a4 to 8def4a9 Compare March 13, 2025 14:16
@salasberryfin salasberryfin changed the title feat: add support for gke clusterclass WIP: feat: add support for gke clusterclass Mar 13, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch 3 times, most recently from 7856476 to 36977fa Compare March 17, 2025 12:20
@salasberryfin salasberryfin changed the title WIP: feat: add support for gke clusterclass feat: add support for gke clusterclass Mar 18, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 8, 2025
@salasberryfin
Contributor Author

/retest

@salasberryfin salasberryfin changed the title feat: add support for gke clusterclass WIP: feat: add support for gke clusterclass Jul 14, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2025
@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch from 1ebac57 to 3cc41e5 Compare July 16, 2025 08:09
@salasberryfin
Contributor Author

/retest

1 similar comment
@salasberryfin
Contributor Author

/retest

@salasberryfin
Contributor Author

@salasberryfin salasberryfin changed the title WIP: feat: add support for gke clusterclass feat: add support for gke clusterclass Jul 22, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 22, 2025
Contributor

@alexander-demicev alexander-demicev left a comment


Thanks a lot for working on this. This PR has been open for a while; in my opinion we should merge it as is and improve it in follow-up PRs if needed. The feature is experimental anyway.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexander-demicev, salasberryfin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alexander-demicev
Contributor

/remove-lifecycle-stale

@salasberryfin
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2025
@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch from 3cc41e5 to 04bf0da Compare July 30, 2025 08:29
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2025
@salasberryfin
Contributor Author

/retest

@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch from 04bf0da to 7211e08 Compare July 30, 2025 09:00
@salasberryfin salasberryfin force-pushed the gke-clusterclass-support branch from 15e4c60 to c11ca9e Compare July 30, 2025 14:48
@alexander-demicev
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2025
@salasberryfin
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 11, 2025
@k8s-ci-robot k8s-ci-robot merged commit 3f6ed14 into kubernetes-sigs:main Aug 11, 2025
16 of 17 checks passed
Labels
  • approved Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
  • lgtm "Looks good to me", indicates that a PR is ready to be merged.
  • lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
  • release-note Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • Additional role to service account used by CAPG e2e tests
  • GKE ClusterClass support
4 participants