bnallapeta
Contributor

What this PR does / why we need it:
Fixes a nil pointer panic in OpenStackMachineReconciler when OpenStackCluster.Status.Network is nil, which occurs in Hosted Control Plane (HCP) scenarios. The controller now handles a missing cluster network gracefully by checking for nil before access and returning a terminal error instead of panicking. It also adds a comprehensive HCP E2E test suite.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2380

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2025
@k8s-ci-robot k8s-ci-robot requested a review from lentzi90 August 3, 2025 05:22

netlify bot commented Aug 3, 2025

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit c07dc18
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-cluster-api-openstack/deploys/68a8830147009a0008d569b9
😎 Deploy Preview https://deploy-preview-2635--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot requested a review from mdbooth August 3, 2025 05:22
@bnallapeta bnallapeta marked this pull request as draft August 3, 2025 05:22

linux-foundation-easycla bot commented Aug 3, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 3, 2025
@k8s-ci-robot
Contributor

Hi @bnallapeta. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 3, 2025
@bnallapeta
Contributor Author

@EmilienM @mdbooth @lentzi90

Marking it as Draft as we are working on the e2e tests. This is the approach being taken for e2e. Let me know your thoughts.

  1. Spin up a kind cluster and turn it into a management cluster by installing CAPI/CAPO on it.
  2. Deploy a k8s cluster on OpenStack using this kind cluster. Let's call this Cluster A.
  3. Turn Cluster A into a management cluster by installing CAPI/CAPO on it.
  4. From Cluster A, deploy a k8s cluster with a hosted control plane using the Kamaji provider. Let's call this Cluster B.
  5. Run the actual e2e tests on Cluster B to exercise the panic case.

Right now, we are somewhere between steps 3 and 4 and are facing the following challenges:

First, when we specify Kamaji as a control plane provider in e2e_conf.yaml and name it kamaji, this error appears:

[FAILED] The e2e test config file is not valid
Expected success, but got an error:
    <*errors.fundamental | 0xc000c00228>:
    invalid argument: invalid config: control-plane-provider should be named kubeadm

Second, if we try to bypass that check by naming the Kamaji provider kubeadm, Kamaji does not actually get installed, and later on the Kamaji CRDs cannot be found:

[FAILED] in [It] - /root/cluster-api-provider-openstack/test/e2e/suites/hcp/hcp_helpers.go:148 @ 07/31/25 11:01:25.32
< Exit [It] should create and manage HCP-capable cluster @ 07/31/25 11:01:25.32 (7m12.465s)
<< Timeline
[FAILED] Timed out after 180.000s.
Expected success, but got an error:
    <*apiutil.ErrResourceDiscoveryFailed | 0xc000123428>:
    unable to retrieve the complete list of server APIs: kamaji.clastix.io/v1alpha1: no matches for kamaji.clastix.io/v1alpha1, Resource=
    {
        {
            Group: "kamaji.clastix.io",
            Version: "v1alpha1",
        }: <*meta.NoResourceMatchError | 0xc0012143c0>{
            PartialResource: {
                Group: "kamaji.clastix.io",
                Version: "v1alpha1",
                Resource: "",
            },
        },
    }
In [It] at: /root/cluster-api-provider-openstack/test/e2e/suites/hcp/hcp_helpers.go:148 @ 07/31/25 11:01:25.32

Would really appreciate your help on this to move forward.

cc @orkhan-os

Contributor

@mdbooth mdbooth left a comment

Appreciate this is just a draft, but I had a quick look over it anyway.

@bnallapeta bnallapeta force-pushed the hcp-2380 branch 2 times, most recently from 3321655 to 0ab4e3c Compare August 4, 2025 14:45
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 14, 2025
@bnallapeta bnallapeta marked this pull request as ready for review August 14, 2025 06:23
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 14, 2025
@k8s-ci-robot k8s-ci-robot requested a review from EmilienM August 14, 2025 06:23
@lentzi90
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 18, 2025
Contributor

@mdbooth mdbooth left a comment

Code change lgtm. Just some weirdness in the tests which needs clearing up.

Contributor

@lentzi90 lentzi90 left a comment

Am I understanding correctly that this changes the behavior for security groups on OpenStackMachines? We cannot sneak in that change as a bug fix (🐛 ). If we want to change it (and I am not sure we do), the PR has to be marked as a breaking change (⚠️ ).

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2025
Contributor

@lentzi90 lentzi90 left a comment

The code looks good to me now. Added some comments on the documentation.

@bnallapeta
Contributor Author

@lentzi90 addressed all the comments on docs. Just one conversation yet to be resolved.

Also, with this commit, I updated all references to "tenant cluster" to "workload cluster" to stay consistent with CAPI/CAPO nomenclature.

Signed-off-by: Bharath Nallapeta <[email protected]>

addressed PR comments on secGroups

Signed-off-by: Bharath Nallapeta <[email protected]>
@bnallapeta bnallapeta force-pushed the hcp-2380 branch 2 times, most recently from bd01438 to cefcbae Compare August 22, 2025 12:31
Contributor

@lentzi90 lentzi90 left a comment

I'm ok with this now. @mdbooth do you have time to take another look also?
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 27, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Inbox
Development

Successfully merging this pull request may close these issues.

Panic in OpenStackMachineReconciler if OpenStackCluster.Status.Network is nil (Hosted Control Plane scenario)
4 participants