🐛 Fix panic when OpenStackCluster.Status.Network is nil in HCP scenarios #2635
Signed-off-by: Bharath Nallapeta <[email protected]>
…e/data/ccm/cloud-controller-manager.yaml to use cloud.conf with [Global] and [LoadBalancer] sections, addressing "expected section header" error.
- Modified test/e2e/suites/hcp/hcp_helpers.go to align with HCP test setup.
- Updated Makefile to support HCP test execution.
- Ensured control plane provider is named "kubeadm" to avoid "invalid config: control-plane-provider should be named kubeadm" error for Kamaji (v0.15.3).
Marking it as Draft as we are working on the e2e tests. This is the approach being taken for e2e. Let me know your thoughts.
Right now, we are somewhere around the 3rd or 4th step and are facing the challenges below. First, when we specify Kamaji as a control plane provider in e2e_conf.yaml and name it kamaji, this error pops up:
Second, if we try to bypass this by naming the provider kubeadm instead, Kamaji does not get installed, and later on we run into issues finding the Kamaji CRDs.
Would really appreciate your help on this to move forward. cc @orkhan-os
Appreciate this is just a draft, but I had a quick look over it anyway.
var defaultNetworkID string
if openStackCluster.Status.Network != nil {
	defaultNetworkID = openStackCluster.Status.Network.ID
}

// If no cluster network is available AND the machine spec did not define any ports with a network, we cannot choose a network.
if defaultNetworkID == "" && len(openStackMachine.Spec.Ports) == 0 {
	return nil, capoerrors.Terminal(infrav1.InvalidMachineSpecReason, "no network configured: cluster network is missing and machine spec does not define ports with a network")
}
nit: I feel like this splits this logic across this function and openStackMachineSpecToOpenStackServerSpec. Did you consider putting this logic in openStackMachineSpecToOpenStackServerSpec and modifying its signature to return an error?
Done.
defaultNetID := ""
if openStackCluster.Status.Network != nil {
	defaultNetID = openStackCluster.Status.Network.ID
}
More evidence of what I was saying above: this duplicates part of the functionality in the test. If we did this in openStackMachineSpecToOpenStackServerSpec we could just test it.
Done.
// Handle HTTP 409 (SecurityGroupRuleExists) as success - the rule already exists
if strings.Contains(err.Error(), "SecurityGroupRuleExists") || strings.Contains(err.Error(), "already exists") {
	s.scope.Logger().V(4).Info("Security group rule already exists, treating as success", "description", r.Description, "securityGroupID", securityGroupID)
	return nil
}
This looks unrelated? Separate PR, perhaps?
I came across this issue while I was testing the PR. It won't work without this change.
But sure, I can open another one and then rebase this later on.
What this PR does / why we need it:
Fixes a nil pointer panic in OpenStackMachineReconciler when OpenStackCluster.Status.Network is nil, which occurs in Hosted Control Plane (HCP) scenarios. The controller now handles a missing cluster network gracefully by checking for nil before access and returning a terminal error instead of panicking. Also adds a comprehensive HCP E2E test suite.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2380
TODOs:
/hold