🐛 Improve handling of missing load balancer permissions #2629
Currently, when a user tries to create a cluster using OpenStack credentials that lack the load balancer permissions, CAPO adds the finalizer to the OpenStackCluster resource and then fails to create the load balancer. When the user later tries to delete the cluster, CAPO makes a GET request to the Octavia API to fetch the load balancer details and receives a 403 (permission denied) response. At that point the only way to let the cluster deletion proceed is to manually remove the finalizer from the OpenStackCluster resource.
This change prevents the above edge case by only attempting to delete the API server load balancer if the load balancer ID is populated in the OpenStackCluster's status field.
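The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `LoadBalancerStatus`, `ClusterStatus`, and `shouldDeleteAPIServerLB` are hypothetical names standing in for the relevant parts of the OpenStackCluster status and the delete path.

```go
package main

// LoadBalancerStatus and ClusterStatus are hypothetical stand-ins for the
// relevant parts of the OpenStackCluster status; they are not CAPO's real types.
type LoadBalancerStatus struct {
	ID string
}

type ClusterStatus struct {
	// APIServerLoadBalancer stays nil until a load balancer has been created.
	APIServerLoadBalancer *LoadBalancerStatus
}

// shouldDeleteAPIServerLB reports whether the delete path should contact
// Octavia at all. It returns true only when the load balancer is enabled
// and its ID has been recorded in the status.
func shouldDeleteAPIServerLB(lbEnabled bool, status ClusterStatus) bool {
	if !lbEnabled {
		return false
	}
	lb := status.APIServerLoadBalancer
	return lb != nil && lb.ID != ""
}
```

With a guard like this, a cluster whose load balancer was never created (empty status) skips the Octavia GET entirely, so the 403 cannot block removal of the finalizer.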
Thank you for this PR. The change you've proposed to check the status field certainly fixes the immediate issue. However, there's a more robust way to solve this that avoids a potential race condition (where the LB is created but the status isn't updated). Instead of avoiding the call to delete the load balancer, I think it's better to make the [...]. This would handle the permission error gracefully, allowing cluster deletion to proceed without relying on the status field being in sync. Once that change is made, adding a unit test for that specific error handling logic would be the final step. Let me know what you think of this approach.
Right now, CAPO only supports Octavia when a load balancer is being enabled via [...]. Matt had in mind a major rework of our load balancer support, with some way to define a load balancer provider (Octavia, MetalLB, etc.), but this hasn't been done. The reason I don't want this patch is that you should not get a failure to delete a load balancer if openStackCluster.Spec.APIServerLoadBalancer.IsEnabled was initially set to enabled, simply because if you could have created it, you should be able to disable it. Let me know if I missed something; I'm happy to discuss further and refine a fix here that works for you.
While I agree users shouldn't enable LB without proper permissions, OpenStack permission models can be complex (e.g., IAM policies that allow create but not delete, or permissions that change between cluster lifecycle events). A more robust solution would be to make [...]
@EmilienM agreed, but even with Octavia-enabled clouds we still see problems arise when other tools are used to wrap CAPO functionality, since users are not always explicitly aware that [...]. For example, when using the magnum-capi-helm driver a user might execute an [...]. We originally proposed a fix on the Magnum driver side here; perhaps the previous discussion over there will help to provide some additional context.
/ok-to-test
/hold