🐛 Handle SOFT_DELETED and DELETED states in server deletion #2834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Bharath Nallapeta <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
mandre left a comment:
While this allows DeleteInstance() to complete when OpenStack is configured with soft-deletion, I'm a bit unclear on what happens to all the resources the server depends on (volumes, trunk ports). Are we leaking them?
Also, what happens when the server is brought back to life?
Were you able to check this change against an OpenStack cloud configured with soft-deletion?
When CAPO creates servers with volumes, it sets DeleteOnTermination=true, so when the server is deleted, OpenStack automatically deletes the volumes. This is not on CAPO (see the sketch below).

AFAIK, once CAPO is done with its cycle, it removes all the labels, finalizers, etc., and thus, from CAPO's perspective, the server becomes an orphaned resource. Even if it is brought back later, it won't be associated with CAPO. (@lentzi90 could you please chime in here?)

No, not yet.
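For context, here is a minimal sketch of the block device mapping described above, assuming gophercloud v1's bootfromvolume extension; the function name and values (`createOptsWithRootVolume`, `example-node`, `example-flavor`) are illustrative, not CAPO's actual code:

```go
package compute

import (
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/bootfromvolume"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

// createOptsWithRootVolume builds server create options whose root volume
// Nova deletes together with the server (illustrative sketch only).
func createOptsWithRootVolume(imageID string, sizeGB int) servers.CreateOptsBuilder {
	base := servers.CreateOpts{
		Name:      "example-node",   // illustrative
		FlavorRef: "example-flavor", // illustrative
	}
	return bootfromvolume.CreateOptsExt{
		CreateOptsBuilder: base,
		BlockDevice: []bootfromvolume.BlockDevice{{
			BootIndex:       0,
			UUID:            imageID,
			SourceType:      bootfromvolume.SourceImage,
			DestinationType: bootfromvolume.DestinationVolume,
			VolumeSize:      sizeGB,
			// Nova deletes the volume when the server is deleted,
			// so CAPO does not have to clean it up itself.
			DeleteOnTermination: true,
		}},
	}
}
```

With DeleteOnTermination set, Nova owns the volume's lifecycle, which is why the reply above treats volume cleanup as out of CAPO's hands.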
However, because the server is still there, I expect the port cleanup to fail, so perhaps we're creating new problems with this change, leading to resource leaks. It would be good to double check against a real environment. Or are we simply considering the server and associated resources as "not CAPO's problem anymore"? In which case, I think it would deserve being stated explicitly in the docs, because it would then be the user's responsibility to clean the resources after the soft-deletion period ended.
@mandre let me test this on a real env and get back.
What this PR does / why we need it:
When OpenStack's soft-delete is enabled, deleted servers enter the SOFT_DELETED state instead of being immediately purged, causing CAPO's deletion poll to time out and stall cluster cleanup.
This PR updates the deletion poll to treat SOFT_DELETED (and DELETED) as success, allowing reconciliation to proceed while respecting OpenStack's reclaim policy.
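As a rough sketch of the state handling this implies, assuming gophercloud v1's servers API; `serverGone` is an illustrative name, not the PR's actual function:

```go
package compute

import (
	"errors"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

// serverGone reports whether the deletion poll can consider the server
// deleted (illustrative sketch only).
func serverGone(client *gophercloud.ServiceClient, serverID string) (bool, error) {
	server, err := servers.Get(client, serverID).Extract()
	if err != nil {
		var notFound gophercloud.ErrDefault404
		if errors.As(err, &notFound) {
			// Hard-deleted: Nova no longer knows about the server.
			return true, nil
		}
		return false, err
	}
	switch server.Status {
	case "SOFT_DELETED", "DELETED":
		// Soft delete is enabled (or the record lingers): treat the poll
		// as successful and let OpenStack purge the instance per its
		// configured reclaim policy.
		return true, nil
	default:
		return false, nil
	}
}
```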
**Special Notes**
Design Approach
Updated `DeleteInstance` to treat servers in `SOFT_DELETED` or `DELETED` state as successfully deleted. This allows cluster deletion to complete when OpenStack has soft delete enabled, while respecting the cloud admin's recovery policy. CAPO proceeds with cleanup, and OpenStack handles permanent deletion per its configured reclaim interval.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #2618
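To make the intended semantics concrete, a hypothetical table-driven test of the status check; `isDeletedStatus` is an illustrative helper, not necessarily how the PR structures the code:

```go
package compute

import "testing"

// isDeletedStatus mirrors the check described above (illustrative only).
func isDeletedStatus(status string) bool {
	return status == "SOFT_DELETED" || status == "DELETED"
}

func TestIsDeletedStatus(t *testing.T) {
	for _, tc := range []struct {
		status string
		want   bool
	}{
		{"ACTIVE", false},
		{"ERROR", false},
		{"SOFT_DELETED", true},
		{"DELETED", true},
	} {
		if got := isDeletedStatus(tc.status); got != tc.want {
			t.Errorf("isDeletedStatus(%q) = %v, want %v", tc.status, got, tc.want)
		}
	}
}
```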
TODOs:
/hold