@bnallapeta bnallapeta commented Nov 14, 2025

What this PR does / why we need it:
When OpenStack's soft-delete is enabled, deleted servers enter the SOFT_DELETED state instead of being purged immediately, so CAPO's deletion poll times out and cluster cleanup stalls.
This PR updates the deletion poll to treat SOFT_DELETED (and DELETED) as success, so reconciliation can proceed while OpenStack's reclaim policy is still respected.

** Special Notes **

Design Approach
Updated DeleteInstance to treat servers in SOFT_DELETED or DELETED state as successfully deleted. This allows cluster deletion to complete when OpenStack has soft delete enabled, while respecting the cloud admin's recovery policy. CAPO proceeds with cleanup, and OpenStack handles permanent deletion per its configured reclaim interval.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2618

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 14, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 14, 2025
netlify bot commented Nov 14, 2025

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 7121d6f
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-cluster-api-openstack/deploys/6916ab8f6a7d4c0009cb5264
😎 Deploy Preview https://deploy-preview-2834--kubernetes-sigs-cluster-api-openstack.netlify.app

@bnallapeta bnallapeta changed the title fix: Handle SOFT_DELETED and DELETED states in server deletion 🐛 Handle SOFT_DELETED and DELETED states in server deletion Nov 14, 2025

@mandre mandre left a comment

While this allows DeleteInstance() to complete when OpenStack is configured with soft-deletion, I'm a bit unclear what happens for all the resources the server depends on (volumes, trunk ports). Are we leaking them?
Also, what happens when the server is brought back to life?

Were you able to check this change against an OpenStack cloud configured with soft-deletion?

bnallapeta commented Nov 17, 2025

@mandre

> I'm a bit unclear what happens for all the resources the server depends on (volumes, trunk ports). Are we leaking them?

When CAPO creates servers with volumes, it sets DeleteOnTermination=true. So, when the server is deleted, OpenStack automatically deletes the volumes. This is not on CAPO.
Ports and trunks are explicitly cleaned up by CAPO after DeleteInstance returns, regardless of soft delete. So, behavior isn't changed with this change.

> Also, what happens when the server is brought back to life?

AFAIK, once CAPO finishes its deletion cycle, it removes all the labels, finalizers, etc., so from CAPO's perspective the server becomes an orphaned resource. Even if it is brought back later, it won't be associated with CAPO. (@lentzi90 could you please chime in here?)

> Were you able to check this change against an OpenStack cloud configured with soft-deletion?

No.

mandre commented Nov 17, 2025

> @mandre
>
> > I'm a bit unclear what happens for all the resources the server depends on (volumes, trunk ports). Are we leaking them?
>
> When CAPO creates servers with volumes, it sets DeleteOnTermination=true. So, when the server is deleted, OpenStack automatically deletes the volumes. This is not on CAPO. Ports and trunks are explicitly cleaned up by CAPO after DeleteInstance returns, regardless of soft delete. So, behavior isn't changed with this change.

However, because the server is still there, I expect the port cleanup to fail, so perhaps we're creating new problems with this change, leading to resource leaks. It would be good to double-check against a real environment. Or are we simply considering the server and associated resources as "not CAPO's problem anymore"? In that case, I think it deserves to be stated explicitly in the docs, because it would then be the user's responsibility to clean up the resources after the soft-deletion period ends.

@bnallapeta

@mandre let me test this on a real env and get back.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Inbox

Development

Successfully merging this pull request may close these issues.

Can not delete k8s cluster when OpenStack’s soft delete feature is enabled

3 participants