Skip to content

Revert "PVC Unbinder: do not clear claimRef"#914

Merged
chrisseto merged 1 commit intomainfrom
chris/p/revert-pvc-unbinder-do-not
Jun 13, 2025
Merged

Revert "PVC Unbinder: do not clear claimRef"#914
chrisseto merged 1 commit intomainfrom
chris/p/revert-pvc-unbinder-do-not

Conversation

@chrisseto
Copy link
Contributor

This reverts commit a89e202 and adds the CLI flag --allow-pv-rebinding to opt into the unreversed functionality.

A recent incident hit an exceptional edge case where all redpanda Pods where temporarily stuck in "Pending". Their underlying Nodes were marked as "NotReady" which caused the TaintManager to evict said Pods and prevented them from being rescheduled. The unbinder functioned as expected but due to the reverted commit the still existing PVs were not eligible for rebinding. As a result, a new cluster was formed.

The original motivation for the reverted commit was due to an edge case where a Node's name was reused in EKS due to using Node IPs as their name. Our cloud has since begun using the instance ID as the Node name. To remain on the conservative side, we've opted to make this functionality opt in.

Copy link
Contributor

@andrewstucki andrewstucki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm. a couple nits and a question about error handling behavior

// This strategy is only valid for volumes that utilize .HostPath or .Local.
func (r *Controller) maybeRecyclePersistentVolume(ctx context.Context, pv *corev1.PersistentVolume) error {
if pv.Spec.HostPath == nil && pv.Spec.Local == nil {
return fmt.Errorf("%T must specify .Spec.HostPath or .Spec.Local for recycling: %q", pv, pv.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this was reverted to how the code was before... but does it make sense to actually error here rather than just log and return? What happens if this is enabled and someone uses a different storage class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out! This is actually filtered out earlier in the process so we shouldn't ever hit this case but the controller on the whole likely doesn't handle such cases well, it's more so meant to not disrupt such cases rather than handle them intelligently. I'm going to add a TODO to denote as much.

This reverts commit a89e202 and adds
the CLI flag `--allow-pv-rebinding` to opt into the un-reverted
functionality.

A recent incident hit an exceptional edge case where all redpanda Pods where
temporarily stuck in "Pending". Their underlying Nodes were marked as
"NotReady" which caused the TaintManager to evict said Pods and prevented them
from being rescheduled. The unbinder functioned as expected but due to the
reverted commit the still existing PVs were not eligible for rebinding. As a
result, a new cluster was formed.

The original motivation for the reverted commit was due to an edge case where a
Node's name was reused in EKS due to using Node IPs as their name. Our cloud
has since begun using the instance ID as the Node name. To remain on the
conservative side, we've opted to make this functionality opt in.
@chrisseto chrisseto force-pushed the chris/p/revert-pvc-unbinder-do-not branch from 829cbee to 2923c3f Compare June 13, 2025 18:49
@chrisseto chrisseto enabled auto-merge (rebase) June 13, 2025 19:18
@chrisseto chrisseto merged commit 162ef60 into main Jun 13, 2025
10 checks passed
@chrisseto chrisseto deleted the chris/p/revert-pvc-unbinder-do-not branch June 13, 2025 19:34
@chrisseto
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
release/v25.1.x
release/v2.4.x
release/v2.3.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants