Revert "PVC Unbinder: do not clear claimRef"#914
Conversation
82fc119 to
829cbee
Compare
andrewstucki
left a comment
There was a problem hiding this comment.
overall lgtm. a couple nits and a question about error handling behavior
| // This strategy is only valid for volumes that utilize .HostPath or .Local. | ||
| func (r *Controller) maybeRecyclePersistentVolume(ctx context.Context, pv *corev1.PersistentVolume) error { | ||
| if pv.Spec.HostPath == nil && pv.Spec.Local == nil { | ||
| return fmt.Errorf("%T must specify .Spec.HostPath or .Spec.Local for recycling: %q", pv, pv.Name) |
There was a problem hiding this comment.
I'm assuming this was reverted to how the code was before... but does it make sense to actually error here rather than just log and return? What happens if this is enabled and someone uses a different storage class?
There was a problem hiding this comment.
Good call out! This is actually filtered out earlier in the process so we shouldn't ever hit this case but the controller on the whole likely doesn't handle such cases well, it's more so meant to not disrupt such cases rather than handle them intelligently. I'm going to add a TODO to denote as much.
This reverts commit a89e202 and adds the CLI flag `--allow-pv-rebinding` to opt into the un-reverted functionality. A recent incident hit an exceptional edge case where all redpanda Pods where temporarily stuck in "Pending". Their underlying Nodes were marked as "NotReady" which caused the TaintManager to evict said Pods and prevented them from being rescheduled. The unbinder functioned as expected but due to the reverted commit the still existing PVs were not eligible for rebinding. As a result, a new cluster was formed. The original motivation for the reverted commit was due to an edge case where a Node's name was reused in EKS due to using Node IPs as their name. Our cloud has since begun using the instance ID as the Node name. To remain on the conservative side, we've opted to make this functionality opt in.
829cbee to
2923c3f
Compare
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
This reverts commit a89e202 and adds the CLI flag
--allow-pv-rebindingto opt into the unreversed functionality.A recent incident hit an exceptional edge case where all redpanda Pods where temporarily stuck in "Pending". Their underlying Nodes were marked as "NotReady" which caused the TaintManager to evict said Pods and prevented them from being rescheduled. The unbinder functioned as expected but due to the reverted commit the still existing PVs were not eligible for rebinding. As a result, a new cluster was formed.
The original motivation for the reverted commit was due to an edge case where a Node's name was reused in EKS due to using Node IPs as their name. Our cloud has since begun using the instance ID as the Node name. To remain on the conservative side, we've opted to make this functionality opt in.