Description
What steps did you take and what happened?
I was switching a MachineDeployment from the RollingUpdate strategy to the OnDelete strategy. During this transition, the OnDelete rollout logic in the controller computed a negative scale down amount.
Specifically, at line 129 in machinedeployment_rollout_ondelete.go:
https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_rollout_ondelete.go#L129
The machineSetScaleDownAmountDueToMachineDeletion calculation resulted in -1, which then caused:
A log message: "Unexpected negative scale down amount"
The negative value being used in the replica calculation at line 132 (subtracting a negative number, effectively adding)
The controller to enter an infinite loop, continuously flipping the MachineSet replicas up and down
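To make the arithmetic concrete, here is a minimal sketch of how a negative amount can arise and what subtracting it does. The function name, the replica values, and the running `scaleDownAmount` total are assumptions for illustration, not the controller's actual code or the state observed in my cluster:

```go
package main

import "fmt"

// scaleDownDueToDeletion is a hypothetical stand-in for the calculation
// at line 129: desired replicas (spec) minus the replicas expected to
// remain once machines that are being deleted are gone.
func scaleDownDueToDeletion(specReplicas, statusReplicas, deletingMachines int32) int32 {
	updatedReplicaCount := statusReplicas - deletingMachines
	return specReplicas - updatedReplicaCount
}

func main() {
	// Assumed state: spec asks for 2 replicas, status reports 3, none deleting.
	amount := scaleDownDueToDeletion(2, 3, 0)
	fmt.Println(amount) // -1

	// Subtracting a negative amount raises the running total instead of lowering it.
	var scaleDownAmount int32 = 1
	scaleDownAmount -= amount
	fmt.Println(scaleDownAmount) // 2
}
```

If each reconcile recomputes from a state like this, the total can grow on one pass and shrink on the next, which would match the flipping described above.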
What did you expect to happen?
The controller should:
Never allow negative scale down amounts to propagate through the replica calculations
Handle edge cases gracefully without entering infinite reconciliation loops
Either:
Clamp the value to 0 if negative, or
Skip processing that MachineSet and continue to the next one
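The two options differ in whether the affected MachineSet is still processed. A rough sketch of the difference follows, using hypothetical types and names (`msAmount`, `applyClamp`, `applySkip`) rather than anything from the cluster-api codebase:

```go
package main

import "fmt"

// msAmount pairs a MachineSet name with its computed scale down amount.
// This type is illustrative only.
type msAmount struct {
	name   string
	amount int32
}

// clampNonNegative returns 0 for negative inputs and the input otherwise.
func clampNonNegative(n int32) int32 {
	if n < 0 {
		return 0
	}
	return n
}

// applyClamp processes every entry, treating negative amounts as 0.
func applyClamp(items []msAmount, budget int32) (int32, []string) {
	var processed []string
	for _, it := range items {
		budget -= clampNonNegative(it.amount)
		processed = append(processed, it.name)
	}
	return budget, processed
}

// applySkip ignores entries with negative amounts entirely.
func applySkip(items []msAmount, budget int32) (int32, []string) {
	var processed []string
	for _, it := range items {
		if it.amount < 0 {
			continue
		}
		budget -= it.amount
		processed = append(processed, it.name)
	}
	return budget, processed
}

func main() {
	items := []msAmount{{"ms-a", -1}, {"ms-b", 1}}

	b, p := applyClamp(items, 2)
	fmt.Println(b, p) // 1 [ms-a ms-b]

	b, p = applySkip(items, 2)
	fmt.Println(b, p) // 1 [ms-b]
}
```

Both variants leave the remaining budget unaffected by the negative entry. Clamping still lets the rest of the loop body run for that MachineSet, while skipping leaves it untouched until a later reconcile.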
Cluster API version
1.8.12
Kubernetes version
1.30.11
Anything else you would like to add?
Proposed Solution:
Add a safeguard to prevent negative values from affecting replica counts:
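if machineSetScaleDownAmountDueToMachineDeletion < 0 {
	log.V(4).Error(errors.Errorf("Unexpected negative scale down amount: %d", machineSetScaleDownAmountDueToMachineDeletion), fmt.Sprintf("Error reconciling MachineSet %s", oldMS.Name))
	// Clamp to 0 so a negative value never reaches the replica calculation.
	// This is redundant with the continue below if the variable is scoped to the loop iteration.
	machineSetScaleDownAmountDueToMachineDeletion = 0
	// Skip this MachineSet and move on to the next one.
	continue
}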
This would keep the replica count correct and prevent the infinite reconciliation loop.
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.