Description
CAPI currently rounds down (math.Floor) the number of machines calculated from a percentage value, because getMaxUnhealthy passes false as the roundUp argument of GetScaledValueFromIntOrPercent:
func getMaxUnhealthy(mhc *clusterv1.MachineHealthCheck) (int, error) {
	maxUnhealthy, err := intstr.GetScaledValueFromIntOrPercent(ptr.To(ptr.Deref(mhc.Spec.Remediation.TriggerIf.UnhealthyLessThanOrEqualTo, defaultMaxUnhealthy)), int(ptr.Deref[int32](mhc.Status.ExpectedMachines, 0)), false)
	if err != nil {
		return 0, err
	}
	return maxUnhealthy, nil
}
So if I set 20% and have 3 machines, the controller determines "maxUnhealthy = floor(3 * 20%) = 0 machines" and therefore won't allow any remediation, as shown in the conditions:
v1beta2:
  conditions:
  - lastTransitionTime: "2025-10-28T11:18:13Z"
    message: 'Remediation is not allowed, the number of not started or unhealthy
      machines exceeds maxUnhealthy (total: 3, unhealthy: 1, maxUnhealthy: 20%)'
    observedGeneration: 2
    reason: TooManyUnhealthy
    status: "False"
    type: RemediationAllowed
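To make the arithmetic concrete, here is a minimal standalone sketch of floor versus ceiling for a 20% setting. It does not call the real library: `scaledValue` is a hypothetical stand-in that mirrors, as far as I understand it, what the percentage branch of `intstr.GetScaledValueFromIntOrPercent` computes for `roundUp=false` and `roundUp=true`.

```go
package main

import (
	"fmt"
	"math"
)

// scaledValue is a hypothetical stand-in (not the apimachinery function).
// It scales total by percent and rounds up or down depending on roundUp.
func scaledValue(percent, total int, roundUp bool) int {
	raw := float64(percent) * float64(total) / 100
	if roundUp {
		return int(math.Ceil(raw))
	}
	return int(math.Floor(raw))
}

func main() {
	// Compare both rounding modes for small machine counts at 20%.
	for total := 1; total <= 6; total++ {
		fmt.Printf("total=%d floor=%d ceil=%d\n",
			total, scaledValue(20, total, false), scaledValue(20, total, true))
	}
}
```

With 20%, rounding down yields 0 for 1 to 4 machines and only reaches 1 at 5 machines, whereas rounding up yields 1 for every count from 1 to 5.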
I'd argue that by setting a positive percentage value here, as a user, I still intend to see remediation happening even if there are only a few machines (with 20%, anything fewer than 5 machines rounds down to 0). Particularly in a MachinePool scenario (see remediation support PR), where the number of machines may go up and down, I can't really set a fixed number instead of a percentage. Choosing a single percentage as a reasonable default would be nicer, but only if it still allows remediation at small machine counts.
Rounding up would solve this. Alternatively, the controller could always allow a minimum of 1 machine whenever a positive percentage/numeric value is configured.
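The minimum-of-1 alternative could look roughly like the sketch below. `maxUnhealthyAtLeastOne` is a hypothetical helper, not existing CAPI code. It assumes the percentage has already been parsed into an integer and keeps today's round-down behaviour everywhere except where the result would be 0.

```go
package main

import "fmt"

// maxUnhealthyAtLeastOne is a hypothetical helper, not CAPI code.
// It rounds down like the current behaviour, but never returns less than 1
// when the percentage is positive and there is at least one machine.
func maxUnhealthyAtLeastOne(percent, total int) int {
	if percent <= 0 || total <= 0 {
		return 0
	}
	// Integer division rounds down for non-negative operands.
	scaled := percent * total / 100
	if scaled < 1 {
		return 1
	}
	return scaled
}

func main() {
	for _, total := range []int{1, 3, 5, 10, 14} {
		fmt.Printf("total=%d maxUnhealthy=%d\n", total, maxUnhealthyAtLeastOne(20, total))
	}
}
```

The two options differ for larger counts: with 20% and 14 machines this sketch still allows 2 unhealthy machines, while rounding up would allow 3. A minimum of 1 therefore only changes the cases that are blocked today, which may be the less surprising change for existing users.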
/area machinepool