Skip to content

Fix scale-to-zero bug when spec.replicas is nil#515

Merged
otrosien merged 1 commit intomasterfrom
fix-empty-eds
Dec 16, 2025
Merged

Fix scale-to-zero bug when spec.replicas is nil#515
otrosien merged 1 commit intomasterfrom
fix-empty-eds

Conversation

@otrosien
Copy link
Member

This fixes a critical regression introduced in PR #511 (commit c53f570) where an EDS could scale to 0 replicas despite having minReplicas set.

Bug Scenario:

  1. kubectl patch clears spec.replicas to nil
  2. edsReplicas() returns 0 (per c53f570 change)
  3. scaleEDS() writes eds.Spec.Replicas = &0 at line 935
  4. Autoscaler returns no-op (e.g., excludeSystemIndices filters all indices)
  5. No-op means scalingOperation.NodeReplicas = nil
  6. Line 945 condition fails, so line 959 doesn't execute
  7. Result: spec.replicas stays at 0, violating minReplicas

Root Cause:
The c53f570 change made edsReplicas() return 0 for autoscaling-enabled EDS with nil replicas, intending to let the autoscaler calculate the initial value. However, when the autoscaler returns a no-op (nil NodeReplicas), there's no fallback to prevent spec.replicas from being set to 0.

Fix:
Add defensive check in scaleEDS() before writing to spec.replicas:

  • When currentReplicas == 0 and minReplicas > 0:
    • First try status.replicas (reflects actual StatefulSet state)
    • Fall back to minReplicas for new/uninitialized EDS
  • This prevents writing 0 when it would violate bounds

Test Coverage:

  • Added TestScaleToZeroPrevention with 3 test cases
  • Validates the defensive logic for all scenarios
  • All existing tests continue to pass

@otrosien otrosien added the bugfix Bug fixes and patches, e.g. fixing of a production issue that is affecting the customer experience. label Dec 12, 2025
This fixes a critical regression introduced in PR #511 (commit c53f570)
where an EDS could scale to 0 replicas despite having minReplicas set.

Bug Scenario:
1. kubectl patch clears spec.replicas to nil
2. edsReplicas() returns 0 (per c53f570 change)
3. scaleEDS() writes eds.Spec.Replicas = &0 at line 935
4. Autoscaler returns no-op (e.g., excludeSystemIndices filters all indices)
5. No-op means scalingOperation.NodeReplicas = nil
6. Line 945 condition fails, so line 959 doesn't execute
7. Result: spec.replicas stays at 0, violating minReplicas

Root Cause:
The c53f570 change made edsReplicas() return 0 for autoscaling-enabled EDS
with nil replicas, intending to let the autoscaler calculate the initial
value. However, when the autoscaler returns a no-op (nil NodeReplicas),
there's no fallback to prevent spec.replicas from being set to 0.

Fix:
Add defensive check in scaleEDS() before writing to spec.replicas:
- When currentReplicas == 0 and minReplicas > 0:
  - First try status.replicas (reflects actual StatefulSet state)
  - Fall back to minReplicas for new/uninitialized EDS
- This prevents writing 0 when it would violate bounds

Test Coverage:
- Added TestScaleToZeroPrevention with 3 test cases
- Validates the defensive logic for all scenarios
- All existing tests continue to pass

Signed-off-by: Oliver Trosien <oliver.trosien@zalando.de>
@mchernyavskaya
Copy link

  1. Does this mean that intentional scaling to 0 (like we do with the deployments directly sometimes before we want to drop them) would not be possible?
  2. Does this mean that es-operator scaling would not apply at all to system EDSes (those that only hold system indices)? So they would never scale?

@otrosien
Copy link
Member Author

otrosien commented Dec 12, 2025

We don't support scaling down to 0 pods, and shouldn't support it without additional safeguarding considerations. See this ticket for a some old discussion: #68

Regarding the second question, system indices can/should be treated special in most cases, see spec.excludeSystemIndices. Beware that hiding system indices for an EDS that is meant for holding these exclusively will make ES-Operator assume the EDS is empty, so here the situation is different.

@otrosien otrosien marked this pull request as ready for review December 12, 2025 11:53
@mikkeloscar
Copy link
Collaborator

👍

1 similar comment
@adrobisch
Copy link
Contributor

👍

@otrosien otrosien merged commit 17e9822 into master Dec 16, 2025
11 checks passed
@otrosien otrosien deleted the fix-empty-eds branch December 16, 2025 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fixes and patches, e.g. fixing of a production issue that is affecting the customer experience.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants