NO-JIRA: Fix static pod pruning logic for non-contiguous set of revisions #2060
Fix static pod pruning logic for non-contiguous set of revisions
Problem
The PruneController contains a logic bug in revisionsToKeep() that prevents pruning when the protected revision set is non-contiguous but spans from revision 1 to LatestAvailableRevision.

Scenario that triggers the bug:
- Node has very old LastFailedRevision: 5
- Cluster is now at LatestAvailableRevision: 100
- Limits are failedRevisionLimit: 5, succeededRevisionLimit: 5
- Protected set becomes {1,2,3,4,5,96,97,98,99,100} (10 revisions)

The buggy logic sees:
- First element: 1
- Last element: 100
- Returns keepAll = true -> No pruning happens.

This causes a lot of revision-status-* ConfigMaps (and their owned ConfigMaps) to accumulate until a later failed revision eventually removes the first revision from the set.

Solution
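Check that the set has exactly LatestAvailableRevision elements before triggering the keepAll optimization. A set of distinct revisions that starts at 1 and ends at LatestAvailableRevision can only have that many elements if it has no gaps, so this check ensures the set is in fact contiguous.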
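
For illustration only, the difference between the two checks can be sketched as follows. This is not the code from the PR: the function names keepAllBuggy and keepAllFixed are hypothetical, and the sketch assumes the protected set is passed in as a sorted slice of distinct revisions.

```go
package main

import "fmt"

// keepAllBuggy is a hypothetical stand-in for the old check: it only
// inspects the first and last elements of the sorted protected set.
func keepAllBuggy(protected []int32, latest int32) bool {
	if len(protected) == 0 {
		return false
	}
	return protected[0] == 1 && protected[len(protected)-1] == latest
}

// keepAllFixed is a hypothetical stand-in for the corrected check: it also
// requires exactly `latest` elements. Assuming the slice is sorted and holds
// distinct revisions, that is only possible when there are no gaps.
func keepAllFixed(protected []int32, latest int32) bool {
	return keepAllBuggy(protected, latest) && len(protected) == int(latest)
}

func main() {
	// Protected set from the scenario above: 1-5 and 96-100.
	protected := []int32{1, 2, 3, 4, 5, 96, 97, 98, 99, 100}

	fmt.Println(keepAllBuggy(protected, 100)) // true: pruning is skipped
	fmt.Println(keepAllFixed(protected, 100)) // false: pruning proceeds
}
```

With the element-count check in place, the gap between revisions 5 and 96 is detected and the keepAll shortcut is no longer taken.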
Testing
Added test case: "prunes non-contiguous set (keeps 1-10 and 96-100, prunes 11-95)" that verifies:
- Two nodes with LastFailedRevision: 5 and LastFailedRevision: 10
- CurrentRevision: 100 on both nodes
- LatestAvailableRevision: 100
- Protected set: {1,2,3,4,5,6,7,8,9,10,96,97,98,99,100} (15 revisions)
- Revisions 11 - 95 are pruned.
- keepAll optimization does not trigger.