@grandeit

Fix static pod pruning logic for non-contiguous set of revisions

Problem

The PruneController contains a logic bug in revisionsToKeep() that prevents pruning when the protected revision set is non-contiguous but spans from revision 1 to LatestAvailableRevision.

Scenario that triggers the bug:
A node has a very old LastFailedRevision: 5
The cluster is now at LatestAvailableRevision: 100
The limits are failedRevisionLimit: 5, succeededRevisionLimit: 5
Protected set becomes {1,2,3,4,5,96,97,98,99,100} (10 revisions)

The buggy logic sees:
First element: 1
Last element: 100
It returns keepAll = true, so no pruning happens.
As a result, revision-status-* ConfigMaps (and the ConfigMaps they own) accumulate until a later failed revision eventually drops revision 1 from the protected set.
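
The scenario above can be reproduced with a small standalone sketch. It is not the controller's code: `protectedRevisions` and `buggyKeepAll` are hypothetical names, and the sketch assumes each anchor revision protects the `limit` revisions ending at that anchor, which matches the protected set listed in the scenario.

```go
package main

import (
	"fmt"
	"sort"
)

// protectedRevisions is a hypothetical helper: for every anchor revision it
// keeps the `limit` revisions ending at that anchor, never going below 1.
// The result is sorted and free of duplicates.
func protectedRevisions(anchors []int, limit int) []int {
	seen := map[int]bool{}
	for _, a := range anchors {
		for r := a - limit + 1; r <= a; r++ {
			if r >= 1 {
				seen[r] = true
			}
		}
	}
	out := make([]int, 0, len(seen))
	for r := range seen {
		out = append(out, r)
	}
	sort.Ints(out)
	return out
}

// buggyKeepAll mirrors the check described above: it compares only the
// first and last elements and ignores any gaps in between.
func buggyKeepAll(sorted []int, latest int) bool {
	return len(sorted) > 0 && sorted[0] == 1 && sorted[len(sorted)-1] == latest
}

func main() {
	// LastFailedRevision 5 and LatestAvailableRevision 100, both limits 5.
	set := protectedRevisions([]int{5, 100}, 5)
	fmt.Println(set)                    // [1 2 3 4 5 96 97 98 99 100]
	fmt.Println(buggyKeepAll(set, 100)) // true, although 6-95 are unprotected
}
```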

Solution

Trigger the keepAll optimization only if the set also contains exactly LatestAvailableRevision elements. A set that starts at 1, ends at LatestAvailableRevision, and has that many elements has no gaps, so it is in fact contiguous.
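
A minimal sketch of the corrected condition, assuming the protected set is available as a sorted slice without duplicates. The function name `keepAll` and its signature are illustrative only and are not taken from the actual revisionsToKeep() implementation.

```go
package main

import "fmt"

// keepAll reports whether a sorted, duplicate-free protected set covers every
// revision from 1 to latest. With unique sorted values, starting at 1, ending
// at latest, and having exactly `latest` elements leaves no room for a gap.
func keepAll(sorted []int, latest int) bool {
	n := len(sorted)
	return n > 0 && n == latest && sorted[0] == 1 && sorted[n-1] == latest
}

func main() {
	gapped := []int{1, 2, 3, 4, 5, 96, 97, 98, 99, 100}
	fmt.Println(keepAll(gapped, 100))       // false: 10 elements, not 100
	fmt.Println(keepAll([]int{1, 2, 3}, 3)) // true: contiguous 1..3
}
```

The length comparison is cheap and avoids walking the set to look for gaps, but it relies on the set containing no duplicates.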

Testing

Added test case: "prunes non-contiguous set (keeps 1-10 and 96-100, prunes 11-95)". Setup:

Two nodes with LastFailedRevision: 5 and LastFailedRevision: 10
CurrentRevision: 100 on both nodes
LatestAvailableRevision: 100

The test verifies:

Protected set: {1,2,3,4,5,6,7,8,9,10,96,97,98,99,100} (15 revisions)
Revisions 11-95 are pruned.
The keepAll optimization does not trigger.
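
The expected outcome of this test case can be checked by hand with the sketch below. It is an illustration, not the added test: `nodeStatus`, `protect`, and `prunable` are hypothetical names, and both limits are assumed to be 5, as in the scenario from the problem description.

```go
package main

import "fmt"

// nodeStatus holds only the two per-node fields the test description mentions.
type nodeStatus struct {
	CurrentRevision    int
	LastFailedRevision int
}

// protect marks the `limit` revisions ending at anchor as kept (assumed rule).
func protect(keep map[int]bool, anchor, limit int) {
	for r := anchor; r > anchor-limit && r >= 1; r-- {
		keep[r] = true
	}
}

// prunable returns every revision in 1..latest that no node protects.
func prunable(nodes []nodeStatus, latest, failedLimit, succeededLimit int) []int {
	keep := map[int]bool{}
	for _, n := range nodes {
		protect(keep, n.CurrentRevision, succeededLimit)
		protect(keep, n.LastFailedRevision, failedLimit)
	}
	var out []int
	for r := 1; r <= latest; r++ {
		if !keep[r] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	nodes := []nodeStatus{
		{CurrentRevision: 100, LastFailedRevision: 5},
		{CurrentRevision: 100, LastFailedRevision: 10},
	}
	p := prunable(nodes, 100, 5, 5)
	fmt.Println(len(p), p[0], p[len(p)-1]) // 85 11 95
}
```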

@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on Nov 27, 2025
@openshift-ci-robot

@grandeit: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci bot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) on Nov 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

Hi @grandeit. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grandeit
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
