Member

@sdodson sdodson commented Nov 21, 2025

Summary

  • Add availableInertia field and WithAvailableInertia() method to StatusSyncer to support inertia for Available conditions
  • Add WithStatusControllerAPIServicesAvailableInertia() helper that sets 5-second inertia for APIServicesAvailable conditions
  • Apply availableInertia when setting OperatorAvailable condition in StatusSyncer.Sync()

This prevents brief transient errors (like temporary network issues or missing HTTP headers) from causing APIServicesAvailable conditions to flap between Available=True and Available=False. The 5-second inertia allows these transient issues to self-resolve before affecting the operator's Available status.
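
As a rough illustration of the inertia idea described above, the sketch below keeps reporting Available=True until a failure has persisted for at least the inertia window. The `condition` type, `reportAvailable` function, and `apiServicesAvailableInertia` constant are hypothetical names for this example only; they are not the library's API, and the real StatusSyncer logic may differ.

```go
package main

import (
	"fmt"
	"time"
)

// apiServicesAvailableInertia mirrors the 5-second window named in this PR.
const apiServicesAvailableInertia = 5 * time.Second

// condition is a hypothetical, simplified stand-in for an operator condition.
type condition struct {
	Type               string
	Healthy            bool
	LastTransitionTime time.Time // when Healthy last changed
}

// reportAvailable returns true while the condition is healthy, or while it
// has been unhealthy for less than the inertia window.
func reportAvailable(c condition, now time.Time, inertia time.Duration) bool {
	if c.Healthy {
		return true
	}
	return now.Sub(c.LastTransitionTime) < inertia
}

func main() {
	start := time.Date(2025, 11, 21, 0, 0, 0, 0, time.UTC)
	failing := condition{Type: "APIServicesAvailable", Healthy: false, LastTransitionTime: start}

	// A 1-second blip is tolerated; a 10-second outage is surfaced.
	fmt.Println(reportAvailable(failing, start.Add(1*time.Second), apiServicesAvailableInertia))
	fmt.Println(reportAvailable(failing, start.Add(10*time.Second), apiServicesAvailableInertia))
}
```

Under these assumptions the program prints `true` and then `false`, which corresponds to the two cases in the test plan.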

Test plan

  • Verify that temporary APIServicesAvailable errors (< 5 seconds) do not cause Available=False
  • Verify that persistent APIServicesAvailable errors (> 5 seconds) correctly set Available=False
  • Run existing unit and integration tests

🤖 Generated with Claude Code

sdodson and others added 2 commits November 20, 2025 15:07
Add the ability to configure inertia for Available conditions in the
StatusSyncer, following the same pattern as degradedInertia.

Without this change, Available conditions flip to False immediately upon
any error, regardless of how brief the error is. This causes false
positives in CI and confuses admins during upgrades when transient errors
(like "malformed header: missing HTTP content-type") that last only 1
second trigger Available=False.

Changes:
- Add availableInertia field to StatusSyncer struct
- Add WithAvailableInertia() method to configure inertia
- Use availableInertia in Sync() when setting OperatorAvailable condition
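
The changes listed above follow a builder-style pattern, which might look roughly like the sketch below. This is a simplified model, not the actual diff: `statusSyncer`, `inertiaFunc`, `fixedInertia`, and `exampleInertia` are hypothetical stand-ins, and the zero-inertia default is an assumption based on the "flip to False immediately" behavior described above.

```go
package main

import (
	"fmt"
	"time"
)

// exampleInertia is an illustrative value only.
const exampleInertia = 5 * time.Second

// inertiaFunc is a hypothetical stand-in for the library's inertia type:
// given a condition type, it returns how long a failure must persist.
type inertiaFunc func(conditionType string) time.Duration

// fixedInertia returns an inertiaFunc that applies d to every condition type.
func fixedInertia(d time.Duration) inertiaFunc {
	return func(string) time.Duration { return d }
}

// statusSyncer is a simplified stand-in for StatusSyncer; only the two
// inertia fields discussed here are modeled.
type statusSyncer struct {
	degradedInertia  inertiaFunc
	availableInertia inertiaFunc
}

// newStatusSyncer defaults both fields to zero inertia (an assumption), so
// behavior is unchanged unless a caller opts in.
func newStatusSyncer() *statusSyncer {
	return &statusSyncer{
		degradedInertia:  fixedInertia(0),
		availableInertia: fixedInertia(0),
	}
}

// WithAvailableInertia sets the inertia used for Available conditions and
// returns the same syncer so calls can be chained.
func (s *statusSyncer) WithAvailableInertia(f inertiaFunc) *statusSyncer {
	s.availableInertia = f
	return s
}

func main() {
	s := newStatusSyncer().WithAvailableInertia(fixedInertia(exampleInertia))
	fmt.Println(s.availableInertia("APIServicesAvailable"))
}
```

Returning the receiver keeps the new option consistent with a chained-configuration style and leaves existing callers unaffected unless they opt in.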

🤖 Generated with Claude Code via /jira:solve OCPBUGS-23746

Co-Authored-By: Claude <[email protected]>

Add WithStatusControllerAPIServicesAvailableInertia helper function to
configure a 5-second inertia for APIServicesAvailable conditions. This
prevents brief transient errors (like missing HTTP content-type headers)
from causing Available=False in the ClusterOperator status.

The 5-second duration is chosen to:
- Tolerate brief network hiccups and transient errors (JIRA shows 1s errors)
- Still catch real issues quickly (much shorter than 2-minute degraded inertia)
- Reduce false positives in CI during upgrades

Usage example for operators using APIServices:
  statusControllerOptions = append(statusControllerOptions,
    apiservercontrollerset.WithStatusControllerAPIServicesAvailableInertia())
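
To illustrate how a per-condition inertia rule of this kind could be resolved, here is a minimal, self-contained sketch. The names `inertiaRule`, `inertiaFor`, and `apiServicesAvailableRule` are hypothetical and not the library's API, and the exact-match regular expression is an assumption; only the condition name and the 5-second duration come from this commit message.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// inertiaRule pairs a condition-type matcher with the inertia to apply.
type inertiaRule struct {
	matcher  *regexp.Regexp
	duration time.Duration
}

// apiServicesAvailableRule builds the 5-second rule described in this commit.
// The exact-match pattern is an assumption made for this sketch.
func apiServicesAvailableRule() inertiaRule {
	return inertiaRule{
		matcher:  regexp.MustCompile(`^APIServicesAvailable$`),
		duration: 5 * time.Second,
	}
}

// inertiaFor returns the duration of the first rule matching conditionType,
// falling back to defaultDuration when no rule matches.
func inertiaFor(conditionType string, defaultDuration time.Duration, rules ...inertiaRule) time.Duration {
	for _, r := range rules {
		if r.matcher.MatchString(conditionType) {
			return r.duration
		}
	}
	return defaultDuration
}

func main() {
	rule := apiServicesAvailableRule()
	fmt.Println(inertiaFor("APIServicesAvailable", 0, rule)) // matched by the rule
	fmt.Println(inertiaFor("SomeOtherAvailable", 0, rule))   // falls back to the default
}
```

With a default of zero, only the matched condition type gains a grace period; every other Available condition still flips immediately.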

🤖 Generated with Claude Code via /jira:solve OCPBUGS-23746

Co-Authored-By: Claude <[email protected]>
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 21, 2025
@openshift-ci-robot

@sdodson: This pull request references Jira Issue OCPBUGS-23746, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from jsafrane November 21, 2025 16:20
@openshift-ci
Contributor

openshift-ci bot commented Nov 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign p0lyn0mial for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested a review from tkashem November 21, 2025 16:20
@openshift-ci
Contributor

openshift-ci bot commented Nov 21, 2025

@sdodson: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Duration: 5 * time.Second, // tolerate brief transient errors
}).Inertia,
)
}
Member

I'd like to get @wking's opinion on this (as I recall, he was opposed to a similar change a few years ago, but I don't remember the details).

Member Author

Thanks. I'd shown him this PR, but mentioned that I was going to reproduce the issue and confirm that this fixes it before asking for more attention. Moving this back to draft for now.

@sdodson sdodson marked this pull request as draft November 24, 2025 15:05
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2025