OCPBUGS-23746: Add availableInertia support to prevent transient APIServicesAvailable flapping #2057
base: master
Add the ability to configure inertia for Available conditions in the StatusSyncer, following the same pattern as degradedInertia.
Without this change, Available conditions flip to False immediately upon any error, regardless of how brief the error is. This causes false positives in CI and confuses admins during upgrades when transient errors (like "malformed header: missing HTTP content-type") that last only 1 second trigger Available=False.
Changes:
- Add availableInertia field to StatusSyncer struct
- Add WithAvailableInertia() method to configure inertia
- Use availableInertia in Sync() when setting OperatorAvailable condition
🤖 Generated with Claude Code via /jira:solve OCPBUGS-23746
Co-Authored-By: Claude <[email protected]>
Add WithStatusControllerAPIServicesAvailableInertia helper function to
configure a 5-second inertia for APIServicesAvailable conditions. This
prevents brief transient errors (like missing HTTP content-type headers)
from causing Available=False in the ClusterOperator status.
The 5-second duration is chosen to:
- Tolerate brief network hiccups and transient errors (JIRA shows 1s errors)
- Still catch real issues quickly (much shorter than 2-minute degraded inertia)
- Reduce false positives in CI during upgrades
Usage example for operators using APIServices:
statusControllerOptions = append(statusControllerOptions,
apiservercontrollerset.WithStatusControllerAPIServicesAvailableInertia())
🤖 Generated with Claude Code via /jira:solve OCPBUGS-23746
Co-Authored-By: Claude <[email protected]>
@sdodson: This pull request references Jira Issue OCPBUGS-23746, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sdodson The full list of commands accepted by this bot can be found here.
@sdodson: all tests passed!
		Duration: 5 * time.Second, // tolerate brief transient errors
	}).Inertia,
	)
}
I'd like to get @wking's opinion on this (I recall he opposed a similar change a few years ago, but I don't remember the details well).
Thanks, I'd shown him this PR but mentioned I was going to work on reproducing the issue and then confirming that this fixes it before I asked for more attention. Moving this back to draft for now.
Summary
- Add availableInertia field and WithAvailableInertia() method to StatusSyncer to support inertia for Available conditions
- Add WithStatusControllerAPIServicesAvailableInertia() helper that sets 5-second inertia for APIServicesAvailable conditions

This prevents brief transient errors (like temporary network issues or missing HTTP headers) from causing APIServicesAvailable conditions to flap between Available=True and Available=False. The 5-second inertia allows these transient issues to self-resolve before affecting the operator's Available status.
Test plan
🤖 Generated with Claude Code