OCPBUGS-62517: Fix DeploymentController to comply with OpenShift Available API contract #2058
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hi @joelanford , could you help approve it when you get a chance? Thanks! |
// Per API contract, remain Available=True during normal operations
availableCondition = availableCondition.
	WithStatus(opv1.ConditionTrue).
	WithMessage("Deployment is rolling out").
Maybe we want to set the message here "Waiting for Deployment" to be consistent with previous semantics and any tooling?
Force-pushed from 655055c to af0c5a8.
|
@jianzhangbjz: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: grokspawn, jianzhangbjz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @bertinatto @p0lyn0mial, could you help approve it? Thanks! |
jsafrane left a comment
I am afraid that this will need far more thoughts and work. And a lot of unit tests.
// Check if deployment is actively being updated (spec change being rolled out)
// This is the primary indicator of a "normal upgrade" in progress
if deployment.Generation != deployment.Status.ObservedGeneration {
	// Spec has changed, deployment controller is working on rolling it out
	// Per API contract, remain Available during this normal upgrade
	return true
}
This comment + return true is wrong. Consider initial Deployment creation during cluster installation - the status is not populated yet (ObservedGeneration is 0) and there is no replica yet. The overall status of such operator is definitely not Available = true.
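To illustrate the objection above, here is a minimal sketch of the quoted check run against two hand-built snapshots. The `deploymentSnapshot` type and the `specChangePending` helper are hypothetical stand-ins for the few fields the check reads, not types from the PR or from library-go. The values for the fresh install (Generation 1, ObservedGeneration 0, no replicas) are an assumption based on the scenario described in the comment.

```go
package main

import "fmt"

// deploymentSnapshot is a hypothetical stand-in for the fields of a
// Deployment that the check under review looks at.
type deploymentSnapshot struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	AvailableReplicas  int32 // status.availableReplicas
}

// specChangePending mirrors the condition from the quoted diff:
// it is true whenever the status lags behind the spec.
func specChangePending(d deploymentSnapshot) bool {
	return d.Generation != d.ObservedGeneration
}

func main() {
	// Assumed state right after creation during cluster installation:
	// status not populated yet, no replica running.
	freshInstall := deploymentSnapshot{Generation: 1, ObservedGeneration: 0, AvailableReplicas: 0}

	// Assumed state in the middle of an upgrade: the spec was just bumped,
	// old replicas are still serving.
	midUpgrade := deploymentSnapshot{Generation: 5, ObservedGeneration: 4, AvailableReplicas: 2}

	fmt.Println("fresh install:", specChangePending(freshInstall), "available replicas:", freshInstall.AvailableReplicas)
	fmt.Println("mid upgrade:  ", specChangePending(midUpgrade), "available replicas:", midUpgrade.AvailableReplicas)
}
```

The check is true in both cases, so on its own it cannot tell an installation with zero replicas from an upgrade where replicas are still serving.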
// If we're actively rolling out, remain Available per API contract
if isRollingOut {
	return true
}
Again, during installation the status should be Available = false.
// Special case: brand new deployment with no status conditions yet
// This happens during initial deployment before Kubernetes has had a chance to update status
if len(deployment.Status.Conditions) == 0 && deployment.Status.ObservedGeneration == 0 {
	// Deployment just created, is progressing normally
	return true
}
This is just wrong.
- The code is not reachable.
- The status should be Available = False during installation and Available = true during upgrade. This code has no clue about that.
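One signal that could separate the two cases is whether any replica is currently serving. The sketch below is an assumption for illustration only, not the approach proposed in this PR and not a statement of what library-go does. The `deploymentStatus` type and the `availableFromReplicas` helper are hypothetical; the "Waiting for Deployment" message is borrowed from the earlier review suggestion.

```go
package main

import "fmt"

// deploymentStatus is a hypothetical stand-in for the single status field
// this sketch needs.
type deploymentStatus struct {
	AvailableReplicas int32 // status.availableReplicas
}

// availableFromReplicas reports Available=true as long as at least one
// replica is serving, regardless of whether a rollout is in progress.
// It returns the condition value and a message for the false case.
func availableFromReplicas(s deploymentStatus) (bool, string) {
	if s.AvailableReplicas > 0 {
		return true, ""
	}
	return false, "Waiting for Deployment"
}

func main() {
	// Assumed installation state: nothing is running yet.
	install := deploymentStatus{AvailableReplicas: 0}
	// Assumed rolling-upgrade state: at least one replica keeps serving.
	upgrade := deploymentStatus{AvailableReplicas: 1}

	ok, msg := availableFromReplicas(install)
	fmt.Printf("install: available=%t message=%q\n", ok, msg)

	ok, msg = availableFromReplicas(upgrade)
	fmt.Printf("upgrade: available=%t message=%q\n", ok, msg)
}
```

Under these assumptions the installation case reports Available = False and the upgrade case reports Available = true. The sketch does not cover a rollout that temporarily drops to zero available replicas, which would still report False.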
|
BTW, many CSI driver operators use DeploymentController and we don't experience (With an exception of single node clusters - everyone gets |
Thanks for the info! I added a PDB in operator-framework/operator-controller#2362, so I'm closing this PR for now. |
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The DeploymentController was violating the OpenShift API contract which states: 'A component must not report Available=False during the course of a normal upgrade'. This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.
Assisted-by: Claude code