Skip to content

Conversation

@jianzhangbjz
Copy link
Member

@jianzhangbjz jianzhangbjz commented Nov 24, 2025

The DeploymentController was violating the OpenShift API contract which states: 'A component must not report Available=False during the course of a normal upgrade'.
This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.
Assisted-by: Claude code

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Nov 24, 2025
@openshift-ci-robot
Copy link

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The DeploymentController was violating the OpenShift API contract which states: 'A component must not report Available=False during the course of a normal upgrade'.
This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

The DeploymentController was violating the OpenShift API contract which states: 'A component must not report Available=False during the course of a normal upgrade'.
This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.
Assisted-by: Claude code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jianzhangbjz
Copy link
Member Author

Hi @joelanford , could you help approve it when you get a chance? Thanks!

// Per API contract, remain Available=True during normal operations
availableCondition = availableCondition.
WithStatus(opv1.ConditionTrue).
WithMessage("Deployment is rolling out").

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to set the message here "Waiting for Deployment" to be consistent with previous semantics and any tooling?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 25, 2025

@jianzhangbjz: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@grokspawn
Copy link

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grokspawn, jianzhangbjz
Once this PR has been reviewed and has the lgtm label, please assign p0lyn0mial for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jianzhangbjz
Copy link
Member Author

Hi @bertinatto @p0lyn0mial, could you help approve it? Thanks!

Copy link
Contributor

@jsafrane jsafrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid that this will need far more thoughts and work. And a lot of unit tests.

Comment on lines +429 to +435
// Check if deployment is actively being updated (spec change being rolled out)
// This is the primary indicator of a "normal upgrade" in progress
if deployment.Generation != deployment.Status.ObservedGeneration {
// Spec has changed, deployment controller is working on rolling it out
// Per API contract, remain Available during this normal upgrade
return true
}
Copy link
Contributor

@jsafrane jsafrane Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment + return true is wrong. Consider initial Deployment creation during cluster installation - the status is not populated yet (ObservedGeneration is 0) and there is no replica yet. The overall status of such operator is definitely not Available = true.

Comment on lines +458 to +461
// If we're actively rolling out, remain Available per API contract
if isRollingOut {
return true
}
Copy link
Contributor

@jsafrane jsafrane Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, during installation the status should be Available = false.

Comment on lines +486 to +491
// Special case: brand new deployment with no status conditions yet
// This happens during initial deployment before Kubernetes has had a chance to update status
if len(deployment.Status.Conditions) == 0 && deployment.Status.ObservedGeneration == 0 {
// Deployment just created, is progressing normally
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just wrong.

  1. The code is not reachable
  2. The status should be Available = False during installation and Available = true during upgrade. This code has no clue about that.

@jsafrane
Copy link
Contributor

BTW, many CSI driver operators use DeploymentController and we don't experience Available=False during upgrade. Do you have correct PDBs set? Node drain should leave always at least one Deployment pod running, which keeps the controller Available.

(With an exception of single node clusters - everyone gets Available=false there during upgrade. So far nobody was concerned enough to fix it.)

@jianzhangbjz
Copy link
Member Author

Do you have correct PDBs set? Node drain should leave always at least one Deployment pod running, which keeps the controller Available.

Thanks for your info! Add PDB in operator-framework/operator-controller#2362, closed this PR first.

@openshift-ci-robot
Copy link

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

The DeploymentController was violating the OpenShift API contract which states: 'A component must not report Available=False during the course of a normal upgrade'.
This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.
Assisted-by: Claude code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants