
openshift-cherrypick-robot

This is an automated cherry-pick of #5305

/assign pablintino

@openshift-ci-robot

@openshift-cherrypick-robot: Jira Issue OCPBUGS-62341 has been cloned as Jira Issue OCPBUGS-63126. Will retitle bug to link to clone.
/retitle [release-4.19] OCPBUGS-63126: Ensure the node passed to RunCordonOrUncordon comes from the latest updated state

In response to this:

This is an automated cherry-pick of #5305

/assign pablintino

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title from "[release-4.19] OCPBUGS-62341: Ensure the node passed to RunCordonOrUncordon comes from the latest updated state" to "[release-4.19] OCPBUGS-63126: Ensure the node passed to RunCordonOrUncordon comes from the latest updated state" Oct 15, 2025
@openshift-ci-robot openshift-ci-robot added the jira/severity-important (Referenced Jira bug's severity is important for the branch this PR is targeting.), jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.), and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels Oct 15, 2025
@openshift-ci-robot

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-63126, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-62341 to target a version in 4.20.0, but it targets "4.21.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@sergiordlr

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved label (Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.) Oct 15, 2025
@sergiordlr

Verified using IPI on AWS

  1. Create a webhook that rejects any attempt to change the .spec.unschedulable value of a node, so that all cordon/uncordon operations fail

This is an example of a webhook failing all cordon/uncordon operations: https://github.com/sergiordlr/temp-testfiles/tree/master/webhook_example

  2. Apply a MachineConfig so that the MCO cordons/uncordons the nodes to apply the config
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config-0
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        path: /etc/test-file-0.test
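
The source field above is a base64 data URL; dGVzdA== is the encoding of the string "test". As a small aside, a value of this shape can be produced as follows. The dataURL helper is a hypothetical name used only for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURL builds a data URL of the form used in the MachineConfig
// `source` field. The helper name is hypothetical.
func dataURL(content string) string {
	return "data:text/plain;charset=utf-8;base64," +
		base64.StdEncoding.EncodeToString([]byte(content))
}

func main() {
	fmt.Println(dataURL("test")) // data:text/plain;charset=utf-8;base64,dGVzdA==
}
```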

  3. Check that the MCO controller cannot cordon the node and starts retrying
I1016 16:20:17.839772       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: cordoning
I1016 16:20:17.839814       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I1016 16:20:17.862667       1 drain_controller.go:581] cordon failed with: cordon error: admission webhook "unschedulable-webhook.default.svc" denied the request: Changing .spec.unschedulable on node is forbidden., retrying
I1016 16:20:27.863719       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: false)
I1016 16:20:27.866998       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: RunCordonOrUncordon() succeeded but node is still not in cordon state, retrying
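
One way a retry can report success without cordoning anything is if the cordon helper is handed a node object that an earlier, rejected attempt already modified in memory. The toy model below illustrates that mechanism and why reading a fresh copy before each attempt avoids it. Everything in it (node, fakeAPI, cordon) is hypothetical and self-contained; it is not the MCO or kubectl drain code, and it is only one plausible reading of the log lines above.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a Kubernetes Node; only the field
// relevant to cordoning is modelled.
type node struct {
	Name          string
	Unschedulable bool
}

// fakeAPI is a hypothetical in-memory "API server" that can reject writes,
// the way the admission webhook does in step 1.
type fakeAPI struct {
	stored     node
	denyWrites bool
}

var errDenied = errors.New("changing .spec.unschedulable on node is forbidden")

// get returns a fresh copy of the stored node.
func (a *fakeAPI) get() node { return a.stored }

// cordon models a helper that does nothing when the object it was given
// already looks cordoned, and that flips the flag on that object before
// attempting the write.
func (a *fakeAPI) cordon(n *node) error {
	if n.Unschedulable {
		return nil // the caller's copy says there is nothing to do
	}
	n.Unschedulable = true
	if a.denyWrites {
		return errDenied
	}
	a.stored.Unschedulable = true
	return nil
}

func main() {
	api := &fakeAPI{stored: node{Name: "worker-0"}, denyWrites: true}

	// Reusing one copy across attempts: the first attempt is denied, the
	// second is a silent no-op, and the stored node is never cordoned.
	stale := api.get()
	fmt.Println("attempt 1:", api.cordon(&stale))
	fmt.Println("attempt 2:", api.cordon(&stale), "stored unschedulable:", api.get().Unschedulable)

	// Reading a fresh copy before the attempt: once writes are allowed
	// again, the cordon actually reaches the stored node.
	api.denyWrites = false
	fresh := api.get()
	fmt.Println("attempt 3:", api.cordon(&fresh), "stored unschedulable:", api.get().Unschedulable)
}
```

In this model the second attempt returns no error even though the stored node is still schedulable, which is the situation the "succeeded but node is still not in cordon state" message describes.
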
  4. Remove the MutatingWebhookConfiguration created in step 1 to allow cordon/uncordon operations to succeed again
  5. Check that the controller can now cordon the node and start applying the config
I1016 16:20:17.862667       1 drain_controller.go:581] cordon failed with: cordon error: admission webhook "unschedulable-webhook.default.svc" denied the request: Changing .spec.unschedulable on node is forbidden., retrying
I1016 16:20:27.863719       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: false)
I1016 16:20:27.866998       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: RunCordonOrUncordon() succeeded but node is still not in cordon state, retrying
I1016 16:20:47.868665       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I1016 16:20:47.892043       1 drain_controller.go:581] cordon failed with: cordon error: admission webhook "unschedulable-webhook.default.svc" denied the request: Changing .spec.unschedulable on node is forbidden., retrying
I1016 16:21:27.892227       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: false)
I1016 16:21:27.896462       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: RunCordonOrUncordon() succeeded but node is still not in cordon state, retrying
I1016 16:22:47.896662       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I1016 16:22:47.906316       1 node_controller.go:606] Pool worker[zone=us-east-2a]: node ip-10-0-22-244.us-east-2.compute.internal: Reporting unready: node ip-10-0-22-244.us-east-2.compute.internal is reporting Unschedulable
I1016 16:22:47.911764       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: cordon succeeded (currently schedulable: false)
I1016 16:22:47.927238       1 node_controller.go:606] Pool worker[zone=us-east-2a]: node ip-10-0-22-244.us-east-2.compute.internal: changed taints
I1016 16:22:47.935186       1 drain_controller.go:193] node ip-10-0-22-244.us-east-2.compute.internal: initiating drain
E1016 16:22:48.970970       1 drain_controller.go:163] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-8jj5j, openshift-cluster-node-tuning-operator/tuned-2zg9m, openshift-dns/dns-default-jxd5t, openshift-dns/node-resolver-fkpj7, openshift-image-registry/node-ca-8snhd, openshift-ingress-canary/ingress-canary-vq97k, openshift-insights/insights-runtime-extractor-qswmb, openshift-machine-config-operator/machine-config-daemon-rsbl9, openshift-monitoring/node-exporter-46wkm, openshift-multus/multus-4fkrb, openshift-multus/multus-additional-cni-plugins-8458v, openshift-multus/network-metrics-daemon-rjsv5, openshift-network-diagnostics/network-check-target-4bwk4, openshift-network-operator/iptables-alerter-p6bqr, openshift-ovn-kubernetes/ovnkube-node-sfhnx
  6. The configuration is properly applied on all nodes

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved label (Signifies that QE has signed off on this PR) Oct 16, 2025
@pablintino

/retest-required
/lgtm
/verified by @sergiordlr

@openshift-ci-robot openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) Oct 22, 2025
@openshift-ci-robot

@pablintino: This PR has been marked as verified by @sergiordlr.

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Oct 22, 2025
@openshift-ci

openshift-ci bot commented Oct 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, pablintino

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Oct 22, 2025
@pablintino

/jira refresh

@openshift-ci-robot

@pablintino: This pull request references Jira Issue OCPBUGS-63126, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-63127 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@openshift-ci

openshift-ci bot commented Oct 22, 2025

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name      Commit    Details   Required   Rerun command
ci/prow/unit   c90d027   link      true       /test unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
qe-approved: Signifies that QE has signed off on this PR
verified: Signifies that the PR passed pre-merge verification criteria
