OCPBUGS-63524: fix: e2e: CAPI creation issue/scale-down wait issue/machine nodeRef comparison issue #429
The MAPI-authoritative MachineSet migration test was using WaitForMachineSet after a scale-down operation. This function is designed for scale-up scenarios: it waits for new machines to reach the "Running" phase and verifies node readiness by connecting to the workload cluster. For scale-down operations, this is inappropriate because:
- No new machines are being provisioned that need to become running
- It requires workload cluster connectivity to verify node status
- The remaining machines were already running before the scale-down

The test was failing with "not all Machines are running: 0 of 1" after 30 minutes because the CAPI MachineSet controller couldn't connect to the workload cluster to verify node status, causing availableReplicas to be reported as 0. Replace WaitForMachineSet with verifyMachinesetReplicas for the scale-down test, consistent with the analogous test in machineset_migration_capi_authoritative_test.go. The verifyMachinesetReplicas function only verifies that the replica count matches the expected value, which is sufficient for scale-down validation.
The CAPIMachineStatusEqual function was missing NodeRef in its comparison of CAPI machine status fields. This meant that when a MAPI machine received a node assignment (status.nodeRef), the sync controller didn't detect it as a change and didn't sync it to the CAPI machine mirror. This caused the CAPI machine to have an empty NodeRef, which led to:
- CAPI MachineSet controller unable to verify node status
- MachineSet reporting availableReplicas: 0 for running machines
- Incorrect machine readiness calculations

The conversion function already correctly included NodeRef in the converted status, but without the comparison, status updates were not triggered when only NodeRef changed. Add NodeRef to the list of compared fields in CAPIMachineStatusEqual so that changes to NodeRef are properly detected and synced from MAPI to CAPI machine mirrors.
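The following Go sketch illustrates why an equality helper that omits a field hides changes to that field from the sync loop. The types are simplified, hypothetical stand-ins rather than the real CAPI machine status or corev1.ObjectReference, and the real CAPIMachineStatusEqual may compare fields by a different mechanism; only the idea of adding NodeRef to the compared fields comes from the description above.

```go
package main

import "fmt"

// objectReference is a hypothetical, simplified stand-in for corev1.ObjectReference.
type objectReference struct {
	Kind string
	Name string
	UID  string
}

// machineStatus is a hypothetical, trimmed-down CAPI machine status.
type machineStatus struct {
	Phase   string
	NodeRef *objectReference
}

// statusEqualWithoutNodeRef mimics the buggy comparison: NodeRef is ignored,
// so a newly assigned node never registers as a change.
func statusEqualWithoutNodeRef(a, b machineStatus) bool {
	return a.Phase == b.Phase
}

// nodeRefEqual compares two optional references by value; two nils are equal,
// and a nil versus a non-nil reference is a difference.
func nodeRefEqual(a, b *objectReference) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// statusEqual is the fixed comparison: NodeRef participates in equality.
func statusEqual(a, b machineStatus) bool {
	return a.Phase == b.Phase && nodeRefEqual(a.NodeRef, b.NodeRef)
}

func main() {
	// Current CAPI mirror: NodeRef not synced yet.
	mirror := machineStatus{Phase: "Running"}
	// Status converted from the MAPI machine, which now has a node.
	desired := machineStatus{
		Phase:   "Running",
		NodeRef: &objectReference{Kind: "Node", Name: "worker-a"},
	}
	fmt.Println("buggy comparison reports equal:", statusEqualWithoutNodeRef(mirror, desired))
	fmt.Println("fixed comparison reports equal:", statusEqual(mirror, desired))
}
```

With the buggy helper the sync loop would see "no change" and skip the status update; with NodeRef included, the difference is detected and the update is written.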
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@damdo: This pull request references Jira Issue OCPBUGS-63524, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough: Changes include filtering machine reference selection to worker nodes via a label selector, expanding the status equality comparison to include the NodeRef field, removing a test synchronization wait step, and updating logging terminology to clarify the CAPI context.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ❌ Failed checks (1 warning); ✅ Passed checks (2 passed)
📜 Recent review details: Configuration used: Path: .coderabbit.yaml. Review profile: CHILL. Plan: Pro. Cache: disabled due to data retention organization setting. Knowledge base: disabled due to data retention organization setting. Files selected for processing: 4.
💤 Files with no reviewable changes: 1.
🧰 Additional context used: 🧬 Code graph analysis (1): e2e/machine_migration_helpers.go (1).
🔇 Additional comments: 3.
|
|
/test e2e-aws-capi-techpreview |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
/pipeline required |
|
Scheduling tests matching the |
chrischdi left a comment
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: chrischdi. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@damdo: This pull request references Jira Issue OCPBUGS-63524, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
|
/override ci/prow/e2e-aws-ovn Our component doesn't run in Default env. |
|
/override ci/prow/regression-clusterinfra-aws-ipi-techpreview-capi All tests passed except the known failing case, which is being fixed in https://github.com/openshift/openshift-tests-private/pull/28396 |
|
/label acknowledge-critical-fixes-only Fixes e2e flakes in e2e-aws-capi-techpreview |
|
/hold |
|
@damdo: This PR has been marked as verified by DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@damdo: Overrode contexts on behalf of damdo: ci/prow/regression-clusterinfra-aws-ipi-techpreview-capi. In response to this:
|
/override ci/prow/e2e-openstack-ovn-techpreview Overriding OpenStack as there are CI issues with it (UDP errors), and this change does not affect that job. |
|
/unhold |
|
@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-openstack-ovn-techpreview. In response to this:
|
@damdo: all tests passed! Full PR test history. Your PR dashboard. |
|
@damdo: Jira Issue OCPBUGS-63524: Some pull requests linked via external trackers have merged:
The following pull request, linked via external tracker, has not merged:
All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with
Jira Issue OCPBUGS-63524 has not been moved to the MODIFIED state.
This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.
In response to this:
|
/cherry-pick release-4.21 |
|
@damdo: new pull request created: #433. In response to this:
e2e: createCAPIMachine should only list workers for cloning one
The CAPI machine creation function should only take worker machines as a cloning reference, so when getting a list of current CAPI machines it should exclude the control plane machines.
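A minimal Go sketch of that filtering step, assuming control-plane machines carry the upstream Cluster API label cluster.x-k8s.io/control-plane. The selector the e2e helper actually uses is not shown on this page, and machine, workerMachines, and pickCloneReference are hypothetical names.

```go
package main

import "fmt"

// controlPlaneLabel is the upstream Cluster API label for control-plane
// Machines. Assumption: the PR's actual selector may differ.
const controlPlaneLabel = "cluster.x-k8s.io/control-plane"

// machine is a hypothetical, minimal stand-in for a CAPI Machine.
type machine struct {
	Name   string
	Labels map[string]string
}

// workerMachines returns the machines that do not carry the control-plane
// label, so a clone template is never taken from a control-plane machine.
func workerMachines(all []machine) []machine {
	var workers []machine
	for _, m := range all {
		if _, isControlPlane := m.Labels[controlPlaneLabel]; !isControlPlane {
			workers = append(workers, m)
		}
	}
	return workers
}

// pickCloneReference returns the first worker machine, or an error if the
// list contains only control-plane machines.
func pickCloneReference(all []machine) (machine, error) {
	workers := workerMachines(all)
	if len(workers) == 0 {
		return machine{}, fmt.Errorf("no worker machines available to clone")
	}
	return workers[0], nil
}

func main() {
	machines := []machine{
		{Name: "master-0", Labels: map[string]string{controlPlaneLabel: ""}},
		{Name: "worker-a", Labels: map[string]string{}},
	}
	ref, err := pickCloneReference(machines)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("cloning from", ref.Name)
}
```

Filtering before picking a reference means the result no longer depends on list ordering, which is what made cloning from a control-plane machine possible in the first place.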
e2e: fix CAPI MachineSet scale-down test using wrong wait function
The MAPI-authoritative MachineSet migration test was using
WaitForMachineSet after a scale-down operation. This function is
designed for scale-up scenarios where it waits for new machines to
reach "Running" phase and verifies node readiness by connecting to
the workload cluster.
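For scale-down operations, this is inappropriate because:
- No new machines are being provisioned that need to become running
- It requires workload cluster connectivity to verify node status
- The remaining machines were already running before the scale-down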
The test was failing with "not all Machines are running: 0 of 1"
after 30 minutes because the CAPI MachineSet controller couldn't
connect to the workload cluster to verify node status, causing
availableReplicas to be reported as 0.
Replace WaitForMachineSet with verifyMachinesetReplicas for the
scale-down test, consistent with the analogous test in
machineset_migration_capi_authoritative_test.go. The
verifyMachinesetReplicas function only verifies the replica count
matches the expected value, which is sufficient for scale-down
validation.
fix: sync: add NodeRef to CAPI machine status comparison
The CAPIMachineStatusEqual function was missing NodeRef in its
comparison of CAPI machine status fields. This meant that when a
MAPI machine received a node assignment (status.nodeRef), the sync
controller didn't detect it as a change and didn't sync it to the
CAPI machine mirror.
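This caused the CAPI machine to have an empty NodeRef, which led to:
- CAPI MachineSet controller unable to verify node status
- MachineSet reporting availableReplicas: 0 for running machines
- Incorrect machine readiness calculations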
The conversion function already correctly included NodeRef in the
converted status, but without the comparison, status updates were
not triggered when only NodeRef changed.
Add NodeRef to the list of compared fields in CAPIMachineStatusEqual
so that changes to NodeRef are properly detected and synced from
MAPI to CAPI machine mirrors.