Skip to content

Conversation

@damdo
Copy link
Member

@damdo damdo commented Dec 14, 2025

  • e2e: createCAPIMachine should only list workers for cloning one
    The CAPI machine creation function should only take worker machines as a cloning reference, so when getting a list of current CAPI machines it should exclude the control plane machines.

  • e2e: fix CAPI MachineSet scale-down test using wrong wait function
    The MAPI-authoritative MachineSet migration test was using
    WaitForMachineSet after a scale-down operation. This function is
    designed for scale-up scenarios where it waits for new machines to
    reach "Running" phase and verifies node readiness by connecting to
    the workload cluster.

    For scale-down operations, this is inappropriate because:

    • No new machines are being provisioned that need to become running
    • It requires workload cluster connectivity to verify node status
    • The remaining machines were already running before the scale-down

    The test was failing with "not all Machines are running: 0 of 1"
    after 30 minutes because the CAPI MachineSet controller couldn't
    connect to the workload cluster to verify node status, causing
    availableReplicas to be reported as 0.

    Replace WaitForMachineSet with verifyMachinesetReplicas for the
    scale-down test, consistent with the analogous test in
    machineset_migration_capi_authoritative_test.go. The
    verifyMachinesetReplicas function only verifies the replica count
    matches the expected value, which is sufficient for scale-down
    validation.

  • fix: sync: add NodeRef to CAPI machine status comparison

    The CAPIMachineStatusEqual function was missing NodeRef in its
    comparison of CAPI machine status fields. This meant that when a
    MAPI machine received a node assignment (status.nodeRef), the sync
    controller didn't detect it as a change and didn't sync it to the
    CAPI machine mirror.

    This caused the CAPI machine to have an empty NodeRef, which led to:

    • CAPI MachineSet controller unable to verify node status
    • MachineSet reporting availableReplicas: 0 for running machines
    • Incorrect machine readiness calculations

    The conversion function already correctly included NodeRef in the
    converted status, but without the comparison, status updates were
    not triggered when only NodeRef changed.

    Add NodeRef to the list of compared fields in CAPIMachineStatusEqual
    so that changes to NodeRef are properly detected and synced from
    MAPI to CAPI machine mirrors.

Summary by CodeRabbit

  • Bug Fixes

    • Improved machine selection filtering to correctly identify worker machines in CAPI operations.
    • Enhanced machine status comparison to include additional reference attributes for comprehensive status validation.
  • Tests

    • Optimized CAPI MachineSet migration test execution flow for better efficiency.

✏️ Tip: You can customize this high-level summary in your review settings.

The MAPI-authoritative MachineSet migration test was using
WaitForMachineSet after a scale-down operation. This function is
designed for scale-up scenarios where it waits for new machines to
reach "Running" phase and verifies node readiness by connecting to
the workload cluster.

For scale-down operations, this is inappropriate because:
- No new machines are being provisioned that need to become running
- It requires workload cluster connectivity to verify node status
- The remaining machines were already running before the scale-down

The test was failing with "not all Machines are running: 0 of 1"
after 30 minutes because the CAPI MachineSet controller couldn't
connect to the workload cluster to verify node status, causing
availableReplicas to be reported as 0.

Replace WaitForMachineSet with verifyMachinesetReplicas for the
scale-down test, consistent with the analogous test in
machineset_migration_capi_authoritative_test.go. The
verifyMachinesetReplicas function only verifies the replica count
matches the expected value, which is sufficient for scale-down
validation.
The CAPIMachineStatusEqual function was missing NodeRef in its
comparison of CAPI machine status fields. This meant that when a
MAPI machine received a node assignment (status.nodeRef), the sync
controller didn't detect it as a change and didn't sync it to the
CAPI machine mirror.

This caused the CAPI machine to have an empty NodeRef, which led to:
- CAPI MachineSet controller unable to verify node status
- MachineSet reporting availableReplicas: 0 for running machines
- Incorrect machine readiness calculations

The conversion function already correctly included NodeRef in the
converted status, but without the comparison, status updates were
not triggered when only NodeRef changed.

Add NodeRef to the list of compared fields in CAPIMachineStatusEqual
so that changes to NodeRef are properly detected and synced from
MAPI to CAPI machine mirrors.
@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 14, 2025
@openshift-ci-robot
Copy link

@damdo: This pull request references Jira Issue OCPBUGS-63524, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.21.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

  • e2e: createCAPIMachine should only list workers for cloning one
    The CAPI machine creation function should only take worker machines as a cloning reference, so when getting a list of current CAPI machines it should exclude the control plane machines.

  • e2e: fix CAPI MachineSet scale-down test using wrong wait function
    The MAPI-authoritative MachineSet migration test was using
    WaitForMachineSet after a scale-down operation. This function is
    designed for scale-up scenarios where it waits for new machines to
    reach "Running" phase and verifies node readiness by connecting to
    the workload cluster.

    For scale-down operations, this is inappropriate because:

    • No new machines are being provisioned that need to become running
    • It requires workload cluster connectivity to verify node status
    • The remaining machines were already running before the scale-down

    The test was failing with "not all Machines are running: 0 of 1"
    after 30 minutes because the CAPI MachineSet controller couldn't
    connect to the workload cluster to verify node status, causing
    availableReplicas to be reported as 0.

    Replace WaitForMachineSet with verifyMachinesetReplicas for the
    scale-down test, consistent with the analogous test in
    machineset_migration_capi_authoritative_test.go. The
    verifyMachinesetReplicas function only verifies the replica count
    matches the expected value, which is sufficient for scale-down
    validation.

  • fix: sync: add NodeRef to CAPI machine status comparison

    The CAPIMachineStatusEqual function was missing NodeRef in its
    comparison of CAPI machine status fields. This meant that when a
    MAPI machine received a node assignment (status.nodeRef), the sync
    controller didn't detect it as a change and didn't sync it to the
    CAPI machine mirror.

    This caused the CAPI machine to have an empty NodeRef, which led to:

    • CAPI MachineSet controller unable to verify node status
    • MachineSet reporting availableReplicas: 0 for running machines
    • Incorrect machine readiness calculations

    The conversion function already correctly included NodeRef in the
    converted status, but without the comparison, status updates were
    not triggered when only NodeRef changed.

    Add NodeRef to the list of compared fields in CAPIMachineStatusEqual
    so that changes to NodeRef are properly detected and synced from
    MAPI to CAPI machine mirrors.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Dec 14, 2025

Walkthrough

Changes include filtering machine reference selection to worker nodes via label selector, expanding status equality comparison to include NodeRef field, removing a test synchronization wait step, and updating logging terminology to clarify CAPI context.

Changes

Cohort / File(s) Summary
E2E machine migration and filtering
e2e/machine_migration_helpers.go
Narrowed CAPI machine reference selection to worker machines by adding a label selector that excludes control plane machines in createCAPIMachine
Test synchronization updates
e2e/machineset_migration_mapi_authoritative_test.go
Removed WaitForMachineSet invocation after scaling CAPI MachineSet to 1 replica in test scenario
Framework logging updates
e2e/framework/machineset.go
Updated WaitForMachineSet log message to include "CAPI" prefix for clarity
Status equality comparison
pkg/util/sync.go
Extended CAPIMachineStatusEqual to compare NodeRef field using deep equality and record differences in diff map

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • e2e/machine_migration_helpers.go: Verify the worker label selector logic and confirm it correctly excludes control plane machines without unintended side effects
  • e2e/machineset_migration_mapi_authoritative_test.go: Confirm removal of wait call does not mask timing-related test flakiness or race conditions
  • pkg/util/sync.go: Verify NodeRef comparison logic integrates properly with existing status equality checks

Poem

🐰 The MachineSet now speaks with CAPI pride,
Worker nodes filter through with labeled guide,
NodeRef comparisons run deep and true,
Synchronization waits bid test adieu! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies three distinct bug fixes across the CAPI e2e test suite, directly reflecting the main changes in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8ee10a1 and 7410210.

📒 Files selected for processing (4)
  • e2e/framework/machineset.go (1 hunks)
  • e2e/machine_migration_helpers.go (1 hunks)
  • e2e/machineset_migration_mapi_authoritative_test.go (0 hunks)
  • pkg/util/sync.go (1 hunks)
💤 Files with no reviewable changes (1)
  • e2e/machineset_migration_mapi_authoritative_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
e2e/machine_migration_helpers.go (1)
e2e/framework/machine.go (1)
  • GetMachines (18-44)
🔇 Additional comments (3)
e2e/framework/machineset.go (1)

139-139: LGTM!

The log message clarification improves test output readability by explicitly mentioning "CAPI" context.

pkg/util/sync.go (1)

209-211: LGTM!

Adding NodeRef to the status comparison is essential for correct synchronization. The implementation follows the existing pattern and fixes the issue where NodeRef updates from MAPI machines were not detected, causing CAPI machines to remain without NodeRef and MachineSet controllers to report availableReplicas: 0.

e2e/machine_migration_helpers.go (1)

27-36: LGTM!

The worker label selector correctly filters machine selection to exclude control plane machines by using DoesNotExist operator on clusterv1beta1.MachineControlPlaneLabel. This ensures only worker machines are used as cloning references, which aligns with the PR objective. The constant is properly imported from the CAPI library and the implementation is correct.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from chrischdi and nrb December 14, 2025 20:52
@damdo
Copy link
Member Author

damdo commented Dec 14, 2025

/test e2e-aws-capi-techpreview

@damdo damdo added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2025
@openshift-ci-robot
Copy link

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/pipeline required

@openshift-ci-robot
Copy link

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

Copy link
Contributor

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 15, 2025
@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 15, 2025
@openshift-ci-robot
Copy link

@damdo: This pull request references Jira Issue OCPBUGS-63524, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from huali9 December 15, 2025 08:53
@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/override ci/prow/e2e-aws-ovn

Our component doesn't run in Default env.
Failure unrelated to these e2e changes. It is an HAProxy issue

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/override ci/prow/regression-clusterinfra-aws-ipi-techpreview-capi

All tests passed bare the known case issue being fixed in https://github.com/openshift/openshift-tests-private/pull/28396

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/label acknowledge-critical-fixes-only

Fixes e2e flakes in e2e-aws-capi-techpreview

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/hold

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Dec 15, 2025
@openshift-ci-robot
Copy link

@damdo: This PR has been marked as verified by https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/429/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-techpreview/2000486059946807296.

Details

In response to this:

/verified by https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/429/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-techpreview/2000486059946807296

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 15, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2025

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn

Details

In response to this:

/override ci/prow/e2e-aws-ovn

Our component doesn't run in Default env.
Failure unrelated to these e2e changes. It is an HAProxy issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2025

@damdo: Overrode contexts on behalf of damdo: ci/prow/regression-clusterinfra-aws-ipi-techpreview-capi

Details

In response to this:

/override ci/prow/regression-clusterinfra-aws-ipi-techpreview-capi

All tests passed bare the known case issue being fixed in https://github.com/openshift/openshift-tests-private/pull/28396

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/override ci/prow/e2e-openstack-ovn-techpreview

Overriding openstack as the are CI issues with it (UDP errors) and this change does not affect that job.

Also https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/429/pull-ci-openshift-cluster-capi-operator-main-e2e-openstack-capi-techpreview/2000486084277964800 passed

@damdo
Copy link
Member Author

damdo commented Dec 15, 2025

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2025

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-openstack-ovn-techpreview

Details

In response to this:

/override ci/prow/e2e-openstack-ovn-techpreview

Overriding openstack as the are CI issues with it (UDP errors) and this change does not affect that job.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2025

@damdo: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit bd0557a into openshift:main Dec 15, 2025
25 checks passed
@openshift-ci-robot
Copy link

@damdo: Jira Issue OCPBUGS-63524: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-63524 has not been moved to the MODIFIED state.

This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.

Details

In response to this:

  • e2e: createCAPIMachine should only list workers for cloning one
    The CAPI machine creation function should only take worker machines as a cloning reference, so when getting a list of current CAPI machines it should exclude the control plane machines.

  • e2e: fix CAPI MachineSet scale-down test using wrong wait function
    The MAPI-authoritative MachineSet migration test was using
    WaitForMachineSet after a scale-down operation. This function is
    designed for scale-up scenarios where it waits for new machines to
    reach "Running" phase and verifies node readiness by connecting to
    the workload cluster.

    For scale-down operations, this is inappropriate because:

    • No new machines are being provisioned that need to become running
    • It requires workload cluster connectivity to verify node status
    • The remaining machines were already running before the scale-down

    The test was failing with "not all Machines are running: 0 of 1"
    after 30 minutes because the CAPI MachineSet controller couldn't
    connect to the workload cluster to verify node status, causing
    availableReplicas to be reported as 0.

    Replace WaitForMachineSet with verifyMachinesetReplicas for the
    scale-down test, consistent with the analogous test in
    machineset_migration_capi_authoritative_test.go. The
    verifyMachinesetReplicas function only verifies the replica count
    matches the expected value, which is sufficient for scale-down
    validation.

  • fix: sync: add NodeRef to CAPI machine status comparison

    The CAPIMachineStatusEqual function was missing NodeRef in its
    comparison of CAPI machine status fields. This meant that when a
    MAPI machine received a node assignment (status.nodeRef), the sync
    controller didn't detect it as a change and didn't sync it to the
    CAPI machine mirror.

    This caused the CAPI machine to have an empty NodeRef, which led to:

    • CAPI MachineSet controller unable to verify node status
    • MachineSet reporting availableReplicas: 0 for running machines
    • Incorrect machine readiness calculations

    The conversion function already correctly included NodeRef in the
    converted status, but without the comparison, status updates were
    not triggered when only NodeRef changed.

    Add NodeRef to the list of compared fields in CAPIMachineStatusEqual
    so that changes to NodeRef are properly detected and synced from
    MAPI to CAPI machine mirrors.

Summary by CodeRabbit

  • Bug Fixes

  • Improved machine selection filtering to correctly identify worker machines in CAPI operations.

  • Enhanced machine status comparison to include additional reference attributes for comprehensive status validation.

  • Tests

  • Optimized CAPI MachineSet migration test execution flow for better efficiency.

✏️ Tip: You can customize this high-level summary in your review settings.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@damdo
Copy link
Member Author

damdo commented Dec 16, 2025

/cherry-pick release-4.21

@openshift-cherrypick-robot

@damdo: new pull request created: #433

Details

In response to this:

/cherry-pick release-4.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants