MGMT-23191: Ensure MCO configs are present before rebooting the bootstrap node (#2039)

pastequo wants to merge 1 commit into `openshift:master`.
Conversation
Ensure MCO configs are present before rebooting the bootstrap node. Co-authored-by: glennswest
@pastequo: This pull request references MGMT-23191 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough

The changes implement a bootstrap-safe synchronization mechanism for MachineConfig annotation consistency. When bootstrapping, or with a single control-plane node, the installer waits for all master nodes to converge on consistent MachineConfig annotations (currentConfig, desiredConfig, and state) before proceeding with the reboot. Supporting changes expand the k8s_client interface to retrieve MachineConfig objects and update API version references.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Installer as Installer
    participant K8SClient as K8S Client
    participant API as Kubernetes API
    Installer->>Installer: During bootstrap/single CP node
    Installer->>K8SClient: waitForMCAnnotationsConsistent()
    loop Until consistent
        K8SClient->>API: ListNodesByRole("master")
        API-->>K8SClient: Master node list
        K8SClient->>API: GetMachineConfig(name) for each node
        API-->>K8SClient: MachineConfig objects
        K8SClient->>K8SClient: Validate annotations exist<br/>and reference valid MachineConfigs
        K8SClient->>K8SClient: Check all nodes converged<br/>to same desiredConfig
        Note over K8SClient: Continue polling if inconsistent
    end
    K8SClient-->>Installer: Success (consistent state)
    Installer->>Installer: Proceed with reboot
```
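The convergence loop shown in the sequence diagram can be sketched as a small pure function. This is an illustrative stdlib-only sketch: the annotation keys follow the MCO's `machineconfiguration.openshift.io` convention, but the function name and the node-as-annotation-map representation are assumptions, not the repository's actual code.

```go
package main

import "fmt"

// MCO node annotation keys (machineconfiguration.openshift.io convention).
const (
	currentConfigAnno = "machineconfiguration.openshift.io/currentConfig"
	desiredConfigAnno = "machineconfiguration.openshift.io/desiredConfig"
	stateAnno         = "machineconfiguration.openshift.io/state"
)

// annotationsConsistent reports whether every master node carries all three
// MCO annotations and all nodes agree on the same desiredConfig. Each node
// is modeled here as just its annotation map, for brevity.
func annotationsConsistent(nodes []map[string]string) bool {
	if len(nodes) == 0 {
		return false // nothing to converge on yet
	}
	desired := ""
	for _, annos := range nodes {
		for _, key := range []string{currentConfigAnno, desiredConfigAnno, stateAnno} {
			if annos[key] == "" {
				return false // annotation missing: MCO has not finished labeling
			}
		}
		if desired == "" {
			desired = annos[desiredConfigAnno]
		} else if annos[desiredConfigAnno] != desired {
			return false // masters have not converged on one rendered config
		}
	}
	return true
}

func main() {
	nodes := []map[string]string{
		{currentConfigAnno: "rendered-master-a", desiredConfigAnno: "rendered-master-a", stateAnno: "Done"},
		{currentConfigAnno: "rendered-master-a", desiredConfigAnno: "rendered-master-b", stateAnno: "Working"},
	}
	fmt.Println(annotationsConsistent(nodes)) // prints: false
}
```

In the real installer the inputs come from `ListNodesByRole("master")`, and the referenced configs are additionally resolved via `GetMachineConfig`, as the diagram shows.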
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pastequo. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
🧹 Nitpick comments (2)
src/installer/installer.go (1)

929-985: Bound the new MC-annotation wait to avoid indefinite install hangs.

Line 929 uses `waitForeverTimeout`; if masters never converge, installation can stall indefinitely before finalize/reboot.

♻️ Suggested timeout guard:

```diff
 const (
 	InstallDir     = "/opt/install-dir"
 	KubeconfigPath = "/opt/openshift/auth/kubeconfig"
@@
 	registryDataDirOnMedia = "/run/media/iso/registry"
+	waitForMCAnnotationsTimeout = 30 * time.Minute
 )
@@
 func (i *installer) waitForMCAnnotationsConsistent(ctx context.Context, kc k8s_client.K8SClient) error {
 	i.log.Info("Waiting for MachineConfig annotations to be consistent on all master nodes")
-	return utils.WaitForPredicateWithContext(ctx, waitForeverTimeout, generalWaitInterval, func() bool {
+	return utils.WaitForPredicateWithContext(ctx, waitForMCAnnotationsTimeout, generalWaitInterval, func() bool {
 		nodes, err := kc.ListNodesByRole("master")
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `src/installer/installer.go` around lines 929-985, the wait loop calling `WaitForPredicateWithContext` currently uses `waitForeverTimeout` and can hang indefinitely; change the timeout argument to a bounded duration (e.g., introduce a new constant like `mcAnnotationsWaitTimeout` or use an existing install timeout) and make it configurable if appropriate, then pass that bounded timeout into `WaitForPredicateWithContext` instead of `waitForeverTimeout`; update any related tests or callers and keep the predicate logic (which references `kc.ListNodesByRole` and annotations `mcCurrentConfigAnnotation`, `mcDesiredConfigAnnotation`, `mcStateAnnotation` and calls `kc.GetMachineConfig`) unchanged.

src/installer/installer_test.go (1)

146-167: Tighten MC lookup assertions to prevent false-positive test passes.

The helper currently expects `MinTimes(1)` for both `ListNodesByRole` and `GetMachineConfig`, which allows regressions to slip through. The actual implementation calls `ListNodesByRole("master")` once and `GetMachineConfig` twice (for `currentConfig` and `desiredConfig`). With one node in the test data, these should be pinned to exact call counts, and the config parameter should be verified.

Suggested stricter expectations:

```diff
-	mockk8sclient.EXPECT().ListNodesByRole("master").Return(masterNodesWithMCAnnotations, nil).MinTimes(1)
-	mockk8sclient.EXPECT().GetMachineConfig(gomock.Any(), gomock.Any()).Return(&mcfgv1.MachineConfig{}, nil).MinTimes(1)
+	mockk8sclient.EXPECT().ListNodesByRole("master").Return(masterNodesWithMCAnnotations, nil).Times(1)
+	mockk8sclient.EXPECT().GetMachineConfig(gomock.Any(), "rendered-master-12345").Return(&mcfgv1.MachineConfig{}, nil).Times(2)
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `src/installer/installer_test.go` around lines 146-167, in `waitForMCAnnotationsConsistentSuccess` tighten the mock expectations: replace `MinTimes(1)` on `mockk8sclient.ListNodesByRole("master")` with an exact single-call expectation (`Times(1)`), and change the `GetMachineConfig` expectation to expect exactly two calls (`Times(2)`); additionally assert the `GetMachineConfig` parameter equals the config name from the test node annotations (`"rendered-master-12345"`, using `gomock.Eq` or equivalent) so each of the `currentConfig` and `desiredConfig` lookups is verified.
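The exact-call-count point from the test nitpick can also be demonstrated without gomock, using a hand-rolled counting fake. The method names below mirror the mocked client from the review; the simplified signatures and the rest are illustrative assumptions.

```go
package main

import "fmt"

// fakeClient is a stand-in for the gomock-generated mock that counts calls,
// so a test can assert exact counts the way Times(1)/Times(2) would.
type fakeClient struct {
	listCalls int
	getCalls  map[string]int
}

func (f *fakeClient) ListNodesByRole(role string) []string {
	f.listCalls++
	return []string{"master-0"}
}

func (f *fakeClient) GetMachineConfig(name string) string {
	f.getCalls[name]++
	return name
}

func main() {
	f := &fakeClient{getCalls: map[string]int{}}
	// The code under review lists masters once, then resolves both the
	// currentConfig and desiredConfig annotations of the single test node.
	_ = f.ListNodesByRole("master")
	_ = f.GetMachineConfig("rendered-master-12345") // currentConfig lookup
	_ = f.GetMachineConfig("rendered-master-12345") // desiredConfig lookup
	fmt.Println(f.listCalls, f.getCalls["rendered-master-12345"]) // prints: 1 2
}
```

Pinning counts this way (or via `Times(n)` in gomock) makes a regression that adds or drops a lookup fail the test, which `MinTimes(1)` cannot do.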
ℹ️ Review info

- Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Cache: Disabled due to data retention organization setting
- Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (4)

- `vendor/modules.txt` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/controller-runtime/pkg/internal/log/log.go` is excluded by `!**/vendor/**`, `!vendor/**`
📒 Files selected for processing (4)

- `src/installer/installer.go`
- `src/installer/installer_test.go`
- `src/k8s_client/k8s_client.go`
- `src/k8s_client/mock_k8s_client.go`
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #2039      +/-   ##
==========================================
- Coverage   48.48%   48.34%   -0.15%
==========================================
  Files          20       20
  Lines        4333     4379      +46
==========================================
+ Hits         2101     2117      +16
- Misses       2011     2033      +22
- Partials      221      229       +8
```
/retest
/hold
@pastequo: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This work was made from https://github.com/glennswest/assisted-installer
Ensure MCO configs are present before rebooting the bootstrap node
cc @glennswest
Summary by CodeRabbit
New Features
Chores