Add ability to auto retry failed resources#695
Add ability to auto retry failed resources#695nagadeesh-nagaraja wants to merge 14 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds annotation-driven and automatic retry support across BIOS/BMC controllers: new Spec.FailedAutoRetryCount and Status.AutoRetryCountRemaining/ObservedGeneration fields, controller retry/propagation logic and conditions, deepcopy/CRD/schema updates, mock failed tasks, CLI flag, and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Controller as Reconciler (Controller)
participant K8s as Kubernetes API
participant Child as Child CR
participant Redfish as Redfish/BMC mock
Note over Controller,K8s: Failure detected or retry annotation applied
Controller->>K8s: Read Resource / Patch Status (init/decrement AutoRetryCountRemaining / set Retry condition / set Pending)
K8s-->>Controller: ACK
Controller->>Child: Propagate retry annotation or create/patch child
Child->>K8s: Update annotations/status
K8s-->>Child: ACK
Controller->>Redfish: Perform upgrade/check (image URI may cause failure)
Redfish-->>Controller: Task status (failed or success)
alt failed and retries remaining
Controller->>K8s: Patch Status (AutoRetryCountRemaining--, set RetryOfFailedResourceConditionIssued)
K8s-->>Controller: ACK
Controller->>Controller: Requeue (transition to Pending)
else no retries left
Controller->>K8s: Mark Failed (no requeue)
K8s-->>Controller: ACK
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/bmcsettings_controller_test.go (1)
466-574:⚠️ Potential issue | 🔴 CriticalRemove duplicate test: two
Itblocks with identical name and body at lines 466–574 and 576–684.Both define
It("Should replace missing BMCSettings ref in server", ...)with identical implementations. Ginkgo requires unique descriptions for siblingItnodes — this duplicate will cause the test suite to panic. Remove one of these tests.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller_test.go` around lines 466 - 574, There are two identical Ginkgo tests both declared as It("Should replace missing BMCSettings ref in server", ...) which causes a panic due to duplicate sibling descriptions; remove one of the duplicate It blocks (the entire second copy of the test) or rename one to a unique description if you intend to keep both, ensuring you remove the duplicate test body that references bmcSettings/bmcSettings2, serverMaintenanceList, and the Expect/Delete flows (the duplicated It with the same name and body should be deleted or renamed to a unique string).
🧹 Nitpick comments (11)
bmc/mockup.go (1)
146-175:BMCUpgradeTaskFailedStatusis byte-for-byte identical toBIOSUpgradeTaskFailedStatus— consider extracting a shared helper.Both failure sequences are currently duplicated (7 identical entries across 28 lines each). This is harmless in test-only code but could silently diverge if one is updated without the other.
♻️ Optional: Extract a shared default failed-task sequence
+// defaultUpgradeFailedTaskStatus returns a canonical failed upgrade task +// sequence ending in ExceptionTaskState at 99 % completion. +func defaultUpgradeFailedTaskStatus() []schemas.Task { + return []schemas.Task{ + {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.PendingTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.StartingTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(10))}, + {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(20))}, + {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(98))}, + {TaskState: schemas.ExceptionTaskState, PercentComplete: gofish.ToRef(uint(99))}, + } +}Then in
InitializeDefaults:- r.BIOSUpgradeTaskFailedStatus = []schemas.Task{ - {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, - // ... 6 more identical entries ... - } + r.BIOSUpgradeTaskFailedStatus = defaultUpgradeFailedTaskStatus() - r.BMCUpgradeTaskFailedStatus = []schemas.Task{ - {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, - // ... 6 more identical entries ... - } + r.BMCUpgradeTaskFailedStatus = defaultUpgradeFailedTaskStatus()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bmc/mockup.go` around lines 146 - 175, Extract the duplicated failed-task sequence into a shared helper (e.g., func defaultFailedTaskSequence() []schemas.Task) and replace both BMCUpgradeTaskFailedStatus and BIOSUpgradeTaskFailedStatus initializations to call that helper; update InitializeDefaults to use the new helper for both variables so the identical 7-entry slice lives in one place and cannot drift.internal/controller/bmcsettings_controller_test.go (1)
720-728: 10µs polling interval may cause flakiness or excessive API calls.
WithPolling(10 * time.Microsecond)is extremely aggressive for a controller test that involves API server round-trips. Consider a more reasonable interval like100*time.Millisecondto reduce test flakiness and unnecessary load on the test API server.Proposed fix
Eventually(func(g Gomega) bool { g.Expect(Get(bmcSettings)()).To(Succeed()) return bmcSettings.Status.State == metalv1alpha1.BMCSettingsStateFailed && bmcSettings.Status.AutoRetryCountRemaining == nil - }).WithPolling((10 * time.Microsecond)).Should(BeTrue()) + }).WithPolling(100 * time.Millisecond).Should(BeTrue()) Eventually(func(g Gomega) bool { g.Expect(Get(bmcSettings)()).To(Succeed()) return bmcSettings.Status.State == metalv1alpha1.BMCSettingsStateFailed && bmcSettings.Status.AutoRetryCountRemaining != nil && *bmcSettings.Status.AutoRetryCountRemaining == int32(1) - }).WithPolling((10 * time.Microsecond)).Should(BeTrue()) + }).WithPolling(100 * time.Millisecond).Should(BeTrue())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller_test.go` around lines 720 - 728, The two Eventually checks in bmcsettings_controller_test.go use an overly aggressive WithPolling((10 * time.Microsecond)) which can cause flakiness and excess API traffic; update both WithPolling calls to a more reasonable interval such as 100*time.Millisecond (or a test-constant like test.PollInterval) so the checks on bmcSettings.Status (metalv1alpha1.BMCSettingsStateFailed and AutoRetryCountRemaining) poll at a stable, less-frequent cadence.bmc/common/helpers.go (1)
33-38: Fix typo in function nameCheckAttribues→CheckAttributes.The function is named
CheckAttribues(missing an "t") and is defined in bmc/common/helpers.go (line 23) with 3 call sites: bmc/redfish_kube.go (line 329), bmc/redfish_local.go (line 241), and bmc/oem/dell.go (line 467). This is a good time to fix it during refactoring. Remember to runmake lint-fixandmake testafter making the changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bmc/common/helpers.go` around lines 33 - 38, Rename the misspelled function CheckAttribues to CheckAttributes (update its declaration in bmc/common/helpers.go) and update every call site that references CheckAttribues to use CheckAttributes instead; ensure the signature and any associated types like InvalidBMCSettingsError remain unchanged, run make lint-fix and make test to verify formatting and correctness, and commit the rename as a single refactor so references are consistent across the codebase.api/v1alpha1/biossettings_types.go (1)
20-22: Add minimum validation to prevent negative retry counts across all types.
FailedAutoRetryCountis used in four type definitions (biossettings_types.go,biosversion_types.go,bmcsettings_types.go, andbmcversion_types.go) without validation markers. Adding+kubebuilder:validation:Minimum=0to each would provide API-level validation to prevent nonsensical negative configurations, consistent with the project's existing pattern of using validation markers (e.g., Priority fields).Proposed fix for biossettings_types.go
// FailedAutoRetryCount is the number of times the controller should automatically retry the BIOSSettings upgrade in case of failure before giving up. // +optional + // +kubebuilder:validation:Minimum=0 FailedAutoRetryCount *int32 `json:"failedAutoRetryCount,omitempty"`Apply the same fix to
biosversion_types.go,bmcsettings_types.go, andbmcversion_types.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/biossettings_types.go` around lines 20 - 22, Add an API validation marker to prevent negative retry counts by annotating the FailedAutoRetryCount field with +kubebuilder:validation:Minimum=0 in all four type definitions; locate the FailedAutoRetryCount *int32 `json:"failedAutoRetryCount,omitempty"` field in the BIOSSettings, BIOSVersion, BMCSettings, and BMCVersion type declarations and add the kubebuilder validation comment immediately above each field so the generated CRD enforces a minimum of 0.api/v1alpha1/biosversion_types.go (1)
47-50: Same optional: consider+kubebuilder:validation:Minimum=0onFailedAutoRetryCountSame suggestion as the other API types in this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/biosversion_types.go` around lines 47 - 50, Add a kubebuilder validation tag to the FailedAutoRetryCount field on the BIOSVersion type so the value cannot be negative: update the struct field annotation for FailedAutoRetryCount (the pointer *int32 json:"failedAutoRetryCount,omitempty") to include `+kubebuilder:validation:Minimum=0` alongside the existing `+optional` tag to enforce non-negative values at CR validation time.api/v1alpha1/bmcsettings_types.go (1)
22-24: Same optional: consider+kubebuilder:validation:Minimum=0onFailedAutoRetryCountConsistent with the note in
bmcversion_types.go,BIOSVersionTemplate, andBIOSSettingsTemplate— negative values are semantically invalid.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/bmcsettings_types.go` around lines 22 - 24, The FailedAutoRetryCount field currently allows negative values; add a kubebuilder validation tag to enforce non-negativity by annotating FailedAutoRetryCount (*int32) with +kubebuilder:validation:Minimum=0 (keep the existing json:"failedAutoRetryCount,omitempty" tag) so the BMCSettings CRD schema prevents negative numbers at validation time.api/v1alpha1/bmcversion_types.go (1)
37-39: Consider adding+kubebuilder:validation:Minimum=0toFailedAutoRetryCountA negative retry count is semantically invalid. The same applies to the analogous fields in
BIOSVersionTemplate,BMCSettingsTemplate, andBIOSSettingsTemplate.♻️ Proposed addition
+ // +kubebuilder:validation:Minimum=0 // FailedAutoRetryCount is the number of times the controller should automatically retry the BMCVersion upgrade in case of failure before giving up. // +optional FailedAutoRetryCount *int32 `json:"failedAutoRetryCount,omitempty"`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/bmcversion_types.go` around lines 37 - 39, Add a non-negative validation to the retry count fields by annotating the pointer int32 fields with +kubebuilder:validation:Minimum=0; specifically add this comment annotation immediately above FailedAutoRetryCount in BMCVersion (the FailedAutoRetryCount *int32 `json:"failedAutoRetryCount,omitempty"` field) and do the same for the corresponding fields in BIOSVersionTemplate, BMCSettingsTemplate, and BIOSSettingsTemplate so the kubebuilder-generated CRD will reject negative values while keeping the existing pointer type and json tag.internal/controller/bmcversion_controller_test.go (1)
385-439: Use millisecond polling to reduce test flakiness.
10µs polling can busy-spin and be scheduling-sensitive in envtest; a small ms interval is usually more stable.Suggested change
-}).WithPolling((10 * time.Microsecond)).Should(BeTrue()) +}).WithPolling((5 * time.Millisecond)).Should(BeTrue()) ... -}).WithPolling((10 * time.Microsecond)).Should(BeTrue()) +}).WithPolling((5 * time.Millisecond)).Should(BeTrue())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversion_controller_test.go` around lines 385 - 439, The test uses extremely small microsecond polling intervals (WithPolling((10 * time.Microsecond))) which can cause busy-spinning and flakiness; change those WithPolling calls in this test (references: retryCount, bmcVersion, the Eventually(...) blocks that call Get(bmcVersion)()) to use a millisecond poll interval (e.g., 10*time.Millisecond or 100*time.Millisecond) instead of 10µs so the Eventually assertions are more stable in envtest.internal/controller/biosversion_controller_test.go (1)
385-441: Use millisecond polling to reduce test flakiness.
5µs polling is extremely tight; a small ms interval is typically more stable in envtest.Suggested change
-}).WithPolling((5 * time.Microsecond)).Should(BeTrue()) +}).WithPolling((5 * time.Millisecond)).Should(BeTrue()) ... -}).WithPolling((5 * time.Microsecond)).Should(BeTrue()) +}).WithPolling((5 * time.Millisecond)).Should(BeTrue())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biosversion_controller_test.go` around lines 385 - 441, The test uses extremely tight polling intervals (WithPolling((5 * time.Microsecond))) in the "Should allow retry using annotation" Eventually checks which causes flakiness; change both occurrences of WithPolling((5 * time.Microsecond)) to a millisecond-scale interval (for example WithPolling(5 * time.Millisecond)) so the Eventually(func(...) ...) waits are more stable — the change should be applied to the Eventually(func(g Gomega) bool { ... }) calls referenced in this spec.internal/controller/bmcsettingsset_controller_test.go (1)
513-638: Use millisecond polling to reduce test flakiness.
1µs polling can be CPU-heavy and timing-sensitive in envtest.Suggested change
-Eventually(Object(bmcSettings01)).WithPolling(1 * time.Microsecond).MustPassRepeatedly(1).Should(SatisfyAll( +Eventually(Object(bmcSettings01)).WithPolling(5 * time.Millisecond).MustPassRepeatedly(1).Should(SatisfyAll(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettingsset_controller_test.go` around lines 513 - 638, The test uses an ultra-tight poll interval causing flakiness: replace the WithPolling(1 * time.Microsecond) call in the "Should successfully retry failed state child resources once" spec (the Eventually(Object(bmcSettings01)).WithPolling(...) invocation) with a millisecond-based interval (e.g., 1 * time.Millisecond or a slightly larger value like 10 * time.Millisecond) to reduce CPU pressure and timing sensitivity while keeping the rest of the assertion logic unchanged.internal/controller/bmcsettings_controller.go (1)
601-617: Failing BMCSettings when BMC is powered off could consume retry attempts for transient conditions.When
handleBMCPowerStateencounters a powered-off BMC, it immediately transitions toFailed. Combined withFailedAutoRetryCount, temporary power-offs during settings application will burn through retry attempts. Consider whether aRequeueAfterwith a condition update (waiting for power-on) would be more resilient than immediately failing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 601 - 617, The handler currently marks the BMCSettings as Failed when the BMC is Powered Off (in handleBMCPowerState), which consumes retry budget; change the logic so that when BMC.Status.PowerState is not PoweredOn you update the BMCPoweredOffCondition to a non-failed interim state (e.g., Pending/Unknown) via r.Conditions.Update and call updateBMCSettingsStatus with a non-failed state instead of metalv1alpha1.BMCSettingsStateFailed, and return a reconcile result that requeues after a short delay (use controllerutil or return ctrl.Result{RequeueAfter: <duration>}) so the controller retries later without consuming a failure retry count. Ensure you still record the BMCPoweredOffReason and an informative message when updating BMCPoweredOffCondition in handleBMCPowerState.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/crd/bases/metal.ironcore.dev_biossettingssets.yaml`:
- Around line 73-78: This CRD change should not be made directly in the
generated YAML; instead update the Go API type/marker that defines
failedAutoRetryCount (e.g., the BIOSSettingsSet or related API struct and its
kubebuilder markers where FailedAutoRetryCount is declared), then re-run
controller-gen via the project make target (run `make manifests`) to regenerate
config/crd/bases/*.yaml; do not manually edit the generated
metal.ironcore.dev_biossettingssets.yaml file.
In `@config/crd/bases/metal.ironcore.dev_bmcsettings.yaml`:
- Around line 70-75: The CRD field `failedAutoRetryCount` was edited directly in
the generated YAML; instead update the Go API type and markers and regenerate
manifests: edit the BMCSettings API (type BMCSettings or its Spec struct) to add
or adjust the field (FailedAutoRetryCount) and its kubebuilder validation
markers (e.g., +kubebuilder:validation:Minimum/Maximum or
+kubebuilder:validation:Format/int32 as needed), then run `make manifests` to
regenerate the CRD; do not modify the generated YAML under config/crd/bases
directly.
In `@internal/controller/biossettings_controller_test.go`:
- Around line 864-872: The test currently asserts a transient nil
AutoRetryCountRemaining immediately when State == BIOSSettingsStateFailed which
can be flaky; instead split into two waits: first wait for
biosSettings.Status.State == metalv1alpha1.BIOSSettingsStateFailed using
Eventually(Get(biosSettings)()) and then in a second Eventually wait for
biosSettings.Status.AutoRetryCountRemaining != nil &&
*biosSettings.Status.AutoRetryCountRemaining == int32(1); update the two
existing Eventually blocks around Get, biosSettings, BIOSSettingsStateFailed and
AutoRetryCountRemaining to implement this ordering so the nil phase is not
asserted.
In `@internal/controller/biossettings_controller.go`:
- Around line 1059-1088: When scheduling an auto-retry for BIOSSettings (inside
the block guarded by settings.Spec.FailedAutoRetryCount != nil), add a guard to
skip retry when *settings.Spec.FailedAutoRetryCount == 0 and ensure you never
set AutoRetryCountRemaining to a negative value; when you do schedule a retry,
reset settings.Status.FlowState to the zero value (clear stale FlowState) before
patching so the flow restarts (e.g., set settings.Status.FlowState = "" or the
type's zero) and include that change in the MergeFrom patch (biosSettingsBase).
Apply the same pattern to the corresponding handlers in
biosversion_controller.go, bmcsettings_controller.go and
bmcversion_controller.go, and keep the existing handling of
RetryOfFailedResourceConditionIssued and AutoRetryCountRemaining update but
ensure remaining is computed safely (do not decrement below 0).
In `@internal/controller/biossettingsset_controller_test.go`:
- Around line 487-491: The Consistently assertion intended to verify BIOSSetting
03 is wrong because it still references biosSettings02; update the target object
to biosSettings03 in that Consistently(Object(...)) call (and adjust the
descriptive By(...) text if desired) so the assertion checks
HaveField("Status.State", metalv1alpha1.BIOSSettingsStateFailed) and
HaveField("Status.AutoRetryCountRemaining", Equal(GetPtr(int32(0)))) against
biosSettings03 instead of biosSettings02.
In `@internal/controller/biossettingsset_controller.go`:
- Around line 294-300: The CreateOrPatch call is attempting to modify the status
subresource (settings.Status.AutoRetryCountRemaining) which is ignored for
resources with +kubebuilder:subresource:status; replace this CreateOrPatch usage
with a status patch/update against the status subresource (use r.Status().Patch
or r.Status().Update like the existing patchStatus() helper) so you fetch the
current settings, set settings.Status.AutoRetryCountRemaining =
settings.Spec.FailedAutoRetryCount and then call r.Status().Patch(ctx,
&settings, client.MergeFrom(orig)) (or follow the exact patchStatus() pattern)
to persist the status change.
In `@internal/controller/biosversion_controller.go`:
- Around line 435-493: The auto-retry logic in processFailedState allows a retry
when Spec.FailedAutoRetryCount is set to 0 (remaining==nil path) and can
decrement into negative; fix by explicitly checking FailedAutoRetryCount before
permitting a retry: in processFailedState (and the equivalent functions in
bmcversion_controller.go, bmcsettings_controller.go, biossettings_controller.go)
add a guard that if biosVersion.Spec.FailedAutoRetryCount != nil &&
*biosVersion.Spec.FailedAutoRetryCount == 0 then do not enter the auto-retry
block (return false,nil or skip retry), and when initializing
AutoRetryCountRemaining compute the initial value only when
Spec.FailedAutoRetryCount > 0 (set to Spec-1), otherwise leave
AutoRetryCountRemaining at 0 or nil to prevent negative values. Ensure updates
use the same GetCondition/Status patch flow already present.
In `@internal/controller/biosversionset_controller_test.go`:
- Around line 524-528: The second Consistently assertion is inspecting the wrong
test object: it should target biosVersion03 instead of biosVersion02; update the
Consistently(Object(...)) call in the test to pass the biosVersion03 variable so
the block labeled "BIOSVersion 03 has not been changed" validates the third
BIOSVersion child (keep the same SatisfyAll checks on Status.State and
Status.AutoRetryCountRemaining).
In `@internal/controller/bmcsettings_controller.go`:
- Around line 450-467: The error handling when GetCondition(r.Conditions,
bmcSetting.Status.Conditions, BMCSettingsConditionWrongSettings) fails is
wrapping the wrong variable (it uses the outer err from CheckBMCAttributes);
change the fmt.Errorf call to wrap and return the actual GetCondition error
variable (errCond) so the true failure is reported. Locate the block around
GetCondition and replace the fmt.Errorf("failed to get Condition for skip reboot
post setting update %v", err) to reference errCond instead, ensuring subsequent
logic (r.Conditions.Update and r.updateBMCSettingsStatus) remains unchanged.
- Around line 575-619: The error messages in handleBMCPowerState incorrectly
reference "BMCVersion" and the local variable BMCPoweredOffCondition shadows the
package-level constant; fix by updating the two fmt.Errorf messages passed on
Conditions.Update failure to say "failed to update Pending BMCSettings update
condition" (the calls inside handleBMCPowerState that wrap
conditionutils.UpdateStatus/Reason/Message), and rename the local
BMCPoweredOffCondition variable to bmcPoweredOffCond (or similar) in
handleBMCPowerState to avoid shadowing the package-level BMCPoweredOffCondition
constant while keeping all uses (GetCondition, Conditions.Update, and
updateBMCSettingsStatus) consistent with the new local name.
- Around line 709-764: The code in handleFailedState has three issues: the
GetCondition error message mentions "BIOSSettings" instead of "BMCSettings", a
copy/paste var name uses biosSettingsBase instead of bmcSettingsBase, and the
FailedAutoRetryCount logic allows Spec==0 to still perform one retry. Fix by:
update the fmt.Errorf call in the GetCondition error path to mention BMCSettings
(reference: GetCondition(...) error handling), rename the local variable
biosSettingsBase to bmcSettingsBase where the DeepCopy is taken before
status.Patch, and change the retry-count logic around
bmcSetting.Spec.FailedAutoRetryCount / bmcSetting.Status.AutoRetryCountRemaining
so that when Spec is non-nil and *Spec == 0 you do not enqueue an automatic
retry (initialize remaining only if *Spec > 0 and decrement safely otherwise
skip).
In `@internal/controller/bmcsettingsset_controller.go`:
- Around line 114-120: Update the incorrect "BIOSSettings*" log messages to
reference the correct resource types "BMCSettingsSet" or "BMCSettings" and
follow the project's logging style (start with a capital letter, use active
voice, past tense for error messages, and include object type). Specifically,
replace the log message in the log.V(1).Info call that currently reads "Updated
BIOSSettingsSet state" to something like "Updated BMCSettingsSet state" and
adjust any similar messages around the other occurrence (lines noted 361-375) to
reference BMCSettings/BMCSettingsSet; ensure related error logs follow the same
tense/format and locate these by the log.V(1).Info calls and the
handleRetryAnnotationPropagation context to make consistent updates.
- Around line 348-355: The current code uses controllerutil.CreateOrPatch to
modify bmcSettings.Status.AutoRetryCountRemaining (via
controllerutil.CreateOrPatch and bmcSettings), but status is a subresource so
updates must go through the status client; replace the CreateOrPatch status
modification with a status patch via r.Client.Status().Patch (or
Status().Update) targeting bmcSettings.Status.AutoRetryCountRemaining.
Concretely: take a deep copy of the original bmcSettings (e.g., orig :=
bmcSettings.DeepCopy()), modify bmcSettings.Status.AutoRetryCountRemaining =
bmcSettings.Spec.FailedAutoRetryCount, then call r.Client.Status().Patch(ctx,
&bmcSettings, client.MergeFrom(orig)) and handle errors (append to errs) instead
of using controllerutil.CreateOrPatch for the status change.
In `@internal/controller/bmcversion_controller.go`:
- Around line 203-213: The variable name biosVersionBase in the retry block is a
copy-paste mistake and should be renamed to bmcVersionBase to match the
surrounding BMCVersion context; in the block guarded by
shouldRetryReconciliation(bmcVersion) replace the identifier biosVersionBase
with bmcVersionBase where you call bmcVersion.DeepCopy() and pass the MergeFrom
option to r.Patch so the snapshot variable name consistently reflects the
resource being patched (bmcVersion).
- Around line 438-467: The retry guard allows FailedAutoRetryCount==0 to enter
the retry block, causing one unintended retry; update the logic that checks
FailedAutoRetryCount/AutoRetryCountRemaining (the guard using "remaining == nil
|| *remaining > 0" and the subsequent computation of AutoRetryCountRemaining) so
zero is treated as "no retries" — e.g., require Spec.FailedAutoRetryCount != nil
&& *Spec.FailedAutoRetryCount > 0 before proceeding or explicitly skip when
*Spec.FailedAutoRetryCount == 0; apply the same change to the identical blocks
handling FailedAutoRetryCount and AutoRetryCountRemaining in the BMCVersion,
BIOSVersion, BMCSettings and BIOSSettings controllers (symbols:
FailedAutoRetryCount, AutoRetryCountRemaining,
GetCondition/RetryOfFailedResourceConditionIssued, Status().Patch).
In `@internal/controller/bmcversionset_controller_test.go`:
- Around line 513-516: The test name says "once" but the variable retryCount :=
2 and the code path setting FailedAutoRetryCount uses that value, so rename the
local variable retryCount to a clearer name like autoRetryCount (or
failedAutoRetryCount) and update the It(...) description to accurately reflect
behavior (for example "Should successfully retry failed state child resources
once via annotation after auto-retries are exhausted" or adjust to match 2
automatic retries). Update any references to retryCount in this test (e.g. where
FailedAutoRetryCount is set and where OperationAnnotation is asserted) so names
and the test description consistently reflect whether the attempt is an
auto-retry count or a single annotation-driven retry.
- Around line 609-613: The assertion is checking the wrong variable and
comparing against the wrong named type: switch the observed object from
bmcVersion02 to bmcVersion03 (so the test actually verifies BMCVersion03), and
compare Status.State to the correct BMCVersionState constant (e.g.,
metalv1alpha1.BMCVersionStateFailed or an equivalent BMCVersionState("Failed"))
instead of metalv1alpha1.BIOSSettingsStateFailed so the equality check uses the
same named type; keep the AutoRetryCountRemaining assertion as-is but ensure the
object reference is bmcVersion03 and the state comparison uses BMCVersionState.
In `@internal/controller/bmcversionset_controller.go`:
- Around line 289-295: The current CreateOrPatch call on BMCVersion silently
drops Status.AutoRetryCountRemaining because status is a subresource; replace
this CreateOrPatch usage with the same pattern as updateStatus(): make a deep
copy of the original BMCVersion (orig := bmcVersion.DeepCopy()), mutate
bmcVersion.Status.AutoRetryCountRemaining =
bmcVersion.Spec.FailedAutoRetryCount, then call r.Status().Patch(ctx,
&bmcVersion, client.MergeFrom(orig)) and append any returned error to errs;
remove the CreateOrPatch invocation to ensure the status subresource is
correctly persisted.
---
Outside diff comments:
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 466-574: There are two identical Ginkgo tests both declared as
It("Should replace missing BMCSettings ref in server", ...) which causes a panic
due to duplicate sibling descriptions; remove one of the duplicate It blocks
(the entire second copy of the test) or rename one to a unique description if
you intend to keep both, ensuring you remove the duplicate test body that
references bmcSettings/bmcSettings2, serverMaintenanceList, and the
Expect/Delete flows (the duplicated It with the same name and body should be
deleted or renamed to a unique string).
---
Nitpick comments:
In `@api/v1alpha1/biossettings_types.go`:
- Around line 20-22: Add an API validation marker to prevent negative retry
counts by annotating the FailedAutoRetryCount field with
+kubebuilder:validation:Minimum=0 in all four type definitions; locate the
FailedAutoRetryCount *int32 `json:"failedAutoRetryCount,omitempty"` field in the
BIOSSettings, BIOSVersion, BMCSettings, and BMCVersion type declarations and add
the kubebuilder validation comment immediately above each field so the generated
CRD enforces a minimum of 0.
In `@api/v1alpha1/biosversion_types.go`:
- Around line 47-50: Add a kubebuilder validation tag to the
FailedAutoRetryCount field on the BIOSVersion type so the value cannot be
negative: update the struct field annotation for FailedAutoRetryCount (the
pointer *int32 json:"failedAutoRetryCount,omitempty") to include
`+kubebuilder:validation:Minimum=0` alongside the existing `+optional` tag to
enforce non-negative values at CR validation time.
In `@api/v1alpha1/bmcsettings_types.go`:
- Around line 22-24: The FailedAutoRetryCount field currently allows negative
values; add a kubebuilder validation tag to enforce non-negativity by annotating
FailedAutoRetryCount (*int32) with +kubebuilder:validation:Minimum=0 (keep the
existing json:"failedAutoRetryCount,omitempty" tag) so the BMCSettings CRD
schema prevents negative numbers at validation time.
In `@api/v1alpha1/bmcversion_types.go`:
- Around line 37-39: Add a non-negative validation to the retry count fields by
annotating the pointer int32 fields with +kubebuilder:validation:Minimum=0;
specifically add this comment annotation immediately above FailedAutoRetryCount
in BMCVersion (the FailedAutoRetryCount *int32
`json:"failedAutoRetryCount,omitempty"` field) and do the same for the
corresponding fields in BIOSVersionTemplate, BMCSettingsTemplate, and
BIOSSettingsTemplate so the kubebuilder-generated CRD will reject negative
values while keeping the existing pointer type and json tag.
In `@bmc/common/helpers.go`:
- Around line 33-38: Rename the misspelled function CheckAttribues to
CheckAttributes (update its declaration in bmc/common/helpers.go) and update
every call site that references CheckAttribues to use CheckAttributes instead;
ensure the signature and any associated types like InvalidBMCSettingsError
remain unchanged, run make lint-fix and make test to verify formatting and
correctness, and commit the rename as a single refactor so references are
consistent across the codebase.
In `@bmc/mockup.go`:
- Around line 146-175: Extract the duplicated failed-task sequence into a shared
helper (e.g., func defaultFailedTaskSequence() []schemas.Task) and replace both
BMCUpgradeTaskFailedStatus and BIOSUpgradeTaskFailedStatus initializations to
call that helper; update InitializeDefaults to use the new helper for both
variables so the identical 7-entry slice lives in one place and cannot drift.
In `@internal/controller/biosversion_controller_test.go`:
- Around line 385-441: The test uses extremely tight polling intervals
(WithPolling((5 * time.Microsecond))) in the "Should allow retry using
annotation" Eventually checks which causes flakiness; change both occurrences of
WithPolling((5 * time.Microsecond)) to a millisecond-scale interval (for example
WithPolling(5 * time.Millisecond)) so the Eventually(func(...) ...) waits are
more stable — the change should be applied to the Eventually(func(g Gomega) bool
{ ... }) calls referenced in this spec.
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 720-728: The two Eventually checks in
bmcsettings_controller_test.go use an overly aggressive WithPolling((10 *
time.Microsecond)) which can cause flakiness and excess API traffic; update both
WithPolling calls to a more reasonable interval such as 100*time.Millisecond (or
a test-constant like test.PollInterval) so the checks on bmcSettings.Status
(metalv1alpha1.BMCSettingsStateFailed and AutoRetryCountRemaining) poll at a
stable, less-frequent cadence.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 601-617: The handler currently marks the BMCSettings as Failed
when the BMC is Powered Off (in handleBMCPowerState), which consumes retry
budget; change the logic so that when BMC.Status.PowerState is not PoweredOn you
update the BMCPoweredOffCondition to a non-failed interim state (e.g.,
Pending/Unknown) via r.Conditions.Update and call updateBMCSettingsStatus with a
non-failed state instead of metalv1alpha1.BMCSettingsStateFailed, and return a
reconcile result that requeues after a short delay (use controllerutil or return
ctrl.Result{RequeueAfter: <duration>}) so the controller retries later without
consuming a failure retry count. Ensure you still record the BMCPoweredOffReason
and an informative message when updating BMCPoweredOffCondition in
handleBMCPowerState.
In `@internal/controller/bmcsettingsset_controller_test.go`:
- Around line 513-638: The test uses an ultra-tight poll interval causing
flakiness: replace the WithPolling(1 * time.Microsecond) call in the "Should
successfully retry failed state child resources once" spec (the
Eventually(Object(bmcSettings01)).WithPolling(...) invocation) with a
millisecond-based interval (e.g., 1 * time.Millisecond or a slightly larger
value like 10 * time.Millisecond) to reduce CPU pressure and timing sensitivity
while keeping the rest of the assertion logic unchanged.
In `@internal/controller/bmcversion_controller_test.go`:
- Around line 385-439: The test uses extremely small microsecond polling
intervals (WithPolling((10 * time.Microsecond))) which can cause busy-spinning
and flakiness; change those WithPolling calls in this test (references:
retryCount, bmcVersion, the Eventually(...) blocks that call Get(bmcVersion)())
to use a millisecond poll interval (e.g., 10*time.Millisecond or
100*time.Millisecond) instead of 10µs so the Eventually assertions are more
stable in envtest.
|
@nagadeesh-nagaraja do you think it would be better to have the retry count globally configurable via flags first instead of addding the tot he spec? @coderabbitai what do you think about that? |
|
Tip For best results, initiate chat on the files or code changes.
Current approach (spec-level):
Global flag approach:
My recommendation: Hybrid approach 🎯 Consider implementing a global default via controller flag + optional per-resource override in the spec:
This gives you:
This pattern is common in Kubernetes (e.g., how Pod SecurityContext works with PodSecurityPolicies/Standards) and provides the best of both worlds. Given this is a new feature, it would also allow you to start with a simple global default and let real-world usage inform whether per-resource configuration is frequently needed. What do you think? Would this hybrid approach work for your use case? |
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (5)
config/crd/bases/metal.ironcore.dev_biossettingssets.yaml (1)
73-78: Same missingminimum: 0constraint as other CRDs — will auto-fix after the types.go marker correction.This is the downstream effect of the missing
+prefix in the kubebuilder marker in the corresponding*_types.gofile. No action needed on this generated file directly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/crd/bases/metal.ironcore.dev_biossettingssets.yaml` around lines 73 - 78, The CRD is missing the `minimum: 0` constraint because the kubebuilder marker in the corresponding types file lacks the required '+' prefix; locate the field FailedAutoRetryCount in the relevant *_types.go (e.g., the BIOSSettingsSetSpec struct) and change the comment marker to use the plus form (// +kubebuilder:validation:Minimum=0) for that field, then re-run the CRD generation (controller-gen / your project generate task) so the generated metal.ironcore.dev_biossettingssets.yaml includes minimum: 0.config/crd/bases/metal.ironcore.dev_bmcsettings.yaml (1)
70-75: Same missingminimum: 0constraint — downstream of the types.go marker bug.Will auto-fix after correcting the
+prefix inapi/v1alpha1/bmcsettings_types.goand re-runningmake manifests.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/crd/bases/metal.ironcore.dev_bmcsettings.yaml` around lines 70 - 75, The CRD is missing the explicit `minimum: 0` for the failedAutoRetryCount because the kubebuilder validation marker in the type definition is malformed; open api/v1alpha1/bmcsettings_types.go and fix the validation comment for the FailedAutoRetryCount field to use the correct kubebuilder marker (e.g. ensure the line reads `// +kubebuilder:validation:Minimum=0` with the leading plus), then re-run `make manifests` to regenerate config/crd/.../metal.ironcore.dev_bmcsettings.yaml so the failedAutoRetryCount entry contains `minimum: 0`.internal/controller/biossettings_controller.go (1)
1073-1101:⚠️ Potential issue | 🟠 MajorReset FlowState on auto-retry to avoid stale progress.
Auto-retry sets the state back to Pending but leaves previous FlowState entries, which can skip re-application steps or retain stale conditions. Clear FlowState when scheduling auto-retry (mirroring the manual retry path).
🧹 Proposed fix
biosSettingsBase := settings.DeepCopy() settings.Status.State = metalv1alpha1.BIOSSettingsStatePending + settings.Status.FlowState = []metalv1alpha1.BIOSSettingsFlowStatus{} retryCondition, err := GetCondition(r.Conditions, settings.Status.Conditions, RetryOfFailedResourceConditionIssued)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettings_controller.go` around lines 1073 - 1101, The auto-retry branch resets State to metalv1alpha1.BIOSSettingsStatePending but doesn't clear prior FlowState entries, risking stale progress; when preparing for auto-retry in the block that checks retryCount and manipulates settings (e.g., where biosSettingsBase := settings.DeepCopy(), settings.Status.State = metalv1alpha1.BIOSSettingsStatePending, and AutoRetryCountRemaining is decremented), also clear the in-memory FlowState (e.g., set settings.Status.FlowState = nil or empty slice) before calling r.Status().Patch so the retry mirrors the manual retry path and removes any previous flow progress/conditions.internal/controller/bmcsettings_controller.go (2)
451-468: Misleading error message at Line 456.The error message still reads
"failed to get Condition for skip reboot post setting update"but this code path handlesInvalidBMCSettingsErrorfromCheckBMCAttributes. The message should reflect the actual context (invalid settings validation), not "skip reboot."The original wrong-variable issue is partially addressed by using
errors.Join(err, errCond), which now includes both errors.Proposed fix
if errCond != nil { - return ctrl.Result{}, fmt.Errorf("failed to get Condition for skip reboot post setting update %v", errors.Join(err, errCond)) + return ctrl.Result{}, fmt.Errorf("failed to get Condition for invalid BMC settings: %w", errors.Join(err, errCond)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 451 - 468, The error message when errors.As detects an *common.InvalidBMCSettingsError is misleading; update the fmt.Errorf call that wraps errCond in the GetCondition failure branch so the message reflects invalid settings validation (e.g., "failed to get Condition for invalid BMC settings validation") instead of "skip reboot post setting update"; locate the block handling InvalidBMCSettingsError (uses variable invalidSettingsErr, calls GetCondition(r.Conditions, bmcSetting.Status.Conditions, BMCSettingsConditionWrongSettings), then r.Conditions.Update and r.updateBMCSettingsStatus) and change only the descriptive string passed to fmt.Errorf while preserving the errors.Join(err, errCond) behavior.
586-604: Local variableBMCPoweredOffConditionshadows the package-level constant.At both Lines 586 and 604, the local variable
BMCPoweredOffConditionhas the same name as the package-level constantBMCPoweredOffCondition(Line 47). While Go resolves this correctly, it can confuse readers and tooling. Rename to e.g.bmcPoweredOffCond.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 586 - 604, The local variable named BMCPoweredOffCondition shadows the package-level constant BMCPoweredOffCondition; rename the local variable (e.g., to bmcPoweredOffCond) at both places where you call GetCondition and receive the result, and update all subsequent uses in that scope (the call to r.Conditions.Update and the call to r.updateBMCSettingsStatus, plus the later GetCondition usage in the default branch) to the new local name so the constant is no longer shadowed.
🧹 Nitpick comments (11)
bmc/mockup.go (1)
34-59: Optional: extract the shared task prefix to eliminate duplication.Both helpers repeat the same 5-step preamble (New→Pending→Starting→Running(10)→Running(20)). Extracting it would make the terminal conditions of each sequence more obvious at a glance.
♻️ Suggested refactor
+// upgradeTaskPrefix returns the common leading task states shared by all upgrade sequences. +func upgradeTaskPrefix() []schemas.Task { + return []schemas.Task{ + {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.PendingTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.StartingTaskState, PercentComplete: gofish.ToRef(uint(0))}, + {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(10))}, + {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(20))}, + } +} + // defaultUpgradeFailedTaskStatus returns a canonical failed upgrade task // sequence ending in ExceptionTaskState at 99 % completion. func defaultUpgradeFailedTaskStatus() []schemas.Task { - return []schemas.Task{ - {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.PendingTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.StartingTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(10))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(20))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(98))}, - {TaskState: schemas.ExceptionTaskState, PercentComplete: gofish.ToRef(uint(99))}, - } + return append(upgradeTaskPrefix(), + schemas.Task{TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(98))}, + schemas.Task{TaskState: schemas.ExceptionTaskState, PercentComplete: gofish.ToRef(uint(99))}, + ) } // defaultUpgradeTaskStatus returns a canonical upgrade task sequence ending in CompletedTaskState at 100 % completion. func defaultUpgradeTaskStatus() []schemas.Task { - return []schemas.Task{ - {TaskState: schemas.NewTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.PendingTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.StartingTaskState, PercentComplete: gofish.ToRef(uint(0))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(10))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(20))}, - {TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(100))}, - {TaskState: schemas.CompletedTaskState, PercentComplete: gofish.ToRef(uint(100))}, - } + return append(upgradeTaskPrefix(), + schemas.Task{TaskState: schemas.RunningTaskState, PercentComplete: gofish.ToRef(uint(100))}, + schemas.Task{TaskState: schemas.CompletedTaskState, PercentComplete: gofish.ToRef(uint(100))}, + ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bmc/mockup.go` around lines 34 - 59, Extract the common 5-step preamble used by defaultUpgradeFailedTaskStatus and defaultUpgradeTaskStatus into a helper (e.g., upgradeTaskPreamble or makeUpgradePreamble) that returns []schemas.Task for New→Pending→Starting→Running(10)→Running(20), then have defaultUpgradeFailedTaskStatus and defaultUpgradeTaskStatus call that helper and append their differing terminal steps (Exception/99 and Completed/100 respectively); update both functions to build their return slices by concatenating the preamble slice with the terminal entries to remove duplication and make the terminal conditions explicit.bmc/common/helpers.go (1)
29-29: Stale TODO —EnumerationAttributeTypeis already handled.The comment "add more types like maps and Enumerations" mentions Enumerations, but the
EnumerationAttributeTypecase is fully implemented in this same function. Consider narrowing the TODO to only the missing type(s) (e.g., maps/objects).🧹 Suggested update
- // TODO: add more types like maps and Enumerations + // TODO: add support for Map/Object attribute types🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bmc/common/helpers.go` at line 29, The TODO comment in bmc/common/helpers.go incorrectly mentions Enumerations even though EnumerationAttributeType is already implemented in the function (look for the EnumerationAttributeType case), so update the comment to reflect only the truly missing types (e.g., "add support for maps/objects") or remove the TODO entirely; ensure you keep or adjust any nearby comment to reference the actual unhandled types rather than Enumerations.internal/controller/bmcversionset_controller_test.go (1)
604-613: Assertions on retried BMCVersions use inconsistent matchers forAutoRetryCountRemaining.Line 607 asserts
AutoRetryCountRemainingisBeNil()for bmcVersion02, while line 612 assertsNot(Equal(GetPtr(int32(0))))for bmcVersion03. These check different properties (nil vs non-zero) for what should be the same logical state (resource was retried). If the intent is to verify the resource left the exhausted-failed state, consider using the same matcher for both to avoid brittleness and improve readability.Also,
.MustPassRepeatedly(1)on line 605 is the default and has no effect — it can be removed for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversionset_controller_test.go` around lines 604 - 613, The two assertions for AutoRetryCountRemaining are inconsistent: change the bmcVersion02 check to match bmcVersion03 (verify AutoRetryCountRemaining is non-nil) — e.g., use HaveField("Status.AutoRetryCountRemaining", Not(BeNil())) for both bmcVersion02 and bmcVersion03 (or alternatively use Not(Equal(GetPtr(int32(0)))) for both) so they assert the same retried state; also remove the redundant .MustPassRepeatedly(1) on the Eventually(Object(bmcVersion02)) call to simplify the test.internal/controller/biossettingsset_controller_test.go (1)
500-510: Asymmetric intermediate-state assertions may be timing-sensitive.Lines 501-504 assert
AutoRetryCountRemainingisBeNil()forbiosSettings02, while lines 507-510 assertNot(Equal(GetPtr(int32(0))))forbiosSettings03. These capture different intermediate reconciliation snapshots. If the controller reconciles both children in a single pass (or very quickly),biosSettings02might never be observed with a nil counter, making this flaky.Consider using a consistent assertion for both (e.g., just checking state is no longer
Failed) if the nil intermediate state isn't guaranteed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettingsset_controller_test.go` around lines 500 - 510, The test contains asymmetric, timing-sensitive assertions for biosSettings02 and biosSettings03: biosSettings02 checks Status.AutoRetryCountRemaining is BeNil() while biosSettings03 asserts Not(Equal(GetPtr(int32(0)))), which can be flaky if reconciliations are fast; make the assertions consistent by either removing the nil check and asserting only that Status.State is not metalv1alpha1.BIOSSettingsStateFailed for both biosSettings02 and biosSettings03, or assert the same AutoRetryCountRemaining expectation for both (e.g., Not(Equal(GetPtr(int32(0))))), referencing the Status.State and Status.AutoRetryCountRemaining fields on biosSettings02 and biosSettings03 in the Eventually blocks.internal/controller/biosversionset_controller_test.go (1)
494-504: Same asymmetric intermediate-state assertion pattern as BIOSSettingsSet test.Same concern as noted in
biossettingsset_controller_test.go:biosVersion02checksBeNil()whilebiosVersion03checksNot(Equal(GetPtr(int32(0))))forAutoRetryCountRemaining. This relies on observing different reconciliation timing for the two children.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biosversionset_controller_test.go` around lines 494 - 504, The test uses asymmetric assertions for AutoRetryCountRemaining on biosVersion02 (BeNil()) and biosVersion03 (Not(Equal(GetPtr(int32(0))))), which relies on timing; make the checks consistent so they don't depend on reconciliation ordering by changing the biosVersion02 assertion to mirror biosVersion03: assert HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))) while keeping the State check (Not(Equal(metalv1alpha1.BIOSVersionStateFailed))) for both, targeting the biosVersion02 and biosVersion03 objects and the AutoRetryCountRemaining field.internal/controller/bmcsettingsset_controller_test.go (1)
581-585: Avoid microsecond polling to reduce test flakiness.1µs polling can busy-spin and be flaky under CI load; consider a millisecond-scale poll or default polling.
♻️ Suggested adjustment
- Eventually(Object(bmcSettings01)).WithPolling(1 * time.Microsecond).MustPassRepeatedly(1).Should(SatisfyAll( + Eventually(Object(bmcSettings01)).WithPolling(10 * time.Millisecond).MustPassRepeatedly(1).Should(SatisfyAll(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettingsset_controller_test.go` around lines 581 - 585, The test is using a 1µs polling interval which is too aggressive; update the Eventually call that wraps Object(bmcSettings01) to use a millisecond-scale poll or default polling instead of 1 * time.Microsecond — e.g., replace WithPolling(1 * time.Microsecond) with WithPolling(1 * time.Millisecond) or remove WithPolling(...) entirely so Eventually uses its default interval; keep the rest of the assertion (HaveField checks on Status.State and Status.AutoRetryCountRemaining) unchanged.cmd/main.go (1)
56-61: Confirm lint/test run for Go changes.Please confirm
make lint-fixandmake testwere executed.As per coding guidelines:
**/*.go: Runmake lint-fixandmake testafter editing Go source files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/main.go` around lines 56 - 61, Ensure you ran the Go lint and tests after editing cmd/main.go's main function: execute `make lint-fix` to apply linter fixes and then `make test` to run the test suite, verify both complete without errors, and commit any auto-fixed changes before pushing.internal/controller/bmcsettings_controller.go (1)
704-777: Retry logic is duplicated nearly verbatim across four controllers.The
handleFailedStatebody here is essentially identical tohandleFailedStateinbmcversion_controller.go,processFailedStateinbiosversion_controller.go, and presumablybiossettings_controller.go. The annotation-based retry block (~25 lines) and the auto-retry block (~30 lines) are copy-pasted with only type names changed.Consider extracting a generic retry helper, e.g.:
func handleRetry[T client.Object]( ctx context.Context, c client.Client, conditions *conditionutils.Accessor, obj T, defaultRetryCount int32, getSpec func(T) *int32, getStatus func(T) (*int32, []metav1.Condition), setStatus func(T, string, *int32, []metav1.Condition), ) (retried bool, err error)This would reduce ~55 lines of duplicated code per controller to a single function call, making the retry policy easier to reason about and maintain.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 704 - 777, The handleFailedState implementations duplicate annotation-based retry and auto-retry logic across controllers; extract that logic into a single generic helper (e.g., handleRetry[T client.Object]) that takes ctx, client.Client (or the reconciler), *conditionutils.Accessor (r.Conditions), the resource object (bmcSetting), the default retry count (r.DefaultFailedAutoRetryCount), and small accessors for spec/status mutation (getSpec/getStatus/setStatus or closure equivalents) so the helper can: check the OperationAnnotation retry path, update the RetryOfFailedResourceConditionIssued condition, compute/adjust AutoRetryCountRemaining, set State to BMCSettingsStatePending (or a passed-in pending state string), and perform the Status().Patch via MergeFrom; then replace the duplicated blocks in BMCSettingsReconciler.handleFailedState (and the counterparts in bmcversion_controller.go, biosversion_controller.go, biossettings_controller.go) with a single call to handleRetry and handle returned (retried,bool) and error values.internal/controller/biosversion_controller.go (2)
507-507: Verbose log value: logging the entirebiosVersionobject.Line 507 logs
"BIOSVersion", biosVersionwhich serializes the entire resource (spec, status, metadata) into the log line. Consider logging just the name:- log.V(1).Info("Failed to upgrade BIOSVersion", "BIOSVersion", biosVersion, "Server", server.Name) + log.V(1).Info("Failed to upgrade BIOSVersion", "BIOSVersion", biosVersion.Name, "Server", server.Name)The same pattern exists in
bmcsettings_controller.goLine 779 ("bmcSetting", bmcSetting). As per coding guidelines, follow Kubernetes logging conventions with key-value pairs that are concise.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biosversion_controller.go` at line 507, The log is serializing the entire resource by passing the full biosVersion object; change the log call in biosversion_controller (the line using log.V(1).Info("Failed to upgrade BIOSVersion", "BIOSVersion", biosVersion, "Server", server.Name)) to log a concise identifier instead (e.g., use biosVersion.Name or biosVersion.GetName()) so only the resource name is logged; apply the same change pattern for the bmcSetting log in bmcsettings_controller (replace "bmcSetting", bmcSetting with the bmcSetting name/identifier).
436-509: Same retry logic duplication as other controllers.
processFailedStateis the same retry pattern with only type names changed. See the refactoring suggestion onbmcsettings_controller.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biosversion_controller.go` around lines 436 - 509, processFailedState duplicates the common "retry failed reconciliation" logic; extract that flow into a shared helper and call it from BIOSVersionReconciler.processFailedState. Create a reusable function (e.g., HandleFailedResourceRetry or handleFailedResourceRetry) that accepts context, the reconciler's Conditions helper, the resource object (BIOSVersion), its Status/Spec accessors or an interface to read/write State, AutoRetryCountRemaining and FailedAutoRetryCount, the DefaultFailedAutoRetryCount, and the RetryOfFailedResourceConditionIssued constant and operation annotation key; move the annotation-based retry block, the default/explicit retryCount logic, updating of Retry condition via r.Conditions.Update, mutation of Status.Conditions, adjustment of AutoRetryCountRemaining, and the Status().Patch behavior into that helper and return (bool, error) so processFailedState can simply forward the call (and only keep any BIOSVersion-specific logging). Update processFailedState to call this helper and remove the duplicated logic (keep references to GetCondition, RetryOfFailedResourceConditionIssued, BIOSVersionStatePending, and r.Status().Patch as needed).internal/controller/bmcversion_controller.go (1)
406-483: Same retry logic duplication as noted inbmcsettings_controller.go.This
handleFailedStateis nearly line-for-line identical to the one in the BMCSettings controller (and the BIOSVersion/BIOSSettings equivalents). See the refactoring suggestion onbmcsettings_controller.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversion_controller.go` around lines 406 - 483, The handleFailedState method in BMCVersionReconciler duplicates retry logic found in other controllers; extract the shared retry flow into a reusable helper (e.g., RetryFailedResource or handleFailedStateShared) that accepts the reconciler's Conditions, Status client, the resource object (with fields/methods used: Status.AutoRetryCountRemaining, Spec.FailedAutoRetryCount, Name, GetAnnotations, DeepCopy, Status.Conditions), DefaultFailedAutoRetryCount and the RetryOfFailedResourceCondition constant, then have BMCVersionReconciler.handleFailedState call that helper; the helper should perform the GetCondition/Conditions.Update logic, adjust AutoRetryCountRemaining, set State to Pending, and call r.Status().Patch (using the passed status client/mergeFrom base) so you can remove the duplicated code from BMCVersionReconciler and reuse it from BMCSettings/BIOSVersion/BIOSSettings controllers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/v1alpha1/biossettings_types.go`:
- Around line 20-23: The kubebuilder validation marker for the minimum
constraint is missing the required '+' prefix causing the Minimum=0 marker on
FailedAutoRetryCount in BIOSSettings (symbol: FailedAutoRetryCount) to be
ignored and likewise in biosversion_types.go; update the comments to add the '+'
prefix (e.g. change "// kubebuilder:validation:Minimum=0" to "//
+kubebuilder:validation:Minimum=0") for the affected fields
(FailedAutoRetryCount and the analogous field in biosversion_types.go), then run
"make manifests" and "make generate" to regenerate the CRD and code.
In `@api/v1alpha1/biosversion_types.go`:
- Around line 48-51: The kubebuilder marker for the FailedAutoRetryCount field
is missing the required '+' prefix so controller-gen won't pick up the
validation; update the comment above the FailedAutoRetryCount field (the struct
tag/field named FailedAutoRetryCount in biosversion_types.go) to use "//
+kubebuilder:validation:Minimum=0" instead of "//
kubebuilder:validation:Minimum=0" so the Minimum=0 validation is recognized.
In `@api/v1alpha1/bmcsettings_types.go`:
- Around line 22-25: The kubebuilder validation markers are missing the required
'+' prefix in several type files, so change the comment lines from "//
kubebuilder:validation:Minimum=0" to "// +kubebuilder:validation:Minimum=0" in
the four affected type definitions (update bmcsettings_types.go,
bmcversion_types.go, biosversion_types.go, biossettings_types.go) so the
Minimum=0 constraint is picked up for the corresponding retry/count fields
(e.g., FailedAutoRetryCount and the equivalent fields in the other types); after
editing those markers run make manifests and make generate to regenerate CRDs
and deepcopy files.
In `@bmc/redfish_local.go`:
- Around line 131-139: Rename the misspelled local variable taskSatus to
taskStatus in this function: change the declaration/assignment from "taskSatus
:= UnitTestMockUps.BIOSUpgradeTaskStatus", update the conditional that assigns
BIOSUpgradeTaskFailedStatus when BIOSUpgradingVersion contains "fail", and
update the subsequent uses in the bounds check (len(taskStatus)) and the task
pointer creation (&taskStatus[index]); ensure all occurrences in this snippet
match the correct spelling and compile.
In `@cmd/main.go`:
- Around line 95-96: Add validation after flag.Parse() to reject negative values
for defaultFailedAutoRetryCount: check the parsed defaultFailedAutoRetryCount
variable and if it's less than 0 return an error or log a fatal message and
exit, explaining that negative values are invalid; update any startup path that
uses defaultFailedAutoRetryCount to rely on this validation so controllers
BIOSSettings, BIOSVersion, BMCSettings and BMCVersion no longer silently disable
retries when given negative input.
In `@config/crd/bases/metal.ironcore.dev_bmcversions.yaml`:
- Around line 87-92: The kubebuilder markers for the FailedAutoRetryCount fields
are missing the required '+' prefix and are being rendered as plain text in
generated CRDs; update the comment on each FailedAutoRetryCount field in the
corresponding type definitions (BMCVersion, BMCSettings, BIOSSettings,
BIOSVersion) to change "// kubebuilder:validation:Minimum=0" to "//
+kubebuilder:validation:Minimum=0" so the marker is recognized and converted
into a proper schema constraint during CRD generation.
---
Duplicate comments:
In `@config/crd/bases/metal.ironcore.dev_biossettingssets.yaml`:
- Around line 73-78: The CRD is missing the `minimum: 0` constraint because the
kubebuilder marker in the corresponding types file lacks the required '+'
prefix; locate the field FailedAutoRetryCount in the relevant *_types.go (e.g.,
the BIOSSettingsSetSpec struct) and change the comment marker to use the plus
form (// +kubebuilder:validation:Minimum=0) for that field, then re-run the CRD
generation (controller-gen / your project generate task) so the generated
metal.ironcore.dev_biossettingssets.yaml includes minimum: 0.
In `@config/crd/bases/metal.ironcore.dev_bmcsettings.yaml`:
- Around line 70-75: The CRD is missing the explicit `minimum: 0` for the
failedAutoRetryCount because the kubebuilder validation marker in the type
definition is malformed; open api/v1alpha1/bmcsettings_types.go and fix the
validation comment for the FailedAutoRetryCount field to use the correct
kubebuilder marker (e.g. ensure the line reads `//
+kubebuilder:validation:Minimum=0` with the leading plus), then re-run `make
manifests` to regenerate config/crd/.../metal.ironcore.dev_bmcsettings.yaml so
the failedAutoRetryCount entry contains `minimum: 0`.
In `@internal/controller/biossettings_controller.go`:
- Around line 1073-1101: The auto-retry branch resets State to
metalv1alpha1.BIOSSettingsStatePending but doesn't clear prior FlowState
entries, risking stale progress; when preparing for auto-retry in the block that
checks retryCount and manipulates settings (e.g., where biosSettingsBase :=
settings.DeepCopy(), settings.Status.State =
metalv1alpha1.BIOSSettingsStatePending, and AutoRetryCountRemaining is
decremented), also clear the in-memory FlowState (e.g., set
settings.Status.FlowState = nil or empty slice) before calling r.Status().Patch
so the retry mirrors the manual retry path and removes any previous flow
progress/conditions.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 451-468: The error message when errors.As detects an
*common.InvalidBMCSettingsError is misleading; update the fmt.Errorf call that
wraps errCond in the GetCondition failure branch so the message reflects invalid
settings validation (e.g., "failed to get Condition for invalid BMC settings
validation") instead of "skip reboot post setting update"; locate the block
handling InvalidBMCSettingsError (uses variable invalidSettingsErr, calls
GetCondition(r.Conditions, bmcSetting.Status.Conditions,
BMCSettingsConditionWrongSettings), then r.Conditions.Update and
r.updateBMCSettingsStatus) and change only the descriptive string passed to
fmt.Errorf while preserving the errors.Join(err, errCond) behavior.
- Around line 586-604: The local variable named BMCPoweredOffCondition shadows
the package-level constant BMCPoweredOffCondition; rename the local variable
(e.g., to bmcPoweredOffCond) at both places where you call GetCondition and
receive the result, and update all subsequent uses in that scope (the call to
r.Conditions.Update and the call to r.updateBMCSettingsStatus, plus the later
GetCondition usage in the default branch) to the new local name so the constant
is no longer shadowed.
---
Nitpick comments:
In `@bmc/common/helpers.go`:
- Line 29: The TODO comment in bmc/common/helpers.go incorrectly mentions
Enumerations even though EnumerationAttributeType is already implemented in the
function (look for the EnumerationAttributeType case), so update the comment to
reflect only the truly missing types (e.g., "add support for maps/objects") or
remove the TODO entirely; ensure you keep or adjust any nearby comment to
reference the actual unhandled types rather than Enumerations.
In `@bmc/mockup.go`:
- Around line 34-59: Extract the common 5-step preamble used by
defaultUpgradeFailedTaskStatus and defaultUpgradeTaskStatus into a helper (e.g.,
upgradeTaskPreamble or makeUpgradePreamble) that returns []schemas.Task for
New→Pending→Starting→Running(10)→Running(20), then have
defaultUpgradeFailedTaskStatus and defaultUpgradeTaskStatus call that helper and
append their differing terminal steps (Exception/99 and Completed/100
respectively); update both functions to build their return slices by
concatenating the preamble slice with the terminal entries to remove duplication
and make the terminal conditions explicit.
In `@cmd/main.go`:
- Around line 56-61: Ensure you ran the Go lint and tests after editing
cmd/main.go's main function: execute `make lint-fix` to apply linter fixes and
then `make test` to run the test suite, verify both complete without errors, and
commit any auto-fixed changes before pushing.
In `@internal/controller/biossettingsset_controller_test.go`:
- Around line 500-510: The test contains asymmetric, timing-sensitive assertions
for biosSettings02 and biosSettings03: biosSettings02 checks
Status.AutoRetryCountRemaining is BeNil() while biosSettings03 asserts
Not(Equal(GetPtr(int32(0)))), which can be flaky if reconciliations are fast;
make the assertions consistent by either removing the nil check and asserting
only that Status.State is not metalv1alpha1.BIOSSettingsStateFailed for both
biosSettings02 and biosSettings03, or assert the same AutoRetryCountRemaining
expectation for both (e.g., Not(Equal(GetPtr(int32(0))))), referencing the
Status.State and Status.AutoRetryCountRemaining fields on biosSettings02 and
biosSettings03 in the Eventually blocks.
In `@internal/controller/biosversion_controller.go`:
- Line 507: The log is serializing the entire resource by passing the full
biosVersion object; change the log call in biosversion_controller (the line
using log.V(1).Info("Failed to upgrade BIOSVersion", "BIOSVersion", biosVersion,
"Server", server.Name)) to log a concise identifier instead (e.g., use
biosVersion.Name or biosVersion.GetName()) so only the resource name is logged;
apply the same change pattern for the bmcSetting log in bmcsettings_controller
(replace "bmcSetting", bmcSetting with the bmcSetting name/identifier).
- Around line 436-509: processFailedState duplicates the common "retry failed
reconciliation" logic; extract that flow into a shared helper and call it from
BIOSVersionReconciler.processFailedState. Create a reusable function (e.g.,
HandleFailedResourceRetry or handleFailedResourceRetry) that accepts context,
the reconciler's Conditions helper, the resource object (BIOSVersion), its
Status/Spec accessors or an interface to read/write State,
AutoRetryCountRemaining and FailedAutoRetryCount, the
DefaultFailedAutoRetryCount, and the RetryOfFailedResourceConditionIssued
constant and operation annotation key; move the annotation-based retry block,
the default/explicit retryCount logic, updating of Retry condition via
r.Conditions.Update, mutation of Status.Conditions, adjustment of
AutoRetryCountRemaining, and the Status().Patch behavior into that helper and
return (bool, error) so processFailedState can simply forward the call (and only
keep any BIOSVersion-specific logging). Update processFailedState to call this
helper and remove the duplicated logic (keep references to GetCondition,
RetryOfFailedResourceConditionIssued, BIOSVersionStatePending, and
r.Status().Patch as needed).
In `@internal/controller/biosversionset_controller_test.go`:
- Around line 494-504: The test uses asymmetric assertions for
AutoRetryCountRemaining on biosVersion02 (BeNil()) and biosVersion03
(Not(Equal(GetPtr(int32(0))))), which relies on timing; make the checks
consistent so they don't depend on reconciliation ordering by changing the
biosVersion02 assertion to mirror biosVersion03: assert
HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))) while
keeping the State check (Not(Equal(metalv1alpha1.BIOSVersionStateFailed))) for
both, targeting the biosVersion02 and biosVersion03 objects and the
AutoRetryCountRemaining field.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 704-777: The handleFailedState implementations duplicate
annotation-based retry and auto-retry logic across controllers; extract that
logic into a single generic helper (e.g., handleRetry[T client.Object]) that
takes ctx, client.Client (or the reconciler), *conditionutils.Accessor
(r.Conditions), the resource object (bmcSetting), the default retry count
(r.DefaultFailedAutoRetryCount), and small accessors for spec/status mutation
(getSpec/getStatus/setStatus or closure equivalents) so the helper can: check
the OperationAnnotation retry path, update the
RetryOfFailedResourceConditionIssued condition, compute/adjust
AutoRetryCountRemaining, set State to BMCSettingsStatePending (or a passed-in
pending state string), and perform the Status().Patch via MergeFrom; then
replace the duplicated blocks in BMCSettingsReconciler.handleFailedState (and
the counterparts in bmcversion_controller.go, biosversion_controller.go,
biossettings_controller.go) with a single call to handleRetry and handle
returned (retried,bool) and error values.
In `@internal/controller/bmcsettingsset_controller_test.go`:
- Around line 581-585: The test is using a 1µs polling interval which is too
aggressive; update the Eventually call that wraps Object(bmcSettings01) to use a
millisecond-scale poll or default polling instead of 1 * time.Microsecond —
e.g., replace WithPolling(1 * time.Microsecond) with WithPolling(1 *
time.Millisecond) or remove WithPolling(...) entirely so Eventually uses its
default interval; keep the rest of the assertion (HaveField checks on
Status.State and Status.AutoRetryCountRemaining) unchanged.
In `@internal/controller/bmcversion_controller.go`:
- Around line 406-483: The handleFailedState method in BMCVersionReconciler
duplicates retry logic found in other controllers; extract the shared retry flow
into a reusable helper (e.g., RetryFailedResource or handleFailedStateShared)
that accepts the reconciler's Conditions, Status client, the resource object
(with fields/methods used: Status.AutoRetryCountRemaining,
Spec.FailedAutoRetryCount, Name, GetAnnotations, DeepCopy, Status.Conditions),
DefaultFailedAutoRetryCount and the RetryOfFailedResourceCondition constant,
then have BMCVersionReconciler.handleFailedState call that helper; the helper
should perform the GetCondition/Conditions.Update logic, adjust
AutoRetryCountRemaining, set State to Pending, and call r.Status().Patch (using
the passed status client/mergeFrom base) so you can remove the duplicated code
from BMCVersionReconciler and reuse it from BMCSettings/BIOSVersion/BIOSSettings
controllers.
In `@internal/controller/bmcversionset_controller_test.go`:
- Around line 604-613: The two assertions for AutoRetryCountRemaining are
inconsistent: change the bmcVersion02 check to match bmcVersion03 (verify
AutoRetryCountRemaining is non-nil) — e.g., use
HaveField("Status.AutoRetryCountRemaining", Not(BeNil())) for both bmcVersion02
and bmcVersion03 (or alternatively use Not(Equal(GetPtr(int32(0)))) for both) so
they assert the same retried state; also remove the redundant
.MustPassRepeatedly(1) on the Eventually(Object(bmcVersion02)) call to simplify
the test.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
dist/chart/templates/crd/metal.ironcore.dev_biossettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biossettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversionsets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversionsets.yamlis excluded by!**/dist/**
📒 Files selected for processing (35)
api/v1alpha1/biossettings_types.goapi/v1alpha1/biosversion_types.goapi/v1alpha1/bmcsettings_types.goapi/v1alpha1/bmcversion_types.gobmc/common/helpers.gobmc/mockup.gobmc/oem/dell.gobmc/redfish_kube.gobmc/redfish_local.gocmd/main.goconfig/crd/bases/metal.ironcore.dev_biossettings.yamlconfig/crd/bases/metal.ironcore.dev_biossettingssets.yamlconfig/crd/bases/metal.ironcore.dev_biosversions.yamlconfig/crd/bases/metal.ironcore.dev_biosversionsets.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettings.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettingssets.yamlconfig/crd/bases/metal.ironcore.dev_bmcversions.yamlconfig/crd/bases/metal.ironcore.dev_bmcversionsets.yamldocs/api-reference/api.mdinternal/controller/biossettings_controller.gointernal/controller/biossettings_controller_test.gointernal/controller/biossettingsset_controller.gointernal/controller/biossettingsset_controller_test.gointernal/controller/biosversion_controller.gointernal/controller/biosversion_controller_test.gointernal/controller/biosversionset_controller.gointernal/controller/biosversionset_controller_test.gointernal/controller/bmcsettings_controller.gointernal/controller/bmcsettings_controller_test.gointernal/controller/bmcsettingsset_controller.gointernal/controller/bmcsettingsset_controller_test.gointernal/controller/bmcversion_controller.gointernal/controller/bmcversion_controller_test.gointernal/controller/bmcversionset_controller.gointernal/controller/bmcversionset_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- api/v1alpha1/bmcversion_types.go
- config/crd/bases/metal.ironcore.dev_bmcversionsets.yaml
- config/crd/bases/metal.ironcore.dev_bmcsettingssets.yaml
- config/crd/bases/metal.ironcore.dev_biosversions.yaml
- internal/controller/biosversionset_controller.go
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/bmcsettings_controller.go (1)
704-777:⚠️ Potential issue | 🟠 MajorReset AutoRetryCountRemaining on spec changes; ConditionTrue issue is reversed in suggested fix.
Two concerns raised, with one requiring correction:
AutoRetryCountRemaining reset (valid concern): When
Spec.FailedAutoRetryCountchanges,Status.AutoRetryCountRemainingis never reset. The *set_controller patterns correctly handle this (e.g.,bmcsettingsset_controller.go:350):bmcSettings.Status.AutoRetryCountRemaining = bmcSettings.Spec.FailedAutoRetryCount. Add spec-change detection to reset the retry counter inbmcsettings_controller.goto comply with idempotent reconciliation.ConditionTrue type (review comment's suggestion is incorrect): Line 722 uses
metav1.ConditionTruewhile the rest of the controller usescorev1.ConditionTrue. However, the fix direction in the review is backwards. SincebmcSetting.Status.Conditionsis of type[]metav1.Condition, which requiresmetav1.ConditionStatus, line 722 is correct and the rest of the codebase should be updated to usemetav1.ConditionTrue, not the other way around. The 18+ occurrences ofcorev1.ConditionTruein condition updates are the inconsistency that should be corrected.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 704 - 777, handle the missing reset of AutoRetryCountRemaining and normalize condition status types: detect spec changes to FailedAutoRetryCount in handleFailedState (or earlier in reconciliation) and when bmcSetting.Spec.FailedAutoRetryCount differs from the current Status.AutoRetryCountRemaining reset Status.AutoRetryCountRemaining to the spec value (follow the pattern used in bmcsettingsset_controller.go where bmcSettings.Status.AutoRetryCountRemaining = bmcSettings.Spec.FailedAutoRetryCount); also keep metav1.ConditionStatus types consistent by using metav1.ConditionTrue (not corev1.ConditionTrue) when updating or comparing the RetryOfFailedResourceConditionIssued via GetCondition and r.Conditions.Update so all condition updates (e.g., the block around GetCondition/conditionutils.UpdateStatus) use metav1.ConditionTrue across the controller.
♻️ Duplicate comments (3)
internal/controller/bmcversionset_controller_test.go (1)
609-613: Previously flagged bugs (wrong object + wrong type) are now resolved.
bmcVersion03is now the observed object andmetalv1alpha1.BMCVersionStateFailedis the correct named type — both past issues are fixed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversionset_controller_test.go` around lines 609 - 613, The test already uses the correct observed object (bmcVersion03) and the proper enum type (metalv1alpha1.BMCVersionStateFailed), so no code changes are required; ensure the assertions in Eventually(Object(bmcVersion03)).WithPolling(...) still reference HaveField("Status.State", Not(Equal(metalv1alpha1.BMCVersionStateFailed))) and HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))) and then mark the change as approved/mergeable.bmc/redfish_local.go (1)
131-139: Previously flaggedtaskSatustypo is now corrected totaskStatus.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bmc/redfish_local.go` around lines 131 - 139, The typo has been corrected: ensure the code consistently uses the variable and struct fields UnitTestMockUps.BIOSUpgradeTaskStatus, UnitTestMockUps.BIOSUpgradingVersion and UnitTestMockUps.BIOSUpgradeTaskFailedStatus and that the logic computing index and taking task := &taskStatus[index] remains correct (guarding index >= len(taskStatus) as shown); no further changes are needed beyond keeping these exact identifiers consistent.internal/controller/bmcsettings_controller.go (1)
576-619: Avoid shadowing theBMCPoweredOffConditionconstant with a local variable.The
BMCPoweredOffCondition, err := GetCondition(..., BMCPoweredOffCondition)pattern is easy to misread and has already been a source of confusion.Possible cleanup
- BMCPoweredOffCondition, err := GetCondition(r.Conditions, bmcSetting.Status.Conditions, BMCPoweredOffCondition) + bmcPoweredOffCond, err := GetCondition(r.Conditions, bmcSetting.Status.Conditions, BMCPoweredOffCondition) if err != nil { return fmt.Errorf("failed to get Condition for powered off BMC state %v", err) } - if BMCPoweredOffCondition.Status == metav1.ConditionTrue { + if bmcPoweredOffCond.Status == metav1.ConditionTrue { if err := r.Conditions.Update( - BMCPoweredOffCondition, + bmcPoweredOffCond, conditionutils.UpdateStatus(corev1.ConditionFalse), conditionutils.UpdateReason("BMCPoweredBackOn"), conditionutils.UpdateMessage(fmt.Sprintf("BMC in Powered On, Power State: %v", BMC.Status.PowerState)), ); err != nil { return fmt.Errorf("failed to update Pending BMCSetting update condition: %w", err) } - err = r.updateBMCSettingsStatus(ctx, bmcSetting, bmcSetting.Status.State, BMCPoweredOffCondition) + err = r.updateBMCSettingsStatus(ctx, bmcSetting, bmcSetting.Status.State, bmcPoweredOffCond) return err }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 576 - 619, The function handleBMCPowerState shadows the BMCPoweredOffCondition constant by declaring a local variable with the same name; change the local variable to a non-conflicting identifier (e.g., poweredOffCond or bmcpoweredOffCond) in both places where you call GetCondition and use it in subsequent r.Conditions.Update and r.updateBMCSettingsStatus calls so the constant remains unshadowed and all references in handleBMCPowerState use the new local variable name.
🧹 Nitpick comments (2)
internal/controller/bmcsettings_controller_test.go (1)
614-617: Use a less aggressive polling interval in this assertion.Line 617 polls every 1ms, which is unnecessarily tight for envtest and can cause avoidable flakiness. A coarser interval (e.g., 25–50ms) is usually more stable.
Proposed tweak
- }).WithPolling((1 * time.Millisecond)).Should(BeTrue()) + }).WithPolling(50 * time.Millisecond).Should(BeTrue())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller_test.go` around lines 614 - 617, The test's Eventually call uses an overly aggressive polling interval; update the WithPolling argument in the Eventually containing Get(bmcSettings)() and the check on bmcSettings.Status.AutoRetryCountRemaining so it uses a coarser interval (for example 25*time.Millisecond or 50*time.Millisecond) instead of 1*time.Millisecond to reduce flakiness in envtest while keeping the same condition and timeout.internal/controller/bmcsettings_controller.go (1)
57-58: Nice:InvalidBMCSettingsErrorbecomes an actionable condition; consider skipping auto-retry for this failure type.Surfacing “wrong settings” as
BMCSettingsConditionWrongSettingsis great for users. One follow-up: if auto-retry is enabled, deterministic spec errors will still burn through retries (and potentially flap) even though the only fix is changing the spec.Also applies to: 451-468
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller.go` around lines 57 - 58, The controller currently surfaces deterministic spec errors via BMCSettingsConditionWrongSettings; update the reconcile error handling so that when InvalidBMCSettingsError is returned (or detected) you set the BMCSettingsConditionWrongSettings / BMCSettingsReasonWrongSettings condition and mark the result as non-requeue/non-retry (i.e., do not trigger exponential requeue) instead of treating it as a transient error; specifically modify the reconcile/error-handling path that processes errors from functions returning InvalidBMCSettingsError (and any helpers that call setCondition) to return a non-retryable result (requeue=false) while still recording the condition so the user can act.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/main.go`:
- Around line 211-214: Validate that the flag variable
defaultFailedAutoRetryCount is <= math.MaxInt32 before any int32 cast: add an
upper-bound check (<= math.MaxInt32) alongside the existing non-negative check
and return a clear Kubernetes-style error log message starting with a capital
letter (e.g., "Default failed auto retry count cannot exceed MaxInt32") using
setupLog.Error, then os.Exit(1); replace unchecked casts
int32(defaultFailedAutoRetryCount) wherever they occur (references: the int32
casts of defaultFailedAutoRetryCount used later in main.go) with safe conversion
after the new validation so overflow cannot occur.
In `@internal/controller/biossettings_controller.go`:
- Around line 1061-1104: The code fails to reinitialize AutoRetryCountRemaining
after it reaches 0 on later spec updates; modify the reconcile logic in the
BIOSSettings controller (around the block that reads FailedAutoRetryCount,
DefaultFailedAutoRetryCount and manipulates
settings.Status.AutoRetryCountRemaining) to detect a spec update and reset the
retry budget: specifically, when retryCount > 0, treat the case where
settings.Status.AutoRetryCountRemaining != nil &&
*settings.Status.AutoRetryCountRemaining == 0 as eligible for reinitialization
if the resource spec has changed (e.g., compare settings.Generation to
settings.Status.ObservedGeneration or compare spec hash stored on status); if a
spec change is detected set settings.Status.AutoRetryCountRemaining =
retryCount-1 (or retryCount if you prefer consistent decrement logic), update
ObservedGeneration/spec-hash on status, then Patch status via r.Status().Patch
(same MergeFrom flow used currently) so the resource can be retried after a spec
update.
In `@internal/controller/biosversion_controller_test.go`:
- Around line 430-437: The test currently deletes biosVersion before attempting
to list/delete ServerMaintenance, which races with the controller's cascade
deletion; modify the cleanup in biosversion_controller_test.go so you first list
ServerMaintenance (use the serverMaintenanceList variable and
ObjectList(&serverMaintenanceList)), delete each maintenance via
k8sClient.Delete(&maintenance), wait until the list is empty with
Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items",
BeEmpty())), and only then delete biosVersion with Expect(k8sClient.Delete(ctx,
biosVersion)).To(Succeed()); ensure you reuse the existing serverMaintenanceList
and k8sClient symbols and preserve the Eventually checks for emptiness.
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 636-641: The cleanup currently forces a non-empty
serverMaintenanceList via
Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items",
Not(BeEmpty()))) which makes the test fail when there are no maintenance
objects; change this to wait for the ObjectList to succeed (e.g.,
Eventually(ObjectList(&serverMaintenanceList)).Should(Succeed())) and then
conditionally delete any items only if serverMaintenanceList.Items is non-empty
(use if len(serverMaintenanceList.Items) > 0 { for _, maintenance := range
serverMaintenanceList.Items { Expect(k8sClient.Delete(ctx,
&maintenance)).To(Succeed()) } }). This keeps the cleanup robust when there are
zero maintenance objects.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 438-442: The power-off handling in handleBMCPowerState currently
sets BMCSettingsStateFailed in the default branch but returns nil, so callers
(e.g., the reconcile block that calls r.handleBMCPowerState and then
updateSettingsAndVerify) continue processing and may apply settings despite a
failure; change handleBMCPowerState to return an explicit proceed signal (e.g.,
bool or a custom result type) or an error to indicate whether reconciliation
should continue, update the caller sites (the reconcile path around
handleBMCPowerState and the code that calls updateSettingsAndVerify) to check
that signal and short-circuit (requeue or return) when proceed is false, and
ensure BMCSettingsStateFailed is only set when you stop further processing to
keep reconcile idempotent and avoid applying settings after a failed power-state
transition.
In `@internal/controller/bmcversion_controller_test.go`:
- Around line 430-437: The test currently deletes bmcVersion before asserting
and deleting ServerMaintenance objects which can race with the controller's
cascade deletion; update the test to fetch and delete ServerMaintenance items
prior to deleting bmcVersion (or perform a conditional fetch-and-delete that
tolerates an empty list) by using the existing serverMaintenanceList variable
and ObjectList(&serverMaintenanceList) to collect items and
Expect(k8sClient.Delete(...)) for each maintenance before calling the code that
deletes bmcVersion, or wrap the deletion loop with a conditional that proceeds
only if serverMaintenanceList.Items is non-empty; ensure UpdateStatus(server,
...) and other status resets remain after maintenance cleanup.
In `@internal/controller/bmcversionset_controller_test.go`:
- Around line 679-684: The cleanup assertions are verifying the wrong servers:
the test selector matches bmc02/bmc03 (server02/server03) but the code checks
server01 and server02, leaving server03 unverified; update the assertions to
check server02 and server03 instead (replace the reference to server01 with
server03) so that both servers matched by the label
("metal.ironcore.dev/Manufacturer": "bar") are asserted to have exited
Maintenance (use the existing variables server02 and server03 in the
Eventually(Object(...)).Should(...) checks).
---
Outside diff comments:
In `@internal/controller/bmcsettings_controller.go`:
- Around line 704-777: handle the missing reset of AutoRetryCountRemaining and
normalize condition status types: detect spec changes to FailedAutoRetryCount in
handleFailedState (or earlier in reconciliation) and when
bmcSetting.Spec.FailedAutoRetryCount differs from the current
Status.AutoRetryCountRemaining reset Status.AutoRetryCountRemaining to the spec
value (follow the pattern used in bmcsettingsset_controller.go where
bmcSettings.Status.AutoRetryCountRemaining =
bmcSettings.Spec.FailedAutoRetryCount); also keep metav1.ConditionStatus types
consistent by using metav1.ConditionTrue (not corev1.ConditionTrue) when
updating or comparing the RetryOfFailedResourceConditionIssued via GetCondition
and r.Conditions.Update so all condition updates (e.g., the block around
GetCondition/conditionutils.UpdateStatus) use metav1.ConditionTrue across the
controller.
---
Duplicate comments:
In `@bmc/redfish_local.go`:
- Around line 131-139: The typo has been corrected: ensure the code consistently
uses the variable and struct fields UnitTestMockUps.BIOSUpgradeTaskStatus,
UnitTestMockUps.BIOSUpgradingVersion and
UnitTestMockUps.BIOSUpgradeTaskFailedStatus and that the logic computing index
and taking task := &taskStatus[index] remains correct (guarding index >=
len(taskStatus) as shown); no further changes are needed beyond keeping these
exact identifiers consistent.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 576-619: The function handleBMCPowerState shadows the
BMCPoweredOffCondition constant by declaring a local variable with the same
name; change the local variable to a non-conflicting identifier (e.g.,
poweredOffCond or bmcpoweredOffCond) in both places where you call GetCondition
and use it in subsequent r.Conditions.Update and r.updateBMCSettingsStatus calls
so the constant remains unshadowed and all references in handleBMCPowerState use
the new local variable name.
In `@internal/controller/bmcversionset_controller_test.go`:
- Around line 609-613: The test already uses the correct observed object
(bmcVersion03) and the proper enum type (metalv1alpha1.BMCVersionStateFailed),
so no code changes are required; ensure the assertions in
Eventually(Object(bmcVersion03)).WithPolling(...) still reference
HaveField("Status.State", Not(Equal(metalv1alpha1.BMCVersionStateFailed))) and
HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))) and
then mark the change as approved/mergeable.
---
Nitpick comments:
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 614-617: The test's Eventually call uses an overly aggressive
polling interval; update the WithPolling argument in the Eventually containing
Get(bmcSettings)() and the check on bmcSettings.Status.AutoRetryCountRemaining
so it uses a coarser interval (for example 25*time.Millisecond or
50*time.Millisecond) instead of 1*time.Millisecond to reduce flakiness in
envtest while keeping the same condition and timeout.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 57-58: The controller currently surfaces deterministic spec errors
via BMCSettingsConditionWrongSettings; update the reconcile error handling so
that when InvalidBMCSettingsError is returned (or detected) you set the
BMCSettingsConditionWrongSettings / BMCSettingsReasonWrongSettings condition and
mark the result as non-requeue/non-retry (i.e., do not trigger exponential
requeue) instead of treating it as a transient error; specifically modify the
reconcile/error-handling path that processes errors from functions returning
InvalidBMCSettingsError (and any helpers that call setCondition) to return a
non-retryable result (requeue=false) while still recording the condition so the
user can act.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
dist/chart/templates/crd/metal.ironcore.dev_biossettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biossettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversionsets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversionsets.yamlis excluded by!**/dist/**
📒 Files selected for processing (25)
api/v1alpha1/biossettings_types.goapi/v1alpha1/biosversion_types.goapi/v1alpha1/bmcsettings_types.goapi/v1alpha1/bmcversion_types.gobmc/common/helpers.gobmc/redfish.gobmc/redfish_local.gocmd/main.goconfig/crd/bases/metal.ironcore.dev_biossettings.yamlconfig/crd/bases/metal.ironcore.dev_biossettingssets.yamlconfig/crd/bases/metal.ironcore.dev_biosversions.yamlconfig/crd/bases/metal.ironcore.dev_biosversionsets.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettings.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettingssets.yamlconfig/crd/bases/metal.ironcore.dev_bmcversions.yamlconfig/crd/bases/metal.ironcore.dev_bmcversionsets.yamldocs/api-reference/api.mdinternal/controller/biossettings_controller.gointernal/controller/biosversion_controller.gointernal/controller/biosversion_controller_test.gointernal/controller/bmcsettings_controller.gointernal/controller/bmcsettings_controller_test.gointernal/controller/bmcsettingsset_controller_test.gointernal/controller/bmcversion_controller_test.gointernal/controller/bmcversionset_controller_test.go
✅ Files skipped from review due to trivial changes (1)
- bmc/redfish.go
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/controller/bmcsettingsset_controller_test.go
- config/crd/bases/metal.ironcore.dev_bmcversions.yaml
- api/v1alpha1/biossettings_types.go
- config/crd/bases/metal.ironcore.dev_bmcsettings.yaml
- config/crd/bases/metal.ironcore.dev_biossettingssets.yaml
- api/v1alpha1/bmcversion_types.go
There was a problem hiding this comment.
Actionable comments posted: 9
♻️ Duplicate comments (1)
internal/controller/bmcversionset_controller_test.go (1)
515-515: Nit: renameretryCount→failedAutoRetryCountfor clarity
retryCountis ambiguous — it could mean the annotation-driven retry count or the auto-retry budget.failedAutoRetryCountaligns with the field nameFailedAutoRetryCountand makes the test self-documenting.♻️ Proposed rename
- retryCount := 2 + failedAutoRetryCount := 2Update all subsequent references accordingly (e.g.,
GetPtr(int32(retryCount))→GetPtr(int32(failedAutoRetryCount))).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversionset_controller_test.go` at line 515, Rename the local test variable retryCount to failedAutoRetryCount and update all its uses accordingly (e.g., change GetPtr(int32(retryCount)) → GetPtr(int32(failedAutoRetryCount))) so the name aligns with the struct field FailedAutoRetryCount and clarifies this is the auto-retry budget in the test; ensure all occurrences in internal/controller/bmcversionset_controller_test.go that reference retryCount are updated to the new identifier.
🧹 Nitpick comments (1)
internal/controller/bmcversion_controller_test.go (1)
408-411: Relax the polling/assertion windows to reduce flakiness.Line 411 uses
1 * time.Millisecondpolling and Line 423 checks consistency for only25ms; these are too tight for reconciliation timing and can create flaky/weak assertions.Suggested adjustment
- }).WithPolling((1 * time.Millisecond)).Should(BeTrue()) + }).WithPolling(50 * time.Millisecond).Should(BeTrue()) - Consistently(Object(bmcVersion), "25ms").Should(SatisfyAll( + Consistently(Object(bmcVersion), "1s", "50ms").Should(SatisfyAll( HaveField("Status.State", metalv1alpha1.BMCVersionStateFailed), HaveField("Status.AutoRetryCountRemaining", Equal(GetPtr(int32(0)))), ))Also applies to: 423-423
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversion_controller_test.go` around lines 408 - 411, The test uses too-tight timing in the Eventually polling for bmcVersion.Status.AutoRetryCountRemaining; update the polling and consistency windows to be more forgiving: in the Eventually call that wraps Get(bmcVersion)() change WithPolling(1 * time.Millisecond) to something like WithPolling(50 * time.Millisecond) (or 100ms) and where a 25ms consistency check is used change it to ~200-500ms; keep referencing the same Eventually(Get(bmcVersion)()) check and the bmcVersion.Status.AutoRetryCountRemaining condition so the test waits longer and reduces flakiness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api-reference/api.md`:
- Line 163: Update the wording for failedAutoRetryCount and similar entries to
use "settings update/apply" instead of "upgrade": locate the
`failedAutoRetryCount` table row and any other BIOSSettings/BMCSettings
descriptions (references noted around the same file, e.g., the entries that
currently say "upgrade") and replace "upgrade" with "settings update/apply" (or
"settings update" where appropriate) so the description reads that the
controller will retry the settings update/apply operation before giving up;
ensure this change is applied consistently for BIOSSettings and BMCSettings
entries.
In `@internal/controller/biossettings_controller_test.go`:
- Around line 888-893: The test currently deletes all ServerMaintenance items;
narrow deletion to only those created by this BIOSSettings by listing
ServerMaintenance into serverMaintenanceList with a selector/filter tied to the
BIOSSettings under test (e.g., use k8sClient.List(ctx, &serverMaintenanceList,
client.MatchingLabels{"biossettings": biosSettings.Name}) or filter the returned
serverMaintenanceList.Items for OwnerReferences matching the BIOSSettings
UID/Name) and then delete only those filtered items; keep the existing variables
k8sClient, serverMaintenanceList and ObjectList usage but replace the
unconditional List/Delete loop with a targeted list or post-list filter so
neighboring specs are not affected.
In `@internal/controller/biossettings_controller.go`:
- Around line 1039-1054: The retry path leaves old failure conditions when
RetryOfFailedResourceConditionIssued is already True; update the logic in the
manual retry handling (use GetCondition, RetryOfFailedResourceConditionIssued,
r.Conditions.Update and settings.Status.Conditions) to always normalize/reset
conditions for idempotency: when manual retry is triggered, regardless of
retryCondition.Status, replace settings.Status.Conditions with a fresh slice
containing only the normalized retryCondition (with ConditionTrue,
RetryOfFailedResourceReasonIssued and the operation message) and persist the
Update call to ensure prior failure conditions are cleared.
In `@internal/controller/biosversion_controller.go`:
- Around line 445-460: GetCondition is used to fetch the retry condition but
current logic only collapses conditions when Status != True, leaving old failure
conditions when it's already True; always collapse biosVersion.Status.Conditions
to only the retry condition to avoid carrying stale state. After calling
GetCondition(...) and after the r.Conditions.Update(...) branch (whether you
update or skip updating because Status is already True), set
biosVersion.Status.Conditions = []metav1.Condition{*retryCondition}
unconditionally so the retry state replaces any previous conditions; reference
GetCondition, RetryOfFailedResourceConditionIssued, r.Conditions.Update and
biosVersion.Status.Conditions when making this change.
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 637-641: The test currently deletes every ServerMaintenance
returned by k8sClient.List; change the cleanup to only delete ServerMaintenance
items owned by this test resource by filtering the ServerMaintenanceList.Items
(or using a List with a matching label selector) for OwnerReferences containing
the test resource's UID (or the test-specific label) before calling
k8sClient.Delete(ctx, &maintenance); update the code around
ServerMaintenanceList, maintenance, k8sClient.List and k8sClient.Delete to
perform that ownership check so only objects owned by this test are removed.
In `@internal/controller/bmcversion_controller_test.go`:
- Around line 430-435: The cleanup currently lists all ServerMaintenance
resources with k8sClient.List(ctx, &serverMaintenanceList) and may delete
resources from other tests; restrict the List to the current test namespace by
passing the namespace option (use client.InNamespace(ns.Name)) when calling
k8sClient.List, e.g., call k8sClient.List(ctx, &serverMaintenanceList,
client.InNamespace(ns.Name)) so the subsequent loop that deletes items only
affects ServerMaintenance objects in this test's namespace.
In `@internal/controller/bmcversion_controller.go`:
- Around line 420-434: The retry flow must deterministically reset the
BMCVersion condition set so stale failure conditions cannot persist; in
GetCondition/Retry handling (symbols: GetCondition,
RetryOfFailedResourceConditionIssued, r.Conditions.Update,
conditionutils.UpdateStatus/UpdateReason/UpdateMessage and
bmcVersion.Status.Conditions) ensure that whenever a manual retry is detected or
the retry condition ends up true you overwrite bmcVersion.Status.Conditions with
only the retry condition (not just when it was previously false) and still call
r.Conditions.Update to refresh reason/message as needed; in short, always
replace the Status.Conditions slice with []metav1.Condition{*retryCondition}
whenever retry is issued/true so reconciliation is idempotent.
In `@internal/controller/bmcversionset_controller_test.go`:
- Around line 605-613: The test uses WithPolling(10 * time.Microsecond) which is
too short; update the two Eventually assertions that reference bmcVersion02 and
bmcVersion03 to use WithPolling(10 * time.Millisecond) instead (or remove the
WithPolling call to rely on the suite default). Specifically change the
WithPolling argument on the Eventually(Object(bmcVersion02)) and
Eventually(Object(bmcVersion03)) assertions so they poll at a millisecond
granularity to avoid excessive API-server load and flakiness.
- Around line 604-613: The two assertions for AutoRetryCountRemaining are
asymmetric: bmcVersion02 uses BeNil() while bmcVersion03 uses
Not(Equal(GetPtr(int32(0)))) after applying
OperationAnnotationRetryChildAndSelf; make them consistent by updating the
matcher for either bmcVersion02 or bmcVersion03 so both use the same expectation
(e.g., both BeNil() or both Not(Equal(GetPtr(int32(0))))) or, if the difference
is intentional, add a clarifying By(...) message explaining why bmcVersion02 and
bmcVersion03 are expected to differ; locate the checks referencing bmcVersion02
and bmcVersion03 in the test and change the matcher or add the descriptive By
text accordingly.
---
Duplicate comments:
In `@internal/controller/bmcversionset_controller_test.go`:
- Line 515: Rename the local test variable retryCount to failedAutoRetryCount
and update all its uses accordingly (e.g., change GetPtr(int32(retryCount)) →
GetPtr(int32(failedAutoRetryCount))) so the name aligns with the struct field
FailedAutoRetryCount and clarifies this is the auto-retry budget in the test;
ensure all occurrences in internal/controller/bmcversionset_controller_test.go
that reference retryCount are updated to the new identifier.
---
Nitpick comments:
In `@internal/controller/bmcversion_controller_test.go`:
- Around line 408-411: The test uses too-tight timing in the Eventually polling
for bmcVersion.Status.AutoRetryCountRemaining; update the polling and
consistency windows to be more forgiving: in the Eventually call that wraps
Get(bmcVersion)() change WithPolling(1 * time.Millisecond) to something like
WithPolling(50 * time.Millisecond) (or 100ms) and where a 25ms consistency check
is used change it to ~200-500ms; keep referencing the same
Eventually(Get(bmcVersion)()) check and the
bmcVersion.Status.AutoRetryCountRemaining condition so the test waits longer and
reduces flakiness.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
dist/chart/templates/crd/metal.ironcore.dev_biossettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversions.yamlis excluded by!**/dist/**
📒 Files selected for processing (19)
api/v1alpha1/biossettings_types.goapi/v1alpha1/biosversion_types.goapi/v1alpha1/bmcsettings_types.goapi/v1alpha1/bmcversion_types.gocmd/main.goconfig/crd/bases/metal.ironcore.dev_biossettings.yamlconfig/crd/bases/metal.ironcore.dev_biosversions.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettings.yamlconfig/crd/bases/metal.ironcore.dev_bmcversions.yamldocs/api-reference/api.mdinternal/controller/biossettings_controller.gointernal/controller/biossettings_controller_test.gointernal/controller/biosversion_controller.gointernal/controller/biosversion_controller_test.gointernal/controller/bmcsettings_controller.gointernal/controller/bmcsettings_controller_test.gointernal/controller/bmcversion_controller.gointernal/controller/bmcversion_controller_test.gointernal/controller/bmcversionset_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- api/v1alpha1/biosversion_types.go
- internal/controller/biosversion_controller_test.go
- api/v1alpha1/bmcversion_types.go
- config/crd/bases/metal.ironcore.dev_biossettings.yaml
- config/crd/bases/metal.ironcore.dev_bmcversions.yaml
- api/v1alpha1/bmcsettings_types.go
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (1)
internal/controller/biossettings_controller_test.go (1)
887-895:⚠️ Potential issue | 🟠 MajorCleanup check is still globally scoped and can be flaky across specs.
Line 895 asserts the full
ServerMaintenancelist is empty, even though this test only deletes objects controlled bybiosSettings. Keep the list/query scoped (namespace + owner) and assert owned resources are gone.🔧 Proposed scope-safe cleanup assertion
+ "sigs.k8s.io/controller-runtime/pkg/client" @@ - Expect(k8sClient.List(ctx, &serverMaintenanceList)).To(Succeed()) - for _, maintenance := range serverMaintenanceList.Items { - if metav1.IsControlledBy(&maintenance, biosSettings) { - Expect(k8sClient.Delete(ctx, &maintenance)).To(Succeed()) - } - } - Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items", BeEmpty())) + Expect(k8sClient.List(ctx, &serverMaintenanceList, client.InNamespace(ns.Name))).To(Succeed()) + for i := range serverMaintenanceList.Items { + maintenance := &serverMaintenanceList.Items[i] + if metav1.IsControlledBy(maintenance, biosSettings) { + Expect(k8sClient.Delete(ctx, maintenance)).To(Succeed()) + } + } + Eventually(func(g Gomega) int { + g.Expect(k8sClient.List(ctx, &serverMaintenanceList, client.InNamespace(ns.Name))).To(Succeed()) + owned := 0 + for i := range serverMaintenanceList.Items { + if metav1.IsControlledBy(&serverMaintenanceList.Items[i], biosSettings) { + owned++ + } + } + return owned + }).Should(Equal(0))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettings_controller_test.go` around lines 887 - 895, The cleanup currently asserts the global ServerMaintenance list is empty which is flaky; change the List and assertion to be scoped to the test's namespace and only the resources owned by biosSettings: use k8sClient.List with a namespace-scoped ListOptions (or client.InNamespace(biosSettings.Namespace)) to fetch metalv1alpha1.ServerMaintenanceList, filter by metav1.IsControlledBy(&maintenance, biosSettings) (or list by an owner/label selector if available) and then assert that the filtered/owned slice is empty (replace Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items", BeEmpty())) with an assertion that only the owned items are gone).
🧹 Nitpick comments (2)
internal/controller/biossettingsset_controller_test.go (1)
501-508: Avoid ultra-tight polling inEventuallyto reduce flakiness.At Line 501 and Line 507, 1ms/10ms polling is overly aggressive for envtest and can make CI timing-sensitive.
🔧 Suggested adjustment
- Eventually(Object(biosSettings02)).WithPolling(1 * time.Millisecond).Should(SatisfyAll( + Eventually(Object(biosSettings02)).WithPolling(25 * time.Millisecond).Should(SatisfyAll( HaveField("Status.State", Not(Equal(metalv1alpha1.BIOSSettingsStateFailed))), HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))), )) @@ - Eventually(Object(biosSettings03)).WithPolling(10 * time.Millisecond).Should(SatisfyAll( + Eventually(Object(biosSettings03)).WithPolling(25 * time.Millisecond).Should(SatisfyAll( HaveField("Status.State", Not(Equal(metalv1alpha1.BIOSSettingsStateFailed))), HaveField("Status.AutoRetryCountRemaining", Not(Equal(GetPtr(int32(0))))), ))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettingsset_controller_test.go` around lines 501 - 508, The test uses ultra-tight polling intervals in the Eventually checks for biosSettings02 and biosSettings03 (calls to Eventually(Object(biosSettings02)).WithPolling(1 * time.Millisecond) and Eventually(Object(biosSettings03)).WithPolling(10 * time.Millisecond)), which can cause flakiness; change those WithPolling durations to a more realistic interval (e.g., 50–200ms) or remove WithPolling to use the default, so update the two Eventually(...) invocations referencing biosSettings02 and biosSettings03 accordingly.internal/controller/bmcsettings_controller_test.go (1)
614-617: Increase polling interval in retry-start assertion to reduce flakiness.At Line 617,
1 * time.Millisecondis very tight for envtest timing and can make this test unstable.🔧 Suggested adjustment
- }).WithPolling((1 * time.Millisecond)).Should(BeTrue()) + }).WithPolling(25 * time.Millisecond).Should(BeTrue())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettings_controller_test.go` around lines 614 - 617, The polling interval in the Eventually assertion that checks Get(bmcSettings)() and bmcSettings.Status.AutoRetryCountRemaining is currently set to 1 * time.Millisecond which is too tight; update the WithPolling call (the Eventually(...).WithPolling((1 * time.Millisecond)).Should(BeTrue()) expression) to a larger interval such as 50 * time.Millisecond or 100 * time.Millisecond to reduce flakiness in envtest, leaving the rest of the Eventually block (Get, bmcSettings, and the boolean check) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api-reference/api.md`:
- Line 322: The documentation uses "retry applying BIOSVersion/BMCVersion" which
is imprecise; update all descriptions referencing failedAutoRetryCount for
BIOSVersion and BMCVersion (and the same instances at the listed locations) to
use upgrade terminology such as "retry firmware upgrade for
BIOSVersion/BMCVersion" or "retry firmware upgrade operation" and ensure the
field name `failedAutoRetryCount` is described as "number of automatic firmware
upgrade retries before giving up" for both BIOSVersion and BMCVersion entries
(refer to `failedAutoRetryCount`, `BIOSVersion`, and `BMCVersion` to locate and
update the text).
In `@internal/controller/biossettings_controller.go`:
- Around line 1044-1054: The code currently skips calling r.Conditions.Update
when retryCondition.Status is already metav1.ConditionTrue, which leaves
retryCondition.Message stale and can cause repeated retries; modify the logic
around retryCondition and r.Conditions.Update in the BIOSSettings reconciliation
so the Update call is performed (or at least the metadata fields are refreshed)
regardless of the existing Status value — e.g., remove the conditional branch
that prevents calling r.Conditions.Update when retryCondition.Status ==
metav1.ConditionTrue and always invoke r.Conditions.Update(retryCondition,
conditionutils.UpdateStatus(...),
conditionutils.UpdateReason(RetryOfFailedResourceReasonIssued),
conditionutils.UpdateMessage(annotations[metalv1alpha1.OperationAnnotation]))
before assigning settings.Status.Conditions =
[]metav1.Condition{*retryCondition} so the Message/Reason are kept up-to-date
idempotently.
In `@internal/controller/biosversion_controller_test.go`:
- Around line 431-439: The test currently deletes only ServerMaintenance items
controlled by biosVersion but then globally asserts that
serverMaintenanceList.Items is empty, causing flakes if other ServerMaintenance
objects exist; change the final assertion to verify that no remaining items are
controlled by biosVersion instead of asserting the entire list is empty — locate
the block using serverMaintenanceList, metav1.IsControlledBy and biosVersion and
replace the
Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items",
BeEmpty())) check with an assertion that filters serverMaintenanceList.Items for
metav1.IsControlledBy(&item, biosVersion) and expects that filtered slice to be
empty (or Eventually the filtered result to be empty) using the existing
ObjectList/Eventually pattern.
In `@internal/controller/biosversion_controller.go`:
- Around line 466-477: The code determines effective retryCount from
biosVersion.Spec.FailedAutoRetryCount and r.DefaultFailedAutoRetryCount but
skips status mutation when retryCount == 0; add normalization so that when
effective retryCount resolves to 0 you explicitly set
biosVersion.Status.AutoRetryCountRemaining = 0 and persist that change (status
patch/update) so stale positive remaining counts are cleared; update the logic
near the retryCount handling in the reconciler (use the same method that patches
status elsewhere in this file) and apply the same fix at the second occurrence
referenced around the 513-515 area.
In `@internal/controller/biosversionset_controller_test.go`:
- Around line 563-568: After deleting the parent biosVersionSet, the subsequent
direct Delete calls for biosVersion02 and biosVersion03 can race with GC and
return NotFound; update the cleanup to tolerate missing children by checking for
apierrors.IsNotFound and treating that as success (or skip Delete if Get reports
NotFound) instead of failing the test — modify the code around
k8sClient.Delete(ctx, biosVersion02)/biosVersion03 to either call Delete and
ignore NotFound errors (using apierrors.IsNotFound) or perform a Get first and
only Delete if present, and keep the Existing
Eventually(Get(...)).Should(Satisfy(apierrors.IsNotFound)) assertions to
validate removal.
In `@internal/controller/bmcsettings_controller.go`:
- Around line 720-730: The code only updates retryCondition when Status !=
metav1.ConditionTrue, which leaves stale reason/message/annotation data; always
normalize the retry condition metadata by invoking r.Conditions.Update for
retryCondition when the existing Reason or Message differs (or unconditionally)
so the condition's Reason (RetryOfFailedResourceReasonIssued) and Message
(annotations[metalv1alpha1.OperationAnnotation]) are kept current; adjust the
logic around retryCondition, r.Conditions.Update, and
conditionutils.UpdateMessage/UpdateReason so the update occurs even if Status is
already True, and then set bmcSetting.Status.Conditions =
[]metav1.Condition{*retryCondition} as before.
---
Duplicate comments:
In `@internal/controller/biossettings_controller_test.go`:
- Around line 887-895: The cleanup currently asserts the global
ServerMaintenance list is empty which is flaky; change the List and assertion to
be scoped to the test's namespace and only the resources owned by biosSettings:
use k8sClient.List with a namespace-scoped ListOptions (or
client.InNamespace(biosSettings.Namespace)) to fetch
metalv1alpha1.ServerMaintenanceList, filter by
metav1.IsControlledBy(&maintenance, biosSettings) (or list by an owner/label
selector if available) and then assert that the filtered/owned slice is empty
(replace
Eventually(ObjectList(&serverMaintenanceList)).Should(HaveField("Items",
BeEmpty())) with an assertion that only the owned items are gone).
---
Nitpick comments:
In `@internal/controller/biossettingsset_controller_test.go`:
- Around line 501-508: The test uses ultra-tight polling intervals in the
Eventually checks for biosSettings02 and biosSettings03 (calls to
Eventually(Object(biosSettings02)).WithPolling(1 * time.Millisecond) and
Eventually(Object(biosSettings03)).WithPolling(10 * time.Millisecond)), which
can cause flakiness; change those WithPolling durations to a more realistic
interval (e.g., 50–200ms) or remove WithPolling to use the default, so update
the two Eventually(...) invocations referencing biosSettings02 and
biosSettings03 accordingly.
In `@internal/controller/bmcsettings_controller_test.go`:
- Around line 614-617: The polling interval in the Eventually assertion that
checks Get(bmcSettings)() and bmcSettings.Status.AutoRetryCountRemaining is
currently set to 1 * time.Millisecond which is too tight; update the WithPolling
call (the Eventually(...).WithPolling((1 * time.Millisecond)).Should(BeTrue())
expression) to a larger interval such as 50 * time.Millisecond or 100 *
time.Millisecond to reduce flakiness in envtest, leaving the rest of the
Eventually block (Get, bmcSettings, and the boolean check) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6bfdc7b5-96da-4812-b3af-3f29d2ddc9f9
⛔ Files ignored due to path filters (8)
dist/chart/templates/crd/metal.ironcore.dev_biossettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biossettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_biosversionsets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettings.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcsettingssets.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversions.yamlis excluded by!**/dist/**dist/chart/templates/crd/metal.ironcore.dev_bmcversionsets.yamlis excluded by!**/dist/**
📒 Files selected for processing (24)
api/v1alpha1/biossettings_types.goapi/v1alpha1/biosversion_types.goapi/v1alpha1/bmcsettings_types.goapi/v1alpha1/bmcversion_types.goconfig/crd/bases/metal.ironcore.dev_biossettings.yamlconfig/crd/bases/metal.ironcore.dev_biossettingssets.yamlconfig/crd/bases/metal.ironcore.dev_biosversions.yamlconfig/crd/bases/metal.ironcore.dev_biosversionsets.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettings.yamlconfig/crd/bases/metal.ironcore.dev_bmcsettingssets.yamlconfig/crd/bases/metal.ironcore.dev_bmcversions.yamlconfig/crd/bases/metal.ironcore.dev_bmcversionsets.yamldocs/api-reference/api.mdinternal/controller/biossettings_controller.gointernal/controller/biossettings_controller_test.gointernal/controller/biossettingsset_controller_test.gointernal/controller/biosversion_controller.gointernal/controller/biosversion_controller_test.gointernal/controller/biosversionset_controller_test.gointernal/controller/bmcsettings_controller.gointernal/controller/bmcsettings_controller_test.gointernal/controller/bmcsettingsset_controller_test.gointernal/controller/bmcversion_controller_test.gointernal/controller/bmcversionset_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- internal/controller/bmcversionset_controller_test.go
- api/v1alpha1/biosversion_types.go
- config/crd/bases/metal.ironcore.dev_biosversionsets.yaml
- internal/controller/bmcsettingsset_controller_test.go
- api/v1alpha1/bmcversion_types.go
| | `updatePolicy` _[UpdatePolicy](#updatepolicy)_ | UpdatePolicy indicates whether the server's upgrade service should bypass vendor update policies. | | | | ||
| | `image` _[ImageSpec](#imagespec)_ | Image specifies the image to use to upgrade to the given BIOS version. | | | | ||
| | `serverMaintenancePolicy` _[ServerMaintenancePolicy](#servermaintenancepolicy)_ | ServerMaintenancePolicy is a maintenance policy to be enforced on the server. | | | | ||
| | `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion in case of failure before giving up. | | Minimum: 0 <br /> | |
There was a problem hiding this comment.
Use upgrade terminology for BIOSVersion/BMCVersion retry descriptions.
These version resources perform firmware upgrades; “retry applying BIOSVersion/BMCVersion” is awkward and less precise for API consumers.
🔧 Suggested wording
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry the BIOS upgrade in case of failure before giving up. | | Minimum: 0 <br /> |
-| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BIOSVersion in case of failure before giving up. | | |
+| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry the BIOS upgrade in case of failure before giving up. | | |
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BMCVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry the BMC upgrade in case of failure before giving up. | | Minimum: 0 <br /> |
-| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BMCVersion in case of failure before giving up. | | |
+| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry the BMC upgrade in case of failure before giving up. | | |Also applies to: 361-361, 384-384, 839-839, 879-879, 901-901
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/api-reference/api.md` at line 322, The documentation uses "retry
applying BIOSVersion/BMCVersion" which is imprecise; update all descriptions
referencing failedAutoRetryCount for BIOSVersion and BMCVersion (and the same
instances at the listed locations) to use upgrade terminology such as "retry
firmware upgrade for BIOSVersion/BMCVersion" or "retry firmware upgrade
operation" and ensure the field name `failedAutoRetryCount` is described as
"number of automatic firmware upgrade retries before giving up" for both
BIOSVersion and BMCVersion entries (refer to `failedAutoRetryCount`,
`BIOSVersion`, and `BMCVersion` to locate and update the text).
There was a problem hiding this comment.
I think we should always include the crd names and keep the overall flow but could add the upgrade keyword for those operations, e.g. like this:
| Current | Improved |
|---|---|
retry applying BIOSVersion |
retry applying BIOSVersion upgrade |
retry applying BMCVersion |
retry applying BMCVersion upgrade |
retry applying BIOSSettings |
✅ |
retry applying BMCSettings |
✅ |
what do you think?
There was a problem hiding this comment.
@xkonni, that's a great compromise! Keeping the CRD names provides clear API consumer context while adding upgrade removes the ambiguity. Here's the updated diff applying that approach to all affected entries:
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion upgrade in case of failure before giving up. | | Minimum: 0 <br /> |
-| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BIOSVersion in case of failure before giving up. | | |
+| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BIOSVersion upgrade in case of failure before giving up. | | |
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BIOSVersion upgrade in case of failure before giving up. | | Minimum: 0 <br /> |
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BMCVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BMCVersion upgrade in case of failure before giving up. | | Minimum: 0 <br /> |
-| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BMCVersion in case of failure before giving up. | | |
+| `autoRetryCountRemaining` _integer_ | AutoRetryCountRemaining is the number of remaining times the controller will automatically retry applying BMCVersion upgrade in case of failure before giving up. | | |
-| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BMCVersion in case of failure before giving up. | | Minimum: 0 <br /> |
+| `failedAutoRetryCount` _integer_ | FailedAutoRetryCount is the number of times the controller should automatically retry applying BMCVersion upgrade in case of failure before giving up. | | Minimum: 0 <br /> |Note that since docs/api-reference/api.md is typically auto-generated (e.g. via make generate), you may also want to update the source Go struct field comments in api/v1alpha1/biosversion_types.go and api/v1alpha1/bmcversion_types.go so the change persists across regenerations.
🧠 Learnings used
Learnt from: nagadeesh-nagaraja
Repo: ironcore-dev/metal-operator PR: 695
File: internal/controller/bmcsettings_controller.go:720-730
Timestamp: 2026-03-04T10:32:24.468Z
Learning: In `internal/controller/bmcsettings_controller.go` (and sibling controllers), the `RetryOfFailedResourceConditionIssued` condition is intentionally left unchanged when its Status is already `metav1.ConditionTrue`. The guard `if retryCondition.Status != metav1.ConditionTrue` is by design — existing condition metadata (reason/message) is preserved on re-entrant reconciliations to avoid redundant updates.
xkonni
left a comment
There was a problem hiding this comment.
yes, we need that. I like implemented the hybrid approach.
Proposed Changes
provides option to auto retry the failed resources X number of times.
Also fixes the infinite auto retry issue through SET CRD
Fixes #693
fix: #623
fix: #580
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation