Skip to content

Add BMC certificate management with cert-manager integration#765

Open
stefanhipfel wants to merge 8 commits intomainfrom
feature/issue-218-bmc-certificate-management
Open

Add BMC certificate management with cert-manager integration#765
stefanhipfel wants to merge 8 commits intomainfrom
feature/issue-218-bmc-certificate-management

Conversation

@stefanhipfel
Copy link
Copy Markdown
Contributor

@stefanhipfel stefanhipfel commented Mar 30, 2026

Implement complete certificate lifecycle management for BMC resources using cert-manager for automated TLS certificate provisioning, renewal, and installation.

API Changes:

  • Add BMCCertificateSpec to BMC CRD for cert-manager integration
  • Add certificate status fields and CertificateReady condition
  • Support issuer references, duration, CN, DNS names, IP addresses

Summary by CodeRabbit

  • New Features

    • BMC certificate management via cert-manager: CSR generation (BMC or operator), CertificateRequest lifecycle, automated installation/renewal, status (Issued/Pending/Failed), CLI flag to enable and retry interval.
  • Documentation

    • New certificate management guide with setup, examples, migration, and troubleshooting.
  • Tests

    • Added unit, integration, and end-to-end tests covering provisioning, renewal, CSR fallback, and edge cases.
  • Chores

    • CRD/status fields, RBAC for cert-manager, dependency updates, TLS secret loading, and a sample manifest for certificate-enabled BMCs.

Implement complete certificate lifecycle management for BMC resources
using cert-manager for automated TLS certificate provisioning, renewal,
and installation.

API Changes:
- Add BMCCertificateSpec to BMC CRD for cert-manager integration
- Add certificate status fields and CertificateReady condition
- Support issuer references, duration, CN, DNS names, IP addresses

Implementation:
- Extend BMC interface with ReplaceCertificate() method
- Implement Redfish certificate upload via CertificateService
- Add certificate reconciliation logic to BMC controller
- Support hybrid CSR generation (BMC-native preferred, operator fallback)
- Automatic certificate renewal based on expiration (30-day buffer)
- Graceful degradation on certificate errors

Testing:
- Unit tests for certificate operations (685 lines)
- Integration tests with mock cert-manager (720 lines)
- E2E tests with real cert-manager and kind cluster
- Test fixtures for self-signed issuer scenarios
- Note: One integration test marked Pending due to mock server timing issues

Documentation:
- Complete user guide (docs/certificate-management.md)
- Example BMC resource with certificate configuration
- Multiple issuer type examples (self-signed, CA, ACME/Let's Encrypt)
- Installation instructions and troubleshooting guide

Feature is opt-in via --enable-certificate-management flag.
Requires cert-manager v1.14.0+ to be installed separately.

Fixes #218
@stefanhipfel stefanhipfel requested a review from a team as a code owner March 30, 2026 08:14
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds cert-manager-driven BMC certificate management: new API types/CRD fields, BMC interface CSR/certificate methods, Redfish CSR/install implementations, controller reconciliation for CertificateRequest lifecycle and installation, RBAC and CLI flags, tests, docs, and TLS secret loading utilities.

Changes

Cohort / File(s) Summary
API & CRD
api/v1alpha1/bmc_types.go, api/v1alpha1/zz_generated.deepcopy.go, config/crd/bases/metal.ironcore.dev_bmcs.yaml
Add BMCCertificateSpec and CertificateIssuerRef, certificate-ready condition constants/reasons, spec.certificate, status.certificateRequestName and status.certificateSecretRef, and generated deepcopy methods/CRD schema updates.
BMC interface & Redfish client
bmc/bmc.go, bmc/redfish.go, bmc/redfish_local.go, bmc/mockup.go
Extend bmc.BMC with CSR/certificate methods and types (GenerateCSR, ReplaceCertificate, GetCertificateInfo, CSRParameters, CertificateInfo); implement Redfish CSR generation and certificate replacement; add mock flag SimulateCSRGenerationFailure.
Controller core & wiring
internal/controller/bmc_controller.go, internal/controller/bmc_certificate.go, internal/controller/suite_test.go
Add certificate reconciliation flow (CSR generation, CertificateRequest lifecycle, install/renew, secret handling), gates and flags on reconciler, conditional watch for CertificateRequest, test scheme/CRD bootstrapping, and enqueue helper.
Controller tests & integration
internal/controller/bmc_certificate_test.go, internal/controller/bmc_certificate_integration_test.go
Add unit and integration tests covering CSR generation, CertificateRequest creation/handling, secret lifecycle, condition transitions, SAN/IP handling, renewal, owner refs, and error paths.
E2E & testdata
test/e2e/bmc_certificate_e2e_test.go, test/e2e/testdata/selfsigned-issuer.yaml
Add E2E suite validating cert-manager integration and lifecycle scenarios; include self-signed Issuer manifest.
RBAC, CLI, samples
config/rbac/role.yaml, cmd/main.go, config/samples/bmc_with_certificate.yaml
Grant cert-manager CertificateRequest RBAC, add CLI flags --enable-bmc-certificate-management and --certificate-retry-interval, and add a sample BMC manifest with spec.certificate.
Utilities & TLS handling
internal/bmcutils/bmcutils.go, tools.go
Load TLS config from referenced certificate Secret for BMC clients (helper functions); add tools import to pin cert-manager API dependency.
Docs & API reference, deps
docs/certificate-management.md, docs/api-reference/api.md, go.mod, config/crd/kustomization.yaml
New certificate management user guide and API docs; add cert-manager dependency and kustomize entry for cert-manager CRDs.
Server controller tweak
internal/controller/server_controller.go
Adjust Server deletion/reconciliation flow to better handle missing BMC references and requeue-after-delete behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler as BMC Reconciler
    participant K8s as Kubernetes API
    participant CertMgr as cert-manager
    participant BMCDev as BMC Device

    Reconciler->>Reconciler: Inspect spec.certificate
    alt certificate not set
        Reconciler->>Reconciler: Skip certificate reconciliation
    else certificate set
        Reconciler->>K8s: Read TLS Secret (status.certificateSecretRef)
        alt valid certificate present
            Reconciler->>Reconciler: Early exit (no action)
        else needs issuance/renewal
            Reconciler->>K8s: Lookup CertificateRequest by status.CertificateRequestName
            alt existing pending request
                Reconciler->>CertMgr: Observe CertificateRequest status
            else create/request absent
                Reconciler->>BMCDev: Try GenerateCSR (native)
                alt native CSR returned
                    BMCDev-->>Reconciler: CSR bytes
                else native CSR failed
                    Reconciler->>Reconciler: Generate operator CSR + private key
                end
                Reconciler->>K8s: Create CertificateRequest (with CSR)
            end
            CertMgr->>CertMgr: Issue certificate
            CertMgr->>K8s: Populate CertificateRequest.status.certificate
            Reconciler->>K8s: Read issued certificate from CertificateRequest
            Reconciler->>BMCDev: ReplaceCertificate (install cert ± key)
            Reconciler->>K8s: Create/Update TLS Secret and patch status.certificateSecretRef
            Reconciler->>Reconciler: Set BMCCertificateReadyCondition = Issued
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • afritzler
  • davidgrun
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete; it lacks the structured format specified in the repository template, which requires 'Proposed Changes' bullet points and a 'Fixes #' section. Rewrite the description following the template format: add a 'Proposed Changes' section with detailed bullet points covering API changes, configuration, controller logic, and documentation; add a 'Fixes #' section referencing the related issue.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding BMC certificate management with cert-manager integration, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/issue-218-bmc-certificate-management

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added size/XXL api-change documentation Improvements or additions to documentation enhancement New feature or request labels Mar 30, 2026
@afritzler afritzler changed the title Add BMC certificate management with cert-manager integration (#218) Add BMC certificate management with cert-manager integration Mar 30, 2026
Merge latest changes from main (52d7860) into the BMC certificate
management feature branch. This brings in:

- BMCSettings reconciler simplification (#723)
- Dynamic BMC settings support (#746)
- BMC event subscription error handling (#754)
- Registry timeout improvements (#742)
- ObjectReference deprecation preparation (#748)
- Deletion logic improvements (#744)
- Vendor-specific firmware upgrade checks (#733)
- Multiple dependency updates (grpc, crypto, docker actions)
- Go 1.26.1 update

Conflict resolutions:
- bmc/bmc.go: Merged certificate methods with CheckBMCPendingComponentUpgrade
- bmc/redfish_local.go: Merged GenerateCSR with CheckBMCPendingComponentUpgrade
- go.mod/go.sum: Updated to latest dependency versions from main

All certificate management features preserved:
- BMC certificate lifecycle with cert-manager integration
- CSR generation via Redfish CertificateService
- Automatic certificate provisioning and renewal
- Feature flag: --enable-bmc-certificate-management
- Complete test coverage maintained

Verified:
- Build successful: go build ./...
- Manifests regenerated: make manifests generate
- Dependencies resolved: go mod tidy
- Certificate functionality intact
…r CRDs

Three CI checks were failing in PR #765:

1. Code generation check - Helm chart was out of sync
   - Regenerated with 'make helm'
   - Now includes certificate spec/status in BMC CRD template
   - Now includes cert-manager RBAC in role template

2. Docs build - Dead link in certificate-management.md
   - Fixed link: ../api-reference/api → ./api-reference/api
   - VitePress now builds successfully

3. Test suite - Missing cert-manager CRDs for envtest
   - Added config/crd/cert-manager/cert-manager.crds.yaml (v1.19.4)
   - Test suite now loads cert-manager CRDs via envtest
   - Required for integration tests that mock cert-manager behavior

All changes are non-functional - they only fix CI checks and test setup.
The certificate management feature functionality remains unchanged.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bmc/redfish.go (1)

91-107: ⚠️ Potential issue | 🟠 Major

Don't infer Insecure from TLSConfig == nil.

Line 96 disables certificate verification whenever no custom TLS config is injected. Callers still pass insecure=false through the public API, so HTTPS connections with system-trusted certificates—and any upstream TLS-secret load failure—now silently fail open. Keep an explicit insecure flag in Options and map ClientConfig.Insecure from that instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/redfish.go` around lines 91 - 107, The code is incorrectly setting
gofish.ClientConfig.Insecure based on options.TLSConfig == nil; update the
implementation to use an explicit Insecure flag on the public Options struct and
map that flag into clientConfig.Insecure instead. Concretely, add/verify an
Insecure bool field on Options and change the newRedfishBaseBMCClient function
to set clientConfig.Insecure = options.Insecure (remove the TLSConfig==nil
inference), while keeping the existing TLSConfig-based custom HTTP client
assignment (TLSClientConfig stays as-is).
🧹 Nitpick comments (7)
test/e2e/testdata/selfsigned-issuer.yaml (1)

8-10: Avoid hardcoding the issuer namespace in test fixture.

Pinning namespace: default makes this fixture less reusable and can desync from E2E namespace configuration. Prefer templating this or applying with explicit namespace in the test flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/testdata/selfsigned-issuer.yaml` around lines 8 - 10, The test
fixture currently hardcodes metadata.namespace: default for the resource named
selfsigned-issuer; remove the hardcoded namespace from metadata.namespace (or
replace it with a template variable like ${NAMESPACE} or {{ .Namespace }}) so
the fixture is namespace-agnostic, and ensure the test harness or apply step
sets the actual namespace when creating the Issuer (e.g., by rendering the
template or passing an explicit namespace to kubectl/apply in the test flow).
test/e2e/bmc_certificate_e2e_test.go (1)

298-301: Consider using os.CreateTemp for safer temp file handling.

Writing to predictable /tmp/bmc-invalid-issuer.yaml could conflict with parallel test runs. Using os.CreateTemp provides unique filenames.

-tmpFile := "/tmp/bmc-invalid-issuer.yaml"
-err := os.WriteFile(tmpFile, []byte(invalidBMCYAML), 0644)
+tmpFile, err := os.CreateTemp("", "bmc-invalid-issuer-*.yaml")
 Expect(err).NotTo(HaveOccurred())
-defer func() { _ = os.Remove(tmpFile) }()
+defer func() { _ = os.Remove(tmpFile.Name()) }()
+_, err = tmpFile.Write([]byte(invalidBMCYAML))
+Expect(err).NotTo(HaveOccurred())
+Expect(tmpFile.Close()).To(Succeed())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/bmc_certificate_e2e_test.go` around lines 298 - 301, Replace the
fixed-path temp file creation with os.CreateTemp to avoid collisions: call
os.CreateTemp to create a unique temp file, write invalidBMCYAML into the
returned *os.File (or use its Name() with os.WriteFile), ensure you close the
file, and defer os.Remove using the file's Name() instead of the hard-coded
"/tmp/bmc-invalid-issuer.yaml" (update references to tmpFile and invalidBMCYAML
accordingly).
internal/controller/bmc_certificate.go (3)

249-261: Missing Group field in IssuerRef could cause issues with external issuers.

The IssuerRef is created without the Group field. While cert-manager defaults to cert-manager.io, this should be explicitly set or pulled from the BMC spec if you later add Group support to CertificateIssuerRef.

 			IssuerRef: cmmeta.ObjectReference{
 				Name: bmcObj.Spec.Certificate.IssuerRef.Name,
 				Kind: bmcObj.Spec.Certificate.IssuerRef.Kind,
+				Group: "cert-manager.io",
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 249 - 261, The IssuerRef
used when building the certmanagerv1.CertificateRequestSpec omits the Group
field; update the cmmeta.ObjectReference construction (the IssuerRef inside
CertificateRequestSpec) to explicitly set Group — either to the explicit value
from bmcObj.Spec.Certificate.IssuerRef.Group (if present) or default to
"cert-manager.io" — so IssuerRef includes Name, Kind and Group when creating the
CertificateRequest.

91-93: Inconsistent error logging when deleting failed CertificateRequest.

The code logs the error but the log call uses log.Error which is appropriate. However, consider whether the reconciliation should continue or return an error if deletion fails, as the subsequent create might fail with "already exists".

The current behavior is acceptable for graceful degradation, but you might want to add a requeue to retry the deletion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 91 - 93, The delete of a
failed CertificateRequest (r.Delete(ctx, certReq)) currently logs errors via
log.Error but then continues, which can cause the subsequent create to fail with
"already exists"; change the reconcile behavior so that on Delete error you
either return the error (so controller-runtime will requeue with backoff) or
explicitly requeue (e.g., return ctrl.Result{Requeue: true}, nil) after logging;
locate the delete call and its surrounding reconcile logic in the reconcile
method in bmc_certificate.go and update the control flow to stop and requeue
when r.Delete fails rather than proceeding.

674-691: Consider using configurable retry interval for certificate errors.

The fixed 5-minute retry interval may be too long for transient network errors or too short for persistent configuration issues. Consider:

  1. Using the reconciler's configured retry interval (r.BMCClientRetryInterval) for consistency
  2. Or implementing different intervals based on error type (transient vs permanent)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 674 - 691, Replace the
hard-coded 5 minute requeue in handleCertificateError with the reconciler's
configurable retry interval: use r.BMCClientRetryInterval (or a sensible default
if it's zero) instead of 5*time.Minute to determine RequeueAfter; optionally
branch on the error type (transient vs permanent) inside handleCertificateError
to choose between r.BMCClientRetryInterval for transient errors and a
longer/shorter interval for permanent errors, and ensure the method returns the
chosen interval in the ctrl.Result.
internal/controller/bmc_certificate_integration_test.go (1)

704-730: Consider extracting shared test helper to avoid duplication.

This generateTestCertificate function is nearly identical to the one in bmc_certificate_test.go (lines 640-665). Consider extracting to a shared test utilities file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate_integration_test.go` around lines 704 -
730, The generateTestCertificate function is duplicated; extract it into a
shared test helper (e.g., a new test utilities file/package) and replace both
copies with calls to that single helper to avoid duplication. Create a helper
function with the same signature (generateTestCertificate(validityDuration
time.Duration, commonName string) []byte) in a common test file (e.g.,
testhelpers or internal/testutil), move the implementation there, update
imports/tests to reference the shared helper, and remove the duplicate
implementation in bmc_certificate_integration_test.go and
bmc_certificate_test.go so both tests call the centralized function.
api/v1alpha1/bmc_types.go (1)

103-114: Consider adding Group field to CertificateIssuerRef to support external cert-manager issuers, or explicitly document that only built-in cert-manager issuers are supported.

cert-manager's ObjectReference type includes a Group field (defaulting to cert-manager.io) for referencing issuers from different API groups, enabling support for external issuer implementations (e.g., acme-dns01-solver). The current CertificateIssuerRef struct lacks this field, which limits users to built-in cert-manager issuers. Given that the controller's getCACertificateFromIssuer method is a placeholder, this limitation should be either addressed alongside full controller implementation or documented in the certificate management guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/bmc_types.go` around lines 103 - 114, The CertificateIssuerRef
struct currently omits an API group field, limiting references to built-in
cert-manager issuers; add a Group string `json:"group,omitempty"` field to
CertificateIssuerRef (alongside Name and Kind) to mirror cert-manager's
ObjectReference semantics and allow non-default issuer API groups, and update
any validation/defaulting logic if present; alternatively, if you decide not to
support external issuers, document this limitation clearly in the cert
management docs and note it next to the getCACertificateFromIssuer method so
future implementers know the intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bmc/redfish.go`:
- Around line 1163-1175: If certLocations.Certificates() returns an empty list
the code leaves certURI blank and proceeds to build and call ReplaceCertificate,
which yields an opaque downstream error; add a guard after retrieving
certificates that checks len(certificates)==0 and return a clear error (e.g.,
fmt.Errorf or wrapped error) indicating "no certificate URI found from
certLocations.Certificates()" before constructing
schemas.CertificateServiceReplaceCertificateParameters or invoking
ReplaceCertificate, referencing the certURI variable and the ReplaceCertificate
call to locate where to add the check.

In `@docs/certificate-management.md`:
- Around line 43-49: Add a fence language to the fenced code block labeled
"Expected output:" that contains the cert-manager pod lines (the block with
cert-manager-7d9f6d9d8c-xxxxx, cert-manager-cainjector-5c7d9f5f5-xxxxx,
cert-manager-webhook-6b8f9f5d5-xxxxx) by changing the opening ``` to ```text so
the code block has a language (e.g., ```text) and satisfies markdownlint MD040.
- Around line 345-348: Update the example commands and text that reference the
wrong cert-manager resource: replace instructions that tell users to
inspect/delete Certificate resources with commands that target
CertificateRequest resources using the tracked status.certificateRequestName
(and the field certificateRequestName shown in examples). Ensure any example
that mentions checking or deleting a Certificate (e.g., where
certificateSecretRef and certificateRequestName are shown) instead uses
CertificateRequest and the name from status.certificateRequestName so
troubleshooting/cleanup points at the actual objects created by the controller.
- Around line 608-613: The "metal-operator API reference" link currently points
to a dead URL ('../api-reference/api') causing Pages build failures; locate the
markdown list item containing the link text "metal-operator API reference" and
either update its href to the correct generated API reference path (replace
'../api-reference/api' with the actual generated URL) or remove that list item
entirely until the target page exists so the build no longer fails.

In `@internal/bmcutils/bmcutils.go`:
- Around line 207-215: The current code expects secret.Data["ca.crt"] to always
exist but getCACertificateFromIssuer (in bmc_certificate.go) is a stub so ca.crt
is never populated; update the implementation so ca.crt is correctly extracted
for supported issuer types (implement getCACertificateFromIssuer to return the
issuer CA PEM when available), and adjust the caller in bmcutils.go to treat
ca.crt as optional for ACME issuers (allow nil/empty and fall back to insecure
or explicit behavior), or alternatively return an explicit error for unsupported
issuer classes; locate getCACertificateFromIssuer, the secret.Data["ca.crt"]
usage, and the TLS setup code and either implement CA extraction, add
conditional handling when issuer is ACME to skip requiring ca.crt, or surface a
clear rejection for unsupported issuers.

In `@internal/controller/bmc_certificate.go`:
- Around line 335-339: The placeholder getCACertificateFromIssuer implementation
currently returns an empty CA, causing ca.crt to be empty and triggering
loadTLSConfigFromSecret to fall back to insecure connections in
GetBMCClientFromBMC; replace the placeholder in r.getCACertificateFromIssuer
with logic to fetch the CA from cert-manager (e.g., read certRequest.Status.CA
or extract the issuer CA from certRequest.Status.Certificate chain) and return
it (string, error), and if the CA cannot be obtained either return a non-nil
error so the caller fails explicitly or add a clear configuration/feature-flag
to refuse insecure fallback; ensure any changes propagate the error to the
caller and update logging around GetBMCClientFromBMC to avoid silent insecure
connections (references: getCACertificateFromIssuer,
r.getCACertificateFromIssuer, ca.crt, loadTLSConfigFromSecret,
GetBMCClientFromBMC).
- Around line 400-412: The Secret is currently always created with
corev1.SecretTypeTLS which will fail validation when privateKeyPEM is empty
(BMC-generated CSR); update the creation logic around secret (the variable named
"secret" in bmc_certificate.go) to set secret.Type conditionally: use
corev1.SecretTypeTLS only when privateKeyPEM is non-empty and otherwise use
corev1.SecretTypeOpaque (or always use Opaque if you prefer), and keep adding
tls.crt/tls.key into secret.Data as before based on certPEM and privateKeyPEM
presence.
- Around line 199-205: When getCertificateIPAddresses(ctx, bmcObj) returns an
error the code currently logs and continues, producing a cert without IP SANs;
change this to propagate the error so handleCertificateError can run (or
otherwise ensure the cert still contains IP SANs). Specifically, in the block
around getCertificateIPAddresses make the error path return the error (or call
the existing handleCertificateError flow) instead of only logging, so the caller
of the function that sets template.IPAddresses receives the failure and can
trigger the existing error handling; update references to template.IPAddresses
only after successful getCertificateIPAddresses.

In `@internal/controller/bmc_controller.go`:
- Around line 143-146: reconcileCertificate mutates status/conditions on the BMC
object but the controller continues using the old bmcObj which can cause patch
conflicts; after calling reconcileCertificate and confirming result.IsZero(),
re-fetch the latest BMC into bmcObj (call r.Get(ctx, req.NamespacedName,
bmcObj)) before proceeding with further reconciliation/patching so subsequent
updates use the current resource state and avoid clobbering certificate changes.

In `@test/e2e/bmc_certificate_e2e_test.go`:
- Line 138: The test currently calls Eventually(verifySecretExists,
certificatePolling, certificatePolling) so the timeout equals the poll interval
and only one attempt occurs; change these calls (the ones invoking
verifySecretExists) to use a longer timeout than the polling interval—either
replace the first certificatePolling with a new longer value (e.g.,
certificateTimeout or 5*certificatePolling) or define certificateTimeout and use
Eventually(verifySecretExists, certificateTimeout, certificatePolling); update
all occurrences (the calls referencing verifySecretExists with two
certificatePolling args) so the timeout is larger than the interval to allow
multiple retries.
- Around line 153-162: The test's JSONPath is querying the non-existent field
.status.certificateRef.name in the verifyCertificateRef function; update the
kubectl jsonpath to .status.certificateSecretRef.name so it matches the API's
CertificateSecretRef (used with bmcWithCertName and bmcTestNamespace) and keep
the existing expectation of Equal(fmt.Sprintf("%s-cert", bmcWithCertName)).
- Around line 42-55: Update the misleading comment: change the line that says
"creating self-signed ClusterIssuer for testing" to accurately state "creating
self-signed Issuer for testing" (the resource created is a namespaced Issuer
referenced by selfSignedIssuerFile and checked via kubectl get issuer
selfsigned-issuer in namespace bmcTestNamespace), so the comment matches the
actual resource type used by the test (verifyIssuerReady and
selfSignedIssuerFile).
- Line 114: The call to Eventually uses the same value for timeout and poll
interval so it only runs once; change the Eventually(verifyCertificateIssuer,
certificatePolling, certificatePolling) invocation to provide a larger timeout
than the polling interval (e.g. use a dedicated timeout variable or multiply
certificatePolling) so verifyCertificateIssuer is retried multiple times —
update the Eventually call that references verifyCertificateIssuer and
certificatePolling to pass a longer timeout value.

---

Outside diff comments:
In `@bmc/redfish.go`:
- Around line 91-107: The code is incorrectly setting
gofish.ClientConfig.Insecure based on options.TLSConfig == nil; update the
implementation to use an explicit Insecure flag on the public Options struct and
map that flag into clientConfig.Insecure instead. Concretely, add/verify an
Insecure bool field on Options and change the newRedfishBaseBMCClient function
to set clientConfig.Insecure = options.Insecure (remove the TLSConfig==nil
inference), while keeping the existing TLSConfig-based custom HTTP client
assignment (TLSClientConfig stays as-is).

---

Nitpick comments:
In `@api/v1alpha1/bmc_types.go`:
- Around line 103-114: The CertificateIssuerRef struct currently omits an API
group field, limiting references to built-in cert-manager issuers; add a Group
string `json:"group,omitempty"` field to CertificateIssuerRef (alongside Name
and Kind) to mirror cert-manager's ObjectReference semantics and allow
non-default issuer API groups, and update any validation/defaulting logic if
present; alternatively, if you decide not to support external issuers, document
this limitation clearly in the cert management docs and note it next to the
getCACertificateFromIssuer method so future implementers know the intended
behavior.

In `@internal/controller/bmc_certificate_integration_test.go`:
- Around line 704-730: The generateTestCertificate function is duplicated;
extract it into a shared test helper (e.g., a new test utilities file/package)
and replace both copies with calls to that single helper to avoid duplication.
Create a helper function with the same signature
(generateTestCertificate(validityDuration time.Duration, commonName string)
[]byte) in a common test file (e.g., testhelpers or internal/testutil), move the
implementation there, update imports/tests to reference the shared helper, and
remove the duplicate implementation in bmc_certificate_integration_test.go and
bmc_certificate_test.go so both tests call the centralized function.

In `@internal/controller/bmc_certificate.go`:
- Around line 249-261: The IssuerRef used when building the
certmanagerv1.CertificateRequestSpec omits the Group field; update the
cmmeta.ObjectReference construction (the IssuerRef inside
CertificateRequestSpec) to explicitly set Group — either to the explicit value
from bmcObj.Spec.Certificate.IssuerRef.Group (if present) or default to
"cert-manager.io" — so IssuerRef includes Name, Kind and Group when creating the
CertificateRequest.
- Around line 91-93: The delete of a failed CertificateRequest (r.Delete(ctx,
certReq)) currently logs errors via log.Error but then continues, which can
cause the subsequent create to fail with "already exists"; change the reconcile
behavior so that on Delete error you either return the error (so
controller-runtime will requeue with backoff) or explicitly requeue (e.g.,
return ctrl.Result{Requeue: true}, nil) after logging; locate the delete call
and its surrounding reconcile logic in the reconcile method in
bmc_certificate.go and update the control flow to stop and requeue when r.Delete
fails rather than proceeding.
- Around line 674-691: Replace the hard-coded 5 minute requeue in
handleCertificateError with the reconciler's configurable retry interval: use
r.BMCClientRetryInterval (or a sensible default if it's zero) instead of
5*time.Minute to determine RequeueAfter; optionally branch on the error type
(transient vs permanent) inside handleCertificateError to choose between
r.BMCClientRetryInterval for transient errors and a longer/shorter interval for
permanent errors, and ensure the method returns the chosen interval in the
ctrl.Result.

In `@test/e2e/bmc_certificate_e2e_test.go`:
- Around line 298-301: Replace the fixed-path temp file creation with
os.CreateTemp to avoid collisions: call os.CreateTemp to create a unique temp
file, write invalidBMCYAML into the returned *os.File (or use its Name() with
os.WriteFile), ensure you close the file, and defer os.Remove using the file's
Name() instead of the hard-coded "/tmp/bmc-invalid-issuer.yaml" (update
references to tmpFile and invalidBMCYAML accordingly).

In `@test/e2e/testdata/selfsigned-issuer.yaml`:
- Around line 8-10: The test fixture currently hardcodes metadata.namespace:
default for the resource named selfsigned-issuer; remove the hardcoded namespace
from metadata.namespace (or replace it with a template variable like
${NAMESPACE} or {{ .Namespace }}) so the fixture is namespace-agnostic, and
ensure the test harness or apply step sets the actual namespace when creating
the Issuer (e.g., by rendering the template or passing an explicit namespace to
kubectl/apply in the test flow).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf8a557f-fa18-408e-852b-1b237c53edbe

📥 Commits

Reviewing files that changed from the base of the PR and between 52d7860 and f46e3ad.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (21)
  • api/v1alpha1/bmc_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bmc/bmc.go
  • bmc/mockup.go
  • bmc/redfish.go
  • bmc/redfish_local.go
  • cmd/main.go
  • config/crd/bases/metal.ironcore.dev_bmcs.yaml
  • config/rbac/role.yaml
  • config/samples/bmc_with_certificate.yaml
  • docs/certificate-management.md
  • go.mod
  • internal/bmcutils/bmcutils.go
  • internal/controller/bmc_certificate.go
  • internal/controller/bmc_certificate_integration_test.go
  • internal/controller/bmc_certificate_test.go
  • internal/controller/bmc_controller.go
  • internal/controller/suite_test.go
  • test/e2e/bmc_certificate_e2e_test.go
  • test/e2e/testdata/selfsigned-issuer.yaml
  • tools.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
docs/certificate-management.md (4)

544-548: ⚠️ Potential issue | 🟡 Minor

Rollback cleanup targets the wrong cert-manager resource.

Lines 546-547 delete Certificate resources, but this feature path documents and tracks CertificateRequest in BMC status. Cleanup steps should target CertificateRequest objects.

Suggested fix
-# Delete Certificate resources
-kubectl delete certificate -l app.kubernetes.io/managed-by=metal-operator
+# Delete CertificateRequest resources created for BMC cert flow
+kubectl delete certificaterequest -l app.kubernetes.io/managed-by=metal-operator
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/certificate-management.md` around lines 544 - 548, The cleanup step
currently deletes Certificate resources; change it to delete CertificateRequest
objects instead because the rollback path and BMC status track
CertificateRequest. Update the command that references "kubectl delete
certificate -l app.kubernetes.io/managed-by=metal-operator" to target
"CertificateRequest" (i.e., kubectl delete certificaterequest ...) and keep the
same label selector so the cleanup removes the correct cert-manager resources
referenced by the BMC status.

360-368: ⚠️ Potential issue | 🟡 Minor

Use CertificateRequest commands in this section.

Line 360 says “Check Certificate Request” but commands on Lines 363-367 still inspect Certificate. Point users to CertificateRequest (ideally using status.certificateRequestName) to match controller behavior.

Suggested fix
-View the underlying cert-manager Certificate resource:
+View the underlying cert-manager CertificateRequest resource:

 ```bash
-# List certificates
-kubectl get certificate
-
-# Describe specific certificate
-kubectl describe certificate my-bmc-cert
+# Get CertificateRequest name from BMC status
+CERT_REQUEST=$(kubectl get bmc my-bmc -o jsonpath='{.status.certificateRequestName}')
+
+# Describe specific CertificateRequest
+kubectl describe certificaterequest "$CERT_REQUEST"
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/certificate-management.md around lines 360 - 368, The commands under
"Check Certificate Request" currently inspect Certificate resources; change them
to target CertificateRequest using the BMC status.certificateRequestName.
Replace the kubectl get/describe certificate lines with steps to read the
CertificateRequest name from the BMC (e.g., using kubectl get bmc my-bmc -o
jsonpath='{.status.certificateRequestName}') and then run kubectl describe
certificaterequest ; reference CertificateRequest,
status.certificateRequestName, and the BMC resource name (my-bmc) when updating
the docs.


</details>

---

`417-423`: _⚠️ Potential issue_ | _🟡 Minor_

**Diagnosis commands should not check `Certificate` here.**

Line 417 currently checks `Certificate`, but the flow tracks `status.certificateRequestName`. Use `CertificateRequest` for consistent troubleshooting.  
 
<details>
<summary>Suggested fix</summary>

```diff
-# Check Certificate status
-kubectl describe certificate my-bmc-cert
-
 # Check CertificateRequest
 CERT_REQUEST=$(kubectl get bmc my-bmc -o jsonpath='{.status.certificateRequestName}')
-kubectl describe certificaterequest $CERT_REQUEST
+kubectl describe certificaterequest "$CERT_REQUEST"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/certificate-management.md` around lines 417 - 423, The docs currently
instruct to run "kubectl describe certificate my-bmc-cert" which is inconsistent
with the flow that reads status.certificateRequestName; replace that Certificate
check with a CertificateRequest check by describing the CertificateRequest
referenced by status.certificateRequestName (use the CERT_REQUEST variable and
run "kubectl describe certificaterequest $CERT_REQUEST"), and remove or update
the standalone Certificate describe line so troubleshooting consistently uses
CertificateRequest.

44-49: ⚠️ Potential issue | 🟡 Minor

Add a language to the fenced “Expected output” block.

Line 44 should use a language tag (e.g., text) to satisfy markdownlint MD040.

Suggested fix
-```
+```text
 NAME                                       READY   STATUS    RESTARTS   AGE
 cert-manager-7d9f6d9d8c-xxxxx              1/1     Running   0          1m
 cert-manager-cainjector-5c7d9f5f5-xxxxx    1/1     Running   0          1m
 cert-manager-webhook-6b8f9f5d5-xxxxx       1/1     Running   0          1m
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/certificate-management.md around lines 44 - 49, The fenced "Expected
output" code block containing the lines beginning with "NAME
READY STATUS ..." is missing a language tag which triggers markdownlint MD040;
update that fenced block by adding a language identifier (for example text)
immediately after the opening backticks (change totext) so the block
becomes text ... and the linter will accept it.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @docs/certificate-management.md:

  • Around line 544-548: The cleanup step currently deletes Certificate resources;
    change it to delete CertificateRequest objects instead because the rollback path
    and BMC status track CertificateRequest. Update the command that references
    "kubectl delete certificate -l app.kubernetes.io/managed-by=metal-operator" to
    target "CertificateRequest" (i.e., kubectl delete certificaterequest ...) and
    keep the same label selector so the cleanup removes the correct cert-manager
    resources referenced by the BMC status.
  • Around line 360-368: The commands under "Check Certificate Request" currently
    inspect Certificate resources; change them to target CertificateRequest using
    the BMC status.certificateRequestName. Replace the kubectl get/describe
    certificate lines with steps to read the CertificateRequest name from the BMC
    (e.g., using kubectl get bmc my-bmc -o
    jsonpath='{.status.certificateRequestName}') and then run kubectl describe
    certificaterequest ; reference CertificateRequest,
    status.certificateRequestName, and the BMC resource name (my-bmc) when updating
    the docs.
  • Around line 417-423: The docs currently instruct to run "kubectl describe
    certificate my-bmc-cert" which is inconsistent with the flow that reads
    status.certificateRequestName; replace that Certificate check with a
    CertificateRequest check by describing the CertificateRequest referenced by
    status.certificateRequestName (use the CERT_REQUEST variable and run "kubectl
    describe certificaterequest $CERT_REQUEST"), and remove or update the standalone
    Certificate describe line so troubleshooting consistently uses
    CertificateRequest.
  • Around line 44-49: The fenced "Expected output" code block containing the
    lines beginning with "NAME READY STATUS
    ..." is missing a language tag which triggers markdownlint MD040; update that
    fenced block by adding a language identifier (for example text) immediately
    after the opening backticks (change totext) so the block becomes text ... and the linter will accept it.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `8e2f4841-91c3-4413-9b01-9eefe38e36bd`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between f46e3ad5df58b28bbaf4ab4ccd672c9bd27394a2 and c995673775829769df9c37f3de6390746baeb566.

</details>

<details>
<summary>⛔ Files ignored due to path filters (2)</summary>

* `dist/chart/templates/crd/metal.ironcore.dev_bmcs.yaml` is excluded by `!**/dist/**`
* `dist/chart/templates/rbac/role.yaml` is excluded by `!**/dist/**`

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `config/crd/cert-manager/cert-manager.crds.yaml`
* `docs/api-reference/api.md`
* `docs/certificate-management.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

…ert-manager CRDs

Two test failures were blocking PR #765:

1. Integration test cleanup timeout:
   - Server deletion was blocked when BMC was deleted first
   - ServerReconciler would keep reconciling and failing to get BMC client
   - Added graceful handling in reconcileExists(): when BMC is not found,
     remove finalizer and explicitly delete the Server
   - This allows test cleanup to proceed without timeout

2. E2E test missing CRDs:
   - cert-manager CRDs were not installed by 'make install'
   - E2E tests could not create CertificateRequest resources
   - Added cert-manager/cert-manager.crds.yaml to config/crd/kustomization.yaml
   - Now 'make install' includes both metal-operator and cert-manager CRDs

Both fixes are test infrastructure improvements with no production impact.
The Server cleanup fix is defensive and handles the edge case where a BMC
is deleted but its owned Servers have finalizers preventing cascade deletion.

Fixes: #765
Consolidate BMC existence checking into shouldDelete() and delete() functions
to remove duplication between reconcileExists() and shouldDelete().

Changes:
- Moved BMC existence check from reconcileExists() to shouldDelete()
- Enhanced shouldDelete() to detect missing BMC even when Server
  doesn't have DeletionTimestamp yet (handles cascade deletion)
- Enhanced delete() to initiate deletion if Server lacks DeletionTimestamp
- Simplified reconcileExists() back to simple delegation pattern

This maintains the same behavior as before while improving code organization:
- Single source of truth for "should Server be deleted?" decision
- Clearer separation of concerns
- More maintainable codebase

Fixes the cleanup timeout in BMC certificate integration tests by properly
handling the case where BMC is deleted before Server receives DeletionTimestamp.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/server_controller.go`:
- Around line 163-167: The reconcile currently postpones deletion whenever
controllerutil.ContainsFinalizer(server, ServerFinalizer) && server.Status.State
== metalv1alpha1.ServerStateMaintenance, which can wedge the Server if the BMC
is gone; modify that condition to only postpone when the BMC is still reachable
(e.g., include the existing BMC presence/reachability check you use elsewhere
such as bmc != nil or isBMCAccessible(ctx, bmcClient)). Update the branch in the
block that references controllerutil.ContainsFinalizer, ServerFinalizer,
server.Status.State and the similar check at the other location (lines
referenced around the missing-BMC early return) so that delete() can proceed
when the BMC is missing/unreachable, ensuring finalizer cleanup remains
idempotent. Ensure you keep delete() invocation and finalizer removal behavior
unchanged otherwise.
- Around line 180-187: The deletion initiation is currently guarded by the
finalizer check which skips deleting Servers that lack the finalizer when the
first BMC lookup fails; update the delete() flow so the block that checks
server.DeletionTimestamp.IsZero() and calls r.Client.Delete(ctx, server) (and
requeues) runs regardless of whether ServerFinalizer is present. Concretely,
move or duplicate the DeletionTimestamp.IsZero() / r.Client.Delete(...) logic
out from behind the finalizer guard (or add a branch that performs the
missing-BMC delete path when shouldDelete() is true but ServerFinalizer is
absent), ensuring that missing-BMC servers are deleted (or deletion retried)
even if the finalizer hasn't been added.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90e3b4e8-5ccc-450a-987e-6e8e21448a04

📥 Commits

Reviewing files that changed from the base of the PR and between c995673 and 51c1efc.

📒 Files selected for processing (2)
  • config/crd/kustomization.yaml
  • internal/controller/server_controller.go

@afritzler
Copy link
Copy Markdown
Member

@coderabbitai instead of creating our or certificate handling, wouldn't it be for a first start easier to just rely on TLS Secret ref in the BMC type. This secret is an effective secret generated out of a cert-manager managed Certificate. The BMC controller must just be made aware of changes (proper watch setup) and update the corresponding TLS certificate via Redfish.

@afritzler
Copy link
Copy Markdown
Member

afritzler commented Mar 30, 2026

@stefanhipfel can we squash/rebase that PR. Those +/- additions are off due to the GH update branch trigger.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

Tip

For best results, initiate chat on the files or code changes.

@afritzler That's a compelling simplification. Here's a breakdown of what that approach would look like versus the current implementation:


Proposed simpler architecture

cert-manager Certificate CR  →  TLS Secret (auto-renewed)
                                      ↓
                              BMC controller watches Secret
                                      ↓
                              ReplaceCertificate via Redfish

What would change in the API:

spec:
  # Instead of the full BMCCertificateSpec (issuerRef, duration, CSR params...)
  tlsCertificateRef:
    name: my-bmc-tls-secret   # a pre-existing cert-manager-managed TLS Secret

What the controller needs to do:

  1. Watch the referenced corev1.Secret for changes (via Owns or a custom EnqueueRequestsFromMapFunc on Secrets with a label).
  2. On change, call ReplaceCertificate(ctx, cert, privateKey) via Redfish.
  3. Update a status condition (CertificateReady) accordingly.

This is largely already possible with the ReplaceCertificate method introduced in bmc/redfish.go — the heavy lifting is already there.


Trade-offs vs. current PR

Aspect Current PR Proposed approach
Operator complexity High (~700 LOC cert controller) Low (watch + push)
User responsibility Low (operator manages full lifecycle) Medium (user creates Certificate CR)
CSR-based key gen (BMC generates its own key) ✅ Supported ❌ Not possible (key is generated externally)
Cert renewal automation Fully automatic Automatic (cert-manager renews → Secret updates → controller pushes)
Issuer flexibility Via issuerRef in spec Full flexibility (any issuer, any cert-manager config)
Separation of concerns Mixed Clean
Test surface Very large Small

Main thing lost

The current PR supports a CSR-based flow where the BMC generates its own private key (via GenerateCSR), which means the private key never leaves the BMC hardware — a meaningful security property for production PKI. The simpler approach would always have the private key generated outside the BMC (by cert-manager) and pushed in.

If that security property is not a requirement for the initial scope, the simpler Secret-ref approach is significantly easier to implement, maintain, and reason about, and it composes well with whatever cert-manager setup users already have.

Would you like to log this as a design decision / issue for the PR author to consider refactoring toward before merging?

…uerRef, update CRD, and improve documentation. Refactor test certificate generation for consistency.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
internal/controller/bmc_certificate.go (1)

392-413: ⚠️ Potential issue | 🔴 Critical

The BMC-generated CSR path still writes an invalid TLS secret.

When the private key stays on the BMC, privateKeyPEM is empty, but this code always stores kubernetes.io/tls. That means the success path tries to persist a TLS secret without tls.key, so certificate installation can succeed and secret storage then fails. Use Opaque when the key is absent, or always store this payload as Opaque.

In Kubernetes v1.35, does API validation for a Secret of type `kubernetes.io/tls` require both `tls.crt` and `tls.key`? Please cite the relevant validation rule or official documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 392 - 413, The Secret is
being created as corev1.SecretTypeTLS even when privateKeyPEM is empty which
violates TLS secret validation; change the creation logic for the Secret (the
secret variable) to set Type conditionally: use corev1.SecretTypeTLS only when
privateKeyPEM is non-empty (and include both tls.crt and tls.key), otherwise set
Type to corev1.SecretTypeOpaque (and only write tls.crt or a different key name
if desired); update places referencing secret.Data population (tls.crt/tls.key
and any callers expecting TLS type) to match the conditional Type and ensure no
empty tls.key is stored when privateKeyPEM == "" (symbols: secret,
privateKeyPEM, certPEM, corev1.SecretTypeTLS, corev1.SecretTypeOpaque).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1alpha1/bmc_types.go`:
- Around line 103-120: CertificateIssuerRef is ambiguous for cluster-scoped BMC
because Kind defaults to "Issuer" but there's no namespace to locate a
namespaced Issuer; update the type to either require namespace when
Kind=="Issuer" or restrict Kind to ClusterIssuer—prefer adding an optional
Namespace string `Namespace string json:"namespace,omitempty"` to
CertificateIssuerRef and add kubebuilder validation so Namespace is required
when Kind==Issuer (or alternatively change the default/Enum to only
ClusterIssuer if you choose to disallow Issuer); update the doc comment for
CertificateIssuerRef and the kubebuilder markers on Kind to reflect the chosen
behavior so the controller can deterministically resolve issuers.

In `@internal/controller/bmc_certificate_test.go`:
- Around line 568-586: The test "should renew certificate when it expires soon"
asserts the wrong behavior by expecting an error; instead, update the test to
verify that reconciler.reconcileCertificate(ctx, bmcObj) advanced the BMC
object's certificate state into the pending/requested path (e.g., check
bmcObj.Status for the pending/requested condition or CertificateRequest
reference) rather than using Expect(err).To(HaveOccurred()); locate the test and
replace the error assertion with an assertion that
reconciler.reconcileCertificate invoked the renewal flow by inspecting the BMC
status fields (CertificateRequest/CertificateRequestStatus/CertificateSecretRef)
or any pending/requested condition updated by reconcileCertificate.

In `@internal/controller/bmc_certificate.go`:
- Around line 274-279: The current flow creates a new private key then calls
storePrivateKey which ignores AlreadyExists, so a retry can leave the old
${bmc}-key secret and later loadPrivateKey will pair a new certificate with
stale key material; modify storePrivateKey (and the other occurrence around the
second block) to detect an existing secret and either update its contents with
the newly generated privateKey or write a request-scoped secret name (e.g.,
append a request UID or timestamp) and return that name so subsequent
loadPrivateKey calls use the correct secret; ensure the change updates code
paths that call storePrivateKey and loadPrivateKey so they agree on the secret
naming/versioning strategy and handle conflict errors safely.
- Around line 668-683: buildCSRParameters currently only adds
spec.certificate.dnsNames and ipAddresses to AlternativeNames, creating a SAN
mismatch with the operator-generated path which also injects status.IP and
CN/Hostname fallbacks; refactor by centralizing SAN calculation into a single
helper (e.g. computeSANs or getCertificateSANs) that collects
spec.Certificate.DNSNames, spec.Certificate.IPAddresses, the BMC status.IP, and
any Hostname/CN fallback values derived by getCertificateCommonName, then have
buildCSRParameters and the operator CSR-generation path both call that helper so
both CSR strategies use the exact same DNS/IP SAN set.
- Around line 133-145: The current logic in the CSR selection falls back to
operator-generated keys on any error from tryBMCGeneratedCSR; change this so
only an ErrNotSupported sentinel triggers the fallback: define a package-level
var ErrNotSupported error and update tryBMCGeneratedCSR to return
ErrNotSupported when the BMC/vendor explicitly does not support CSR generation,
then modify the block in the controller (the tryBMCGeneratedCSR /
generateOperatorCSR flow) to check for errors.Is(err, ErrNotSupported) before
falling back to generateOperatorCSR; for any other error call
handleCertificateError(ctx, bmcObj, err) and return (nil, nil, "", err) (or the
value handleCertificateError expects) instead of silently switching ownership.
- Around line 57-68: The controller currently short-circuits when an existing
certificate/request exists without comparing desired certificate inputs; compute
a stable hash (e.g., SHA256) of the desired certificate spec fields (issuerRef,
duration, commonName, dnsNames, ipAddresses, and any CSR params) during
reconciliation, store it in the BMC object's status (e.g.,
Status.CertificateSpecHash), and compare it to the stored value: if the hash
differs, treat the spec as changed and delete/recreate the CertificateRequest
and/or Certificate secret (and clear Status.CertificateSecretRef) so a new
request is issued; update the logic around verifyCertificateValidity,
setCertificateCondition and the existing CertificateRequest fast-path (the block
that reuses a stable-named CertificateRequest) to use this hash check so spec
edits trigger renewal rather than being ignored.

---

Duplicate comments:
In `@internal/controller/bmc_certificate.go`:
- Around line 392-413: The Secret is being created as corev1.SecretTypeTLS even
when privateKeyPEM is empty which violates TLS secret validation; change the
creation logic for the Secret (the secret variable) to set Type conditionally:
use corev1.SecretTypeTLS only when privateKeyPEM is non-empty (and include both
tls.crt and tls.key), otherwise set Type to corev1.SecretTypeOpaque (and only
write tls.crt or a different key name if desired); update places referencing
secret.Data population (tls.crt/tls.key and any callers expecting TLS type) to
match the conditional Type and ensure no empty tls.key is stored when
privateKeyPEM == "" (symbols: secret, privateKeyPEM, certPEM,
corev1.SecretTypeTLS, corev1.SecretTypeOpaque).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 41e98c13-2a60-4688-b4a8-b034e484e991

📥 Commits

Reviewing files that changed from the base of the PR and between 51c1efc and 2fc47fe.

📒 Files selected for processing (7)
  • api/v1alpha1/bmc_types.go
  • config/crd/bases/metal.ironcore.dev_bmcs.yaml
  • docs/certificate-management.md
  • internal/controller/bmc_certificate.go
  • internal/controller/bmc_certificate_integration_test.go
  • internal/controller/bmc_certificate_test.go
  • internal/controller/suite_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/certificate-management.md

Comment on lines +103 to +120
// CertificateIssuerRef defines a reference to a cert-manager Issuer or ClusterIssuer.
type CertificateIssuerRef struct {
// Name is the name of the Issuer or ClusterIssuer.
// +required
Name string `json:"name"`

// Kind is the type of the issuer (Issuer or ClusterIssuer).
// +kubebuilder:validation:Enum=Issuer;ClusterIssuer
// +kubebuilder:default=Issuer
// +optional
Kind string `json:"kind,omitempty"`

// Group is the API group of the Issuer or ClusterIssuer.
// Defaults to cert-manager.io if not specified.
// This field allows referencing issuers from external cert-manager implementations.
// +kubebuilder:default=cert-manager.io
// +optional
Group string `json:"group,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Issuer refs are ambiguous on a cluster-scoped BMC.

Because BMC is cluster-scoped, this type has no way to identify which namespace a namespaced Issuer should come from, yet Kind defaults to Issuer. That forces the controller to resolve namespaced issuers from an implicit namespace, which is surprising and impossible to override per BMC. Either add a namespace field/validation for Kind=Issuer, or restrict this API to ClusterIssuer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/bmc_types.go` around lines 103 - 120, CertificateIssuerRef is
ambiguous for cluster-scoped BMC because Kind defaults to "Issuer" but there's
no namespace to locate a namespaced Issuer; update the type to either require
namespace when Kind=="Issuer" or restrict Kind to ClusterIssuer—prefer adding an
optional Namespace string `Namespace string json:"namespace,omitempty"` to
CertificateIssuerRef and add kubebuilder validation so Namespace is required
when Kind==Issuer (or alternatively change the default/Enum to only
ClusterIssuer if you choose to disallow Issuer); update the doc comment for
CertificateIssuerRef and the kubebuilder markers on Kind to reflect the chosen
behavior so the controller can deterministically resolve issuers.

Comment on lines +568 to +586
It("should renew certificate when it expires soon", func() {
// Create certificate that expires in 20 days
cert := generateTestCertificate(20*24*time.Hour, "test-bmc.example.com")
secret := createCertificateSecret(cert)

bmcObj.Status.CertificateSecretRef = &corev1.LocalObjectReference{Name: "test-secret"}

k8sClient = fake.NewClientBuilder().
WithScheme(scheme).
WithObjects(bmcObj, secret).
WithStatusSubresource(bmcObj).
Build()
reconciler.Client = k8sClient

// Should proceed with renewal (not return early)
_, err := reconciler.reconcileCertificate(ctx, bmcObj)
// Will error trying to connect to BMC, but that's expected in unit test
// We're just verifying it doesn't return early
Expect(err).To(HaveOccurred()) // Expected: BMC connection error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This test asserts the wrong contract for renewal.

Once the existing certificate is near expiry, reconcileCertificate continues into request creation / graceful-degradation logic; it does not return the raw connection error from this path. As written, the test will fail even when renewal is working. Assert that reconcile moved into a pending/requested state instead of expecting HaveOccurred().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate_test.go` around lines 568 - 586, The test
"should renew certificate when it expires soon" asserts the wrong behavior by
expecting an error; instead, update the test to verify that
reconciler.reconcileCertificate(ctx, bmcObj) advanced the BMC object's
certificate state into the pending/requested path (e.g., check bmcObj.Status for
the pending/requested condition or CertificateRequest reference) rather than
using Expect(err).To(HaveOccurred()); locate the test and replace the error
assertion with an assertion that reconciler.reconcileCertificate invoked the
renewal flow by inspecting the BMC status fields
(CertificateRequest/CertificateRequestStatus/CertificateSecretRef) or any
pending/requested condition updated by reconcileCertificate.

Comment on lines +57 to +68
// Check if existing certificate is valid
if bmcObj.Status.CertificateSecretRef != nil {
valid, err := r.verifyCertificateValidity(ctx, bmcObj)
if err != nil {
log.Error(err, "Failed to verify certificate validity")
// Continue with renewal on verification error
} else if valid {
log.V(1).Info("Existing certificate is valid")
return ctrl.Result{}, r.setCertificateCondition(ctx, bmcObj, corev1.ConditionTrue,
metalv1alpha1.BMCCertificateReadyReasonIssued, "Certificate is valid")
}
log.V(1).Info("Certificate expired or invalid, renewing")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Certificate spec edits are ignored once a cert or request exists.

The valid-certificate fast path only checks expiry, and the existing CertificateRequest fast path reuses any object with the stable name without comparing the desired CSR inputs. Changing issuerRef, duration, commonName, dnsNames, or ipAddresses will therefore keep serving the old request/certificate until it expires or fails. Please persist a hash/generation of the desired certificate inputs and recreate when it changes.

Also applies to: 228-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 57 - 68, The controller
currently short-circuits when an existing certificate/request exists without
comparing desired certificate inputs; compute a stable hash (e.g., SHA256) of
the desired certificate spec fields (issuerRef, duration, commonName, dnsNames,
ipAddresses, and any CSR params) during reconciliation, store it in the BMC
object's status (e.g., Status.CertificateSpecHash), and compare it to the stored
value: if the hash differs, treat the spec as changed and delete/recreate the
CertificateRequest and/or Certificate secret (and clear
Status.CertificateSecretRef) so a new request is issued; update the logic around
verifyCertificateValidity, setCertificateCondition and the existing
CertificateRequest fast-path (the block that reuses a stable-named
CertificateRequest) to use this hash check so spec edits trigger renewal rather
than being ignored.

Comment on lines +133 to +145
// Try BMC-generated CSR first (more secure)
csrPEM, err = r.tryBMCGeneratedCSR(ctx, bmcObj)
if err == nil && len(csrPEM) > 0 {
log.V(1).Info("Successfully generated CSR from BMC")
return csrPEM, nil, "BMC", nil
}
log.V(1).Info("BMC CSR generation failed or unsupported, falling back to operator generation", "error", err)

// Fallback to operator-generated CSR
csrPEM, privateKey, err = r.generateOperatorCSR(ctx, bmcObj)
if err != nil {
return nil, nil, "", fmt.Errorf("failed to generate operator CSR: %w", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only fall back to operator-generated CSRs on unsupported hardware.

Any error from tryBMCGeneratedCSR currently switches to operator-side key generation. Temporary auth/network errors should go through handleCertificateError; otherwise a transient outage silently changes the key-ownership model and can issue unnecessary requests. Gate the fallback on a dedicated ErrNotSupported sentinel.

Based on learnings: In internal/controller/*.go, implement and utilize a dedicated ErrNotSupported sentinel error to distinguish vendor-not-supported scenarios from transient errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 133 - 145, The current
logic in the CSR selection falls back to operator-generated keys on any error
from tryBMCGeneratedCSR; change this so only an ErrNotSupported sentinel
triggers the fallback: define a package-level var ErrNotSupported error and
update tryBMCGeneratedCSR to return ErrNotSupported when the BMC/vendor
explicitly does not support CSR generation, then modify the block in the
controller (the tryBMCGeneratedCSR / generateOperatorCSR flow) to check for
errors.Is(err, ErrNotSupported) before falling back to generateOperatorCSR; for
any other error call handleCertificateError(ctx, bmcObj, err) and return (nil,
nil, "", err) (or the value handleCertificateError expects) instead of silently
switching ownership.

Comment on lines +274 to +279
// Store private key in a temporary secret if generated by operator
if privateKey != nil {
if err := r.storePrivateKey(ctx, bmcObj, privateKey); err != nil {
return nil, fmt.Errorf("failed to store private key: %w", err)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Retries can pair a new certificate with an old private key.

storePrivateKey ignores AlreadyExists, but each operator-side retry generates a fresh key before creating a new CertificateRequest. If the old ${bmc}-key secret is still around from a failed/deleted request, loadPrivateKey will later install the newly signed certificate with stale key material. Update the existing secret contents or version the key secret per request.

Also applies to: 477-479

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 274 - 279, The current
flow creates a new private key then calls storePrivateKey which ignores
AlreadyExists, so a retry can leave the old ${bmc}-key secret and later
loadPrivateKey will pair a new certificate with stale key material; modify
storePrivateKey (and the other occurrence around the second block) to detect an
existing secret and either update its contents with the newly generated
privateKey or write a request-scoped secret name (e.g., append a request UID or
timestamp) and return that name so subsequent loadPrivateKey calls use the
correct secret; ensure the change updates code paths that call storePrivateKey
and loadPrivateKey so they agree on the secret naming/versioning strategy and
handle conflict errors safely.

Comment on lines +566 to +574
// Check if certificate is expired or will expire soon
now := time.Now()
renewalTime := cert.NotAfter.Add(-DefaultCertificateRenewalBuffer)
if now.After(cert.NotAfter) {
return false, nil // Certificate expired
}
if now.After(renewalTime) {
return false, nil // Certificate will expire soon
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The fixed 30-day buffer causes endless renewals for short-lived certs.

If spec.certificate.duration is set below 30 days, renewalTime is already in the past as soon as the cert is issued, so every fresh certificate is treated as expiring and the controller loops forever. Clamp the renewal window to the actual certificate lifetime instead of a fixed constant.

Comment on lines +668 to +683
// buildCSRParameters builds CSR parameters from BMC spec.
func (r *BMCReconciler) buildCSRParameters(bmcObj *metalv1alpha1.BMC) bmc.CSRParameters {
params := bmc.CSRParameters{
CommonName: r.getCertificateCommonName(bmcObj),
KeyPairAlgorithm: "RSA",
KeyBitLength: DefaultRSAKeySize,
}

// Add alternative names (DNS + IPs)
if len(bmcObj.Spec.Certificate.DNSNames) > 0 {
params.AlternativeNames = append(params.AlternativeNames, bmcObj.Spec.Certificate.DNSNames...)
}
if len(bmcObj.Spec.Certificate.IPAddresses) > 0 {
params.AlternativeNames = append(params.AlternativeNames, bmcObj.Spec.Certificate.IPAddresses...)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The BMC CSR path builds a different SAN set than the operator path.

buildCSRParameters only forwards spec.certificate.dnsNames and ipAddresses, while the operator-generated path also injects status.IP. If the secure/BMC-generated path succeeds, the issued cert can miss the address the operator actually connects to, and Hostname/CN fallbacks are never mirrored into SANs. Centralize SAN calculation and feed the same DNS/IP set into both CSR strategies.

Also applies to: 687-705

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 668 - 683,
buildCSRParameters currently only adds spec.certificate.dnsNames and ipAddresses
to AlternativeNames, creating a SAN mismatch with the operator-generated path
which also injects status.IP and CN/Hostname fallbacks; refactor by centralizing
SAN calculation into a single helper (e.g. computeSANs or getCertificateSANs)
that collects spec.Certificate.DNSNames, spec.Certificate.IPAddresses, the BMC
status.IP, and any Hostname/CN fallback values derived by
getCertificateCommonName, then have buildCSRParameters and the operator
CSR-generation path both call that helper so both CSR strategies use the exact
same DNS/IP SAN set.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (7)
internal/controller/server_controller.go (1)

165-181: ⚠️ Potential issue | 🔴 Critical

Deletion can still hang in Maintenance when the BMC exists but is unreachable.

This branch only proves the BMC CR exists. If Redfish is down, shouldDelete() returns false here and the BMCUnAvailableError path on Lines 256-259 just requeues, so the finalizer can still stick forever; the nil-BMCRef fallthrough has the same effect. Only postpone deletion after a real reachability check, otherwise proceed with delete().

As per coding guidelines, internal/controller/**/*_controller.go should implement idempotent reconciliation logic - safe to run the same reconciliation multiple times without side effects, and implement finalizers to clean up external resources before resource deletion.

Also applies to: 260-264

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/server_controller.go` around lines 165 - 181, The current
shouldDelete() logic only verifies the BMC CR exists (r.Get) and thus can hang
if the BMC CR exists but Redfish is unreachable; change the branch that checks
ServerFinalizer + server.Status == ServerStateMaintenance to perform a real
reachability check against the BMC (e.g., call your redfish reachability helper
or the same check used elsewhere) using server.Spec.BMCRef, and if the BMC is
unreachable (timeout/connection error) log that and return true to allow
deletion to proceed; only return false (postpone) when the reachability check
confirms the BMC is reachable. Apply the same reachable-vs-unreachable semantics
to the BMCUnAvailableError path so finalizer removal isn’t indefinitely
postponed when Redfish is down.
internal/controller/bmc_certificate.go (6)

144-155: ⚠️ Potential issue | 🟠 Major

Only fall back to operator-side keys for explicit "not supported" errors.

Transient connection or auth failures from tryBMCGeneratedCSR currently switch the flow to operator-generated keys, which silently changes the key-ownership model instead of surfacing a retryable certificate failure. Based on learnings, "implement and utilize a dedicated ErrNotSupported sentinel error to distinguish vendor-not-supported scenarios from transient errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 144 - 155, Change the
fallback logic so transient errors from r.tryBMCGeneratedCSR do not trigger
operator-key generation: introduce/consume the sentinel error ErrNotSupported
from tryBMCGeneratedCSR and only fall back when err == ErrNotSupported (and
csrPEM empty); for any other non-nil err return that error (or wrap it) so
callers can retry/handle transient failures; only call r.generateOperatorCSR
when ErrNotSupported is returned, and update the log messages to reflect this
distinction.

637-640: ⚠️ Potential issue | 🟠 Major

Self-signed issuers still need trust material in the stored secret.

Returning an empty CA for SelfSigned means the managed secret usually has no ca.crt, even though the issued self-signed certificate is the trust anchor. That leaves later secure clients without enough material to verify the BMC's new certificate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 637 - 640, The code
currently returns an empty CA when spec.SelfSigned != nil which causes the
managed secret to lack a ca.crt; instead, when detecting a SelfSigned issuer
(check spec.SelfSigned and issuerName in the same function), extract the
issuer's certificate bytes (the self-signed cert / trust anchor produced for the
BMC) and return those as the CA value so the secret includes ca.crt; update the
path that builds the secret to write this CA bytes into the ca.crt entry rather
than returning "".

697-703: ⚠️ Potential issue | 🟠 Major

Native and operator CSR flows request different SANs.

buildCSRParameters only forwards the explicit SANs from spec.certificate, while the operator-generated path separately derives additional IP SANs. The certificate contents therefore depend on whether native CSR generation succeeds, which makes the secure/native path miss addresses the fallback path can include.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 697 - 703,
buildCSRParameters currently only forwards explicit SANs from
bmcObj.Spec.Certificate (DNSNames and IPAddresses) into params.AlternativeNames,
causing a mismatch with the operator-generated CSR path which also derives extra
IP SANs; update buildCSRParameters to also include the same derived IP SANs the
operator path adds (i.e., call or replicate the operator-derived IP collection
logic used elsewhere) so both native and operator CSR flows produce identical
SAN sets—modify buildCSRParameters to merge bmcObj.Spec.Certificate.DNSNames,
bmcObj.Spec.Certificate.IPAddresses and the derived IP list into
params.AlternativeNames.

587-595: ⚠️ Potential issue | 🟠 Major

Clamp the renewal buffer for short-lived certificates.

A fixed 30-day buffer makes any certificate with a lifetime below 30 days look "expiring soon" immediately after issuance, so the controller will keep re-entering renewal logic for every fresh short-lived cert.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 587 - 595, Clamp the
renewal buffer for short-lived certificates so the controller doesn't treat
freshly-issued short-lived certs as "expiring soon": compute lifetime :=
cert.NotAfter.Sub(cert.NotBefore) and derive effectiveBuffer :=
DefaultCertificateRenewalBuffer but if lifetime <=
DefaultCertificateRenewalBuffer set effectiveBuffer = lifetime/2 (or lifetime -
time.Minute if you prefer a fixed margin), then use renewalTime :=
cert.NotAfter.Add(-effectiveBuffer) in place of the current renewalTime; update
the checks that reference renewalTime (the block using now, renewalTime,
cert.NotAfter) so short-lived certs are not immediately flagged for renewal.

57-68: ⚠️ Potential issue | 🔴 Critical

Renewal and spec changes are pinned to the first issued request.

Lines 57-67 short-circuit on any still-valid stored secret without comparing the desired certificate inputs. Once that secret is no longer valid, Lines 84-87 just reinstall the certificate from the old ready CertificateRequest instead of creating a fresh one. Spec edits and real renewals therefore never generate a new CSR.

Also applies to: 72-87


498-500: ⚠️ Potential issue | 🟠 Major

Don't keep stale private keys across reissues.

Every operator-generated retry creates a fresh keypair, but Line 498 preserves any old ${bmc}-key secret. The next install can then combine a newly issued certificate with stale private key material from the previous attempt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_certificate.go` around lines 498 - 500, The current
r.Create(ctx, secret) call can leave a stale `${bmc}-key` secret present and
reused across reissues; before creating the new operator-generated private key
secret (the variable secret used with r.Create), explicitly remove or replace
any existing secret with the same name (e.g., query for a Secret with
secret.Name and delete it if found, or use a replace semantics such as
controllerutil.CreateOrUpdate to overwrite) so that each reissue always uses the
freshly generated keypair rather than a leftover secret; ensure to handle
NotFound and other errors appropriately around r.Delete/r.Get or CreateOrUpdate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bmc/redfish.go`:
- Around line 1105-1115: GenerateCSR fails because
schemas.CertificateServiceGenerateCSRParameters uses OrganizationalUnit and
KeyPairAlgorithm/KeyBitLength while bmc.CSRParameters exposes OrganizationUnit
and doesn't declare KeyPairAlgorithm/KeyBitLength; update the mapping and types
so they match: in GenerateCSR map bmc.CSRParameters.OrganizationUnit to
schemas.CertificateServiceGenerateCSRParameters.OrganizationalUnit (or rename
the schema field to OrganizationUnit consistently), and add KeyPairAlgorithm and
KeyBitLength to the bmc.CSRParameters definition in bmc/bmc.go (and the builder
in internal/controller/bmc_certificate.go) or remove those fields from the
schema mapping so the structs align and the code compiles.

In `@internal/controller/bmc_controller.go`:
- Around line 143-146: The current early return from r.reconcileCertificate(ctx,
bmcObj) prevents BMC connectivity, server discovery, and event subscription when
certificate handling only requests a requeue; change this to call
r.reconcileCertificate and capture its (result, err) into local vars (e.g.,
certResult, certErr) without returning immediately, continue with the normal BMC
flow (connectivity/discovery/subscription), and at the end of reconcile
merge/propagate the certificate outcome: if certErr != nil return certResult,
certErr; otherwise if certResult requests a requeue ensure the final returned
ctrl.Result includes that requeue directive (or combine with any other reconcile
result) so certificate retries do not block the rest of the reconcile.

In `@internal/controller/server_controller.go`:
- Around line 146-154: The code currently treats any NotFound from r.Get(ctx,
client.ObjectKey{Name: server.Spec.BMCRef.Name}, bmcObj) as a signal to delete
the Server (inside the server.DeletionTimestamp.IsZero() branch), which can
remove Servers created before their BMC exists; instead, when a BMC lookup
returns NotFound do not immediately return true/delete — either requeue the
reconcile to wait for the BMC to appear or only treat the missing BMC as a
deletion trigger if you can prove the Server was owned by that BMC (e.g., check
for an owner reference/UID that matches the expected BMC controller instead of
blindly deleting). Update reconcileExists()/delete() call site to requeue on
NotFound and add logic to rely on proper owner references (use
SetControllerReference when creating BMCs so garbage-collection via owner refs
is available) rather than forcing immediate deletion.

In `@test/e2e/bmc_certificate_e2e_test.go`:
- Around line 165-182: The test It("should have certificate Secret with expected
keys") wrongly asserts that the Secret contains tls.key even though
BMC-generated CSRs never store the private key; update the test to only assert
the presence of tls.crt (use the existing kubectl/jsonpath call that queries
{.data.tls\.crt} with bmcWithCertName and bmcTestNamespace via utils.Run) and
remove or stop asserting on the tls.key jsonpath (or make that check conditional
based on a flag if you must support both controller-generated and BMC-native CSR
flows).
- Around line 93-114: The test is asserting for cert-manager Certificate objects
named "<bmc>-cert" but the controller actually creates a CertificateRequest
"bmc-certreq-<bmc>" and stores the installed cert in "bmc-cert-<bmc>"; update
the test functions (e.g., verifyCertificateExists, verifyCertificateIssuer and
other similar checks mentioned) to query for the CertificateRequest resource
name format "bmc-certreq-%s" (using bmcWithCertName and bmcTestNamespace) and
validate its status/approved/issued fields, and then check the secret or
Certificate data under the actual installed cert name "bmc-cert-%s" rather than
"<bmc>-cert" so the assertions match internal/controller/bmc_certificate.go
behavior.

---

Duplicate comments:
In `@internal/controller/bmc_certificate.go`:
- Around line 144-155: Change the fallback logic so transient errors from
r.tryBMCGeneratedCSR do not trigger operator-key generation: introduce/consume
the sentinel error ErrNotSupported from tryBMCGeneratedCSR and only fall back
when err == ErrNotSupported (and csrPEM empty); for any other non-nil err return
that error (or wrap it) so callers can retry/handle transient failures; only
call r.generateOperatorCSR when ErrNotSupported is returned, and update the log
messages to reflect this distinction.
- Around line 637-640: The code currently returns an empty CA when
spec.SelfSigned != nil which causes the managed secret to lack a ca.crt;
instead, when detecting a SelfSigned issuer (check spec.SelfSigned and
issuerName in the same function), extract the issuer's certificate bytes (the
self-signed cert / trust anchor produced for the BMC) and return those as the CA
value so the secret includes ca.crt; update the path that builds the secret to
write this CA bytes into the ca.crt entry rather than returning "".
- Around line 697-703: buildCSRParameters currently only forwards explicit SANs
from bmcObj.Spec.Certificate (DNSNames and IPAddresses) into
params.AlternativeNames, causing a mismatch with the operator-generated CSR path
which also derives extra IP SANs; update buildCSRParameters to also include the
same derived IP SANs the operator path adds (i.e., call or replicate the
operator-derived IP collection logic used elsewhere) so both native and operator
CSR flows produce identical SAN sets—modify buildCSRParameters to merge
bmcObj.Spec.Certificate.DNSNames, bmcObj.Spec.Certificate.IPAddresses and the
derived IP list into params.AlternativeNames.
- Around line 587-595: Clamp the renewal buffer for short-lived certificates so
the controller doesn't treat freshly-issued short-lived certs as "expiring
soon": compute lifetime := cert.NotAfter.Sub(cert.NotBefore) and derive
effectiveBuffer := DefaultCertificateRenewalBuffer but if lifetime <=
DefaultCertificateRenewalBuffer set effectiveBuffer = lifetime/2 (or lifetime -
time.Minute if you prefer a fixed margin), then use renewalTime :=
cert.NotAfter.Add(-effectiveBuffer) in place of the current renewalTime; update
the checks that reference renewalTime (the block using now, renewalTime,
cert.NotAfter) so short-lived certs are not immediately flagged for renewal.
- Around line 498-500: The current r.Create(ctx, secret) call can leave a stale
`${bmc}-key` secret present and reused across reissues; before creating the new
operator-generated private key secret (the variable secret used with r.Create),
explicitly remove or replace any existing secret with the same name (e.g., query
for a Secret with secret.Name and delete it if found, or use a replace semantics
such as controllerutil.CreateOrUpdate to overwrite) so that each reissue always
uses the freshly generated keypair rather than a leftover secret; ensure to
handle NotFound and other errors appropriately around r.Delete/r.Get or
CreateOrUpdate.

In `@internal/controller/server_controller.go`:
- Around line 165-181: The current shouldDelete() logic only verifies the BMC CR
exists (r.Get) and thus can hang if the BMC CR exists but Redfish is
unreachable; change the branch that checks ServerFinalizer + server.Status ==
ServerStateMaintenance to perform a real reachability check against the BMC
(e.g., call your redfish reachability helper or the same check used elsewhere)
using server.Spec.BMCRef, and if the BMC is unreachable (timeout/connection
error) log that and return true to allow deletion to proceed; only return false
(postpone) when the reachability check confirms the BMC is reachable. Apply the
same reachable-vs-unreachable semantics to the BMCUnAvailableError path so
finalizer removal isn’t indefinitely postponed when Redfish is down.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6698db7-07e5-40d6-83aa-40f9126d1145

📥 Commits

Reviewing files that changed from the base of the PR and between 2fc47fe and 3b966b6.

📒 Files selected for processing (7)
  • bmc/redfish.go
  • internal/controller/bmc_certificate.go
  • internal/controller/bmc_certificate_integration_test.go
  • internal/controller/bmc_controller.go
  • internal/controller/server_controller.go
  • test/e2e/bmc_certificate_e2e_test.go
  • test/e2e/testdata/selfsigned-issuer.yaml
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/testdata/selfsigned-issuer.yaml

Comment on lines +1105 to +1115
csrParams := &schemas.CertificateServiceGenerateCSRParameters{
CommonName: params.CommonName,
AlternativeNames: params.AlternativeNames,
Organization: params.Organization,
OrganizationalUnit: params.OrganizationalUnit,
City: params.City,
State: params.State,
Country: params.Country,
KeyPairAlgorithm: params.KeyPairAlgorithm,
KeyBitLength: params.KeyBitLength,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

GenerateCSR won't compile against bmc.CSRParameters.

bmc.CSRParameters exposes OrganizationUnit, not OrganizationalUnit, and the current contract in bmc/bmc.go does not define KeyPairAlgorithm or KeyBitLength. This block, plus the matching builder in internal/controller/bmc_certificate.go, will not compile until the field set is aligned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/redfish.go` around lines 1105 - 1115, GenerateCSR fails because
schemas.CertificateServiceGenerateCSRParameters uses OrganizationalUnit and
KeyPairAlgorithm/KeyBitLength while bmc.CSRParameters exposes OrganizationUnit
and doesn't declare KeyPairAlgorithm/KeyBitLength; update the mapping and types
so they match: in GenerateCSR map bmc.CSRParameters.OrganizationUnit to
schemas.CertificateServiceGenerateCSRParameters.OrganizationalUnit (or rename
the schema field to OrganizationUnit consistently), and add KeyPairAlgorithm and
KeyBitLength to the bmc.CSRParameters definition in bmc/bmc.go (and the builder
in internal/controller/bmc_certificate.go) or remove those fields from the
schema mapping so the structs align and the code compiles.

Comment on lines +143 to +146
// Certificate reconciliation (before BMC operations)
if result, err := r.reconcileCertificate(ctx, bmcObj); err != nil || !result.IsZero() {
return result, err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't return early for retry-only certificate states.

Line 144 exits the whole reconcile whenever certificate management asks for a requeue. Pending issuance and graceful-degradation errors then skip the normal BMC connectivity, status, server-discovery, and event-subscription path instead of letting those operations continue alongside certificate retries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_controller.go` around lines 143 - 146, The current
early return from r.reconcileCertificate(ctx, bmcObj) prevents BMC connectivity,
server discovery, and event subscription when certificate handling only requests
a requeue; change this to call r.reconcileCertificate and capture its (result,
err) into local vars (e.g., certResult, certErr) without returning immediately,
continue with the normal BMC flow (connectivity/discovery/subscription), and at
the end of reconcile merge/propagate the certificate outcome: if certErr != nil
return certResult, certErr; otherwise if certResult requests a requeue ensure
the final returned ctrl.Result includes that requeue directive (or combine with
any other reconcile result) so certificate retries do not block the rest of the
reconcile.

Comment on lines +146 to +154
// Special case: If Server doesn't have DeletionTimestamp yet but BMC is gone,
// the Server should be deleted (handles cascade deletion via OwnerReference)
if server.DeletionTimestamp.IsZero() {
if server.Spec.BMCRef != nil {
bmcObj := &metalv1alpha1.BMC{}
if err := r.Get(ctx, client.ObjectKey{Name: server.Spec.BMCRef.Name}, bmcObj); err != nil {
if apierrors.IsNotFound(err) {
log.V(1).Info("BMC not found, Server should be deleted", "BMC", server.Spec.BMCRef.Name, "Server", server.Name)
return true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Don't turn an unresolved BMCRef into an immediate delete.

This treats any NotFound for server.Spec.BMCRef as "the parent was deleted", but it also fires for a Server created before its BMC exists. Because reconcileExists() now goes straight into delete(), the Server can disappear without any explicit delete request. If this is only meant to emulate owner-ref cascading, gate it on an existing BMC controller owner reference or requeue instead of deleting.

🐛 Possible guard
 if server.DeletionTimestamp.IsZero() {
 	if server.Spec.BMCRef != nil {
+		owner := metav1.GetControllerOf(server)
+		if owner == nil || owner.Kind != "BMC" || owner.Name != server.Spec.BMCRef.Name {
+			return false
+		}
+
 		bmcObj := &metalv1alpha1.BMC{}
 		if err := r.Get(ctx, client.ObjectKey{Name: server.Spec.BMCRef.Name}, bmcObj); err != nil {
 			if apierrors.IsNotFound(err) {
 				log.V(1).Info("BMC not found, Server should be deleted", "BMC", server.Spec.BMCRef.Name, "Server", server.Name)
 				return true

As per coding guidelines, internal/controller/**/*_controller.go should use owner references with SetControllerReference to enable automatic garbage collection of managed resources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/server_controller.go` around lines 146 - 154, The code
currently treats any NotFound from r.Get(ctx, client.ObjectKey{Name:
server.Spec.BMCRef.Name}, bmcObj) as a signal to delete the Server (inside the
server.DeletionTimestamp.IsZero() branch), which can remove Servers created
before their BMC exists; instead, when a BMC lookup returns NotFound do not
immediately return true/delete — either requeue the reconcile to wait for the
BMC to appear or only treat the missing BMC as a deletion trigger if you can
prove the Server was owned by that BMC (e.g., check for an owner reference/UID
that matches the expected BMC controller instead of blindly deleting). Update
reconcileExists()/delete() call site to requeue on NotFound and add logic to
rely on proper owner references (use SetControllerReference when creating BMCs
so garbage-collection via owner refs is available) rather than forcing immediate
deletion.

Comment on lines +93 to +114
It("should create a Certificate resource for the BMC", func() {
By("checking if Certificate resource exists")
verifyCertificateExists := func(g Gomega) {
cmd := exec.Command("kubectl", "get", "certificate",
fmt.Sprintf("%s-cert", bmcWithCertName),
"-n", bmcTestNamespace)
_, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Certificate resource should be created")
}
Eventually(verifyCertificateExists, reconciliationTimeout, reconciliationPolling).Should(Succeed())

By("verifying Certificate has correct issuerRef")
verifyCertificateIssuer := func(g Gomega) {
cmd := exec.Command("kubectl", "get", "certificate",
fmt.Sprintf("%s-cert", bmcWithCertName),
"-n", bmcTestNamespace,
"-o", "jsonpath={.spec.issuerRef.name}")
output, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred())
g.Expect(output).To(Equal("selfsigned-issuer"))
}
Eventually(verifyCertificateIssuer, certificateTimeout, certificatePolling).Should(Succeed())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These assertions target objects the controller never creates.

The implementation in internal/controller/bmc_certificate.go creates a CertificateRequest named bmc-certreq-<bmc> and stores the installed cert in bmc-cert-<bmc>. It does not create a cert-manager Certificate named <bmc>-cert, so the kubectl get certificate ... checks and the later <bmc>-cert secret/reference expectations will fail against the current controller behavior.

Also applies to: 130-162, 186-258, 324-336, 374-382, 404-425

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/bmc_certificate_e2e_test.go` around lines 93 - 114, The test is
asserting for cert-manager Certificate objects named "<bmc>-cert" but the
controller actually creates a CertificateRequest "bmc-certreq-<bmc>" and stores
the installed cert in "bmc-cert-<bmc>"; update the test functions (e.g.,
verifyCertificateExists, verifyCertificateIssuer and other similar checks
mentioned) to query for the CertificateRequest resource name format
"bmc-certreq-%s" (using bmcWithCertName and bmcTestNamespace) and validate its
status/approved/issued fields, and then check the secret or Certificate data
under the actual installed cert name "bmc-cert-%s" rather than "<bmc>-cert" so
the assertions match internal/controller/bmc_certificate.go behavior.

Comment on lines +165 to +182
It("should have certificate Secret with expected keys", func() {
By("verifying Secret contains tls.crt")
cmd := exec.Command("kubectl", "get", "secret",
fmt.Sprintf("%s-cert", bmcWithCertName),
"-n", bmcTestNamespace,
"-o", "jsonpath={.data.tls\\.crt}")
output, err := utils.Run(cmd)
Expect(err).NotTo(HaveOccurred())
Expect(output).NotTo(BeEmpty(), "tls.crt should exist")

By("verifying Secret contains tls.key")
cmd = exec.Command("kubectl", "get", "secret",
fmt.Sprintf("%s-cert", bmcWithCertName),
"-n", bmcTestNamespace,
"-o", "jsonpath={.data.tls\\.key}")
output, err = utils.Run(cmd)
Expect(err).NotTo(HaveOccurred())
Expect(output).NotTo(BeEmpty(), "tls.key should exist")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

tls.key is not guaranteed in the managed secret.

When the CSR is generated on the BMC, the private key never leaves the device. The controller intentionally stores only tls.crt in that path, so Lines 175-182 make the test fail on the native/BMC-generated CSR flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/bmc_certificate_e2e_test.go` around lines 165 - 182, The test
It("should have certificate Secret with expected keys") wrongly asserts that the
Secret contains tls.key even though BMC-generated CSRs never store the private
key; update the test to only assert the presence of tls.crt (use the existing
kubectl/jsonpath call that queries {.data.tls\.crt} with bmcWithCertName and
bmcTestNamespace via utils.Run) and remove or stop asserting on the tls.key
jsonpath (or make that check conditional based on a flag if you must support
both controller-generated and BMC-native CSR flows).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change area/metal-automation documentation Improvements or additions to documentation enhancement New feature or request size/XXL

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants