Secure discovery boot data with jwt signed tokens#750
Secure discovery boot data with jwt signed tokens#750stefanhipfel wants to merge 5 commits intomainfrom
Conversation
WalkthroughAdds HMAC-SHA256 signed discovery tokens: controller creates or ensures a cluster signing secret, generates signed JWT tokens per server and injects them into ignition; metalprobe includes the token when calling registry endpoints; registry loads the signing secret and validates tokens for /register and /bootstate. Changes
Sequence DiagramsequenceDiagram
participant K8s as Kubernetes
participant Ctrl as Controller
participant Registry as Registry Server
participant Probe as Metalprobe
Ctrl->>K8s: getOrCreateSigningSecret()
K8s-->>Ctrl: signing-key (32 bytes)
Ctrl->>Ctrl: GenerateSignedDiscoveryToken(signing-key, systemUUID)
Ctrl-->>K8s: Apply ignition with --discovery-token=token
K8s-->>Probe: Boot with token flag
Probe->>Registry: POST /register (DiscoveryToken)
Registry->>Registry: VerifySignedDiscoveryToken(signing-key, token)
alt Token Valid
Registry-->>Probe: 200 OK
else Token Invalid/Missing
Registry-->>Probe: 401 Unauthorized
end
Probe->>Registry: POST /bootstate (DiscoveryToken)
Registry->>Registry: VerifySignedDiscoveryToken(signing-key, token)
alt Token Valid
Registry-->>Probe: 200 OK
else Token Invalid/Missing
Registry-->>Probe: 401 Unauthorized
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (7)
cmd/main.go (1)
627-633: Consider extracting the hardcoded secret name to a shared constant.The secret name
"discovery-token-signing-secret"is duplicated here and inserver_controller.go(line 723). Extracting it to a shared constant would prevent accidental drift between the controller and registry server configurations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/main.go` around lines 627 - 633, The secret name "discovery-token-signing-secret" is duplicated; extract it to a shared constant and use that constant wherever the literal is used. Define a package-level constant (e.g., DiscoveryTokenSigningSecret) in a common package/file imported by both the registry setup and the controller, then replace the literal in the registry.NewServer call and the usage in server_controller.go (and any other occurrences) with that constant to ensure a single source of truth.internal/token/token_test.go (2)
163-173: Test comment is misleading - this doesn't verify constant-time behavior.The comment says "HMAC provides constant-time comparison internally" but the test only verifies that token verification works, not timing properties. Timing-safe comparison cannot be tested this way.
Consider renaming or updating the test description to reflect what it actually tests.
📝 Suggested description improvement
- It("should use constant-time comparison (HMAC)", func() { - // HMAC provides constant-time comparison internally - // This test verifies the signature mechanism works + It("should verify tokens using HMAC signature mechanism", func() { + // Note: HMAC uses hmac.Equal internally for constant-time comparison + // This test verifies basic signature verification works🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/token/token_test.go` around lines 163 - 173, The test description is misleading about timing; update the It description under Context("Token Security") to reflect that it verifies signature correctness rather than constant-time behavior: change the It("should use constant-time comparison (HMAC)") text to something like It("verifies signed discovery token using HMAC signature") and/or update the comment above the test; keep the test body using GenerateSignedDiscoveryToken and VerifySignedDiscoveryToken unchanged so it still asserts successful generation and valid verification.
67-76: Consider using explicit time manipulation instead of relying on timing.The
Eventuallypattern here relies on wall-clock time passing to generate a different timestamp. This could be flaky in fast CI environments or if the timestamp resolution changes. Consider either:
- Mocking time in the token package
- Simply accepting that same-second tokens may be identical (expected behavior)
The current test is technically correct but may cause intermittent failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/token/token_test.go` around lines 67 - 76, Test relies on wall-clock progress and can flake; update the code or test to control time deterministically: either add a time provider to GenerateSignedDiscoveryToken (e.g., inject a nowFunc or accept a timestamp parameter) and in the test call GenerateSignedDiscoveryToken with two different timestamps to assert tokens differ, or change the test to assert tokens may be equal within the same second and avoid Eventually. Locate GenerateSignedDiscoveryToken and the token_test.go case and implement the time-injection approach (introduce nowFunc or timestamp arg) or alter the test to compare tokens produced with explicit different times instead of using Eventually.internal/controller/server_controller_test.go (2)
437-446: Fragile string parsing for token extraction.This manual string parsing with
IndexAnyand hardcoded delimiters is brittle. If the ignition format changes slightly (e.g., different quoting or whitespace), this could break silently.Consider using a regex pattern for more robust extraction, or better yet, parse the YAML structure to get the systemd unit contents and then extract the flag value.
♻️ Suggested regex-based extraction
- // Extract token using regex or string parsing - tokenStart := strings.Index(contents, "--discovery-token=") - Expect(tokenStart).To(BeNumerically(">", 0)) - tokenValueStart := tokenStart + len("--discovery-token=") - tokenEnd := strings.IndexAny(contents[tokenValueStart:], " \n\t\"'") - var actualToken string - if tokenEnd == -1 { - actualToken = contents[tokenValueStart:] - } else { - actualToken = contents[tokenValueStart : tokenValueStart+tokenEnd] - } - Expect(actualToken).ToNot(BeEmpty()) + // Extract token using regex for more robust parsing + tokenRegex := regexp.MustCompile(`--discovery-token=([A-Za-z0-9_-]+)`) + matches := tokenRegex.FindStringSubmatch(contents) + Expect(matches).To(HaveLen(2), "should find discovery token in ignition") + actualToken := matches[1] + Expect(actualToken).ToNot(BeEmpty())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/server_controller_test.go` around lines 437 - 446, The current brittle manual parsing of contents (variables tokenStart, tokenValueStart, tokenEnd, actualToken) using strings.Index and IndexAny should be replaced with a more robust extraction: use a regex to locate --discovery-token=<value> (or parse the YAML/ignition to get the systemd unit content first) and capture the token value, or parse the YAML structure to retrieve the unit and then extract the flag; update the test to use that regex/capture group (or YAML unmarshalling) instead of the Index/IndexAny logic so variations in quoting/whitespace won’t break the extraction.
271-289: Consider extracting repeated signing secret retrieval into a helper function.The pattern of fetching the signing secret and generating a token is duplicated across multiple tests (here, lines 506-522, and lines 853-867). This could be extracted into a test helper to reduce duplication and improve maintainability.
♻️ Suggested helper function
// Add this helper near the top of the test file or in a test utilities file func getSignedDiscoveryToken(ctx context.Context, k8sClient client.Client, ns, systemUUID string) string { signingSecret := &v1.Secret{} Eventually(func() error { return k8sClient.Get(ctx, client.ObjectKey{ Name: "discovery-token-signing-secret", Namespace: ns, }, signingSecret) }).Should(Succeed()) signingKey := signingSecret.Data["signing-key"] Expect(signingKey).To(HaveLen(32)) token, err := metaltoken.GenerateSignedDiscoveryToken(signingKey, systemUUID) Expect(err).NotTo(HaveOccurred()) return token }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/server_controller_test.go` around lines 271 - 289, The test duplicates code to fetch the "discovery-token-signing-secret" and generate a signed discovery token; extract that logic into a helper function (e.g., getSignedDiscoveryToken(ctx context.Context, k8sClient client.Client, ns string, systemUUID string) string) placed near the top of the test file or a test utilities file, have it perform the Eventually k8sClient.Get into a v1.Secret, assert signingKey length and GenerateSignedDiscoveryToken (metaltoken.GenerateSignedDiscoveryToken), return the token, and then replace the inline blocks in the tests (the creation of signingSecret, signingKey check, metaltoken.GenerateSignedDiscoveryToken call, and probe.NewAgent usage sites) with calls to getSignedDiscoveryToken to reduce duplication and keep assertions intact.internal/token/token.go (1)
86-97: Manual delimiter parsing could use strings.Index or strings.Split.The manual loop to find
||can be simplified usingstrings.Index:♻️ Simplified delimiter parsing
- // Parse by splitting on "||" - firstDelim := -1 - for i := 0; i < len(payload)-1; i++ { - if payload[i] == '|' && payload[i+1] == '|' { - firstDelim = i - break - } - } - - if firstDelim == -1 { + // Parse by finding "||" delimiter + firstDelim := strings.Index(payload, "||") + if firstDelim == -1 { return "", 0, false, nil // Invalid format }Note: You would need to add
"strings"to imports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/token/token.go` around lines 86 - 97, Replace the manual loop that searches for the "||" delimiter on the payload with strings.Index(payload, "||") (or strings.SplitN(payload, "||", 2)) to simplify and clarify parsing; update the code paths that use firstDelim (the existing early-return when delimiter not found and subsequent substring logic) to use the Index/Split result, and add "strings" to the imports. This keeps the same behavior (returning "", 0, false, nil when not found) but removes the manual byte-by-byte scan and improves readability around the payload delimiter handling.docs/discovery-token-security.md (1)
13-15: Add language specifiers to fenced code blocks.Static analysis flagged these code blocks as missing language specifiers. Adding
textorplaintextfor the token format and ASCII diagram would satisfy the linter and improve rendering in some viewers.📝 Suggested fix
-``` +```text token = base64url(systemUUID||timestamp||HMAC-SHA256(systemUUID||timestamp, secret))And for the sequence diagram: ```diff -``` +```text ┌──────────────┐ ┌──────────┐ ┌──────────┐Also applies to: 52-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/discovery-token-security.md` around lines 13 - 15, Update the fenced code blocks that show the token format (the block containing "token = base64url(systemUUID||timestamp||HMAC-SHA256(systemUUID||timestamp, secret))") and the ASCII sequence diagram (the block starting with the box characters "┌──────────────┐ ⟨...⟩") to include a language specifier such as "text" or "plaintext" after the opening backticks; apply the same change to the other code fences noted in the comment range (the blocks around lines 52–77) so the linter accepts them and viewers render them as plain text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/discovery-token-security.md`:
- Around line 317-321: The manual command uses "openssl rand -base64 32" which
base64-encodes the 32 bytes and causes the stored signing-key to be ~44 bytes
after Kubernetes base64-ing it; instead generate 32 raw bytes, write them to a
temporary file, create the Kubernetes secret discovery-token-signing-secret
using --from-file=signing-key=<tempfile> so the exact 32-byte value is stored,
and then remove the temporary file; references: discovery-token-signing-secret,
signing-key, and avoid using openssl rand -base64 32.
In `@internal/controller/server_controller.go`:
- Around line 720-767: getOrCreateSigningSecret has two issues: concurrent
reconciles can race on creating the secret and invalid secrets are never
recovered. Modify getOrCreateSigningSecret to: when an existing secret is found
validate the "signing-key" and if invalid attempt to replace it (generate a new
key and Update/Patch the secret, handling Conflict by re-getting and
re-validating); when Create returns AlreadyExists after generating a new secret,
treat it as a benign race by fetching the newly-created secret and
validating/returning its signing-key instead of failing; in all fetch/update
paths handle transient IsNotFound/Conflict/AlreadyExists errors by re-fetching
and retrying validation once before returning an error. Reference:
getOrCreateSigningSecret, secret.Data["signing-key"], r.Create, r.Get, and use
metaltoken.GenerateSigningSecret for regeneration.
In `@internal/probe/probe.go`:
- Around line 214-219: The auth-error branch currently returns (false, nil)
which signals "not done, no error" so ExponentialBackoffWithContext keeps
retrying; change it to return a non-nil error to stop retries. In the block that
logs the auth failure (the a.log.Error call that checks resp.StatusCode ==
http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden), return a
terminal error (e.g., errors.New or fmt.Errorf with the status code and context)
instead of nil so ExponentialBackoffWithContext will cease retries.
In `@internal/registry/server.go`:
- Around line 89-96: The current validateDiscoveryToken function blindly allows
requests when s.k8sClient == nil which is risky; change the logic to require an
explicit test-mode flag instead of inferring from s.k8sClient by adding a Server
field like testMode (bool) and only bypass validation when testMode is true, and
also add a startup-time log warning if testMode is enabled; update
validateDiscoveryToken to check s.testMode (and still log that validation is
skipped) and ensure any creation/initialization paths set testMode explicitly
rather than relying on a nil k8sClient.
- Around line 104-133: validateDiscoveryToken performs lazy loading of
s.signingSecret without synchronization, causing a data race; add a synchronized
one-time loader and protect writes: introduce fields on Server (e.g.,
signingSecretOnce sync.Once and signingSecretMu sync.Mutex) and implement a
helper method loadSigningSecret() that uses signingSecretOnce.Do to fetch the
Secret via s.k8sClient, validate secret.Data["signing-key"], then assign
s.signingSecret while holding signingSecretMu; call loadSigningSecret() from
validateDiscoveryToken and only read s.signingSecret under the mutex or after
the Once to ensure safe concurrent access.
In `@internal/token/token.go`:
- Around line 40-42: Validate the systemUUID before constructing the payload to
prevent delimiter injection: in the function that builds the payload (where
systemUUID, timestamp, and payload are used) add an RFC‑4122 UUID regex check
against systemUUID and return an error if it doesn't match, so payload :=
fmt.Sprintf("%s||%d", systemUUID, timestamp) only runs for a valid UUID;
alternatively you may replace the delimiter approach with length‑prefixing or a
null byte separator and update the corresponding parsing logic (the token
parsing code referenced at lines 86-99) to use the new format.
---
Nitpick comments:
In `@cmd/main.go`:
- Around line 627-633: The secret name "discovery-token-signing-secret" is
duplicated; extract it to a shared constant and use that constant wherever the
literal is used. Define a package-level constant (e.g.,
DiscoveryTokenSigningSecret) in a common package/file imported by both the
registry setup and the controller, then replace the literal in the
registry.NewServer call and the usage in server_controller.go (and any other
occurrences) with that constant to ensure a single source of truth.
In `@docs/discovery-token-security.md`:
- Around line 13-15: Update the fenced code blocks that show the token format
(the block containing "token =
base64url(systemUUID||timestamp||HMAC-SHA256(systemUUID||timestamp, secret))")
and the ASCII sequence diagram (the block starting with the box characters
"┌──────────────┐ ⟨...⟩") to include a language specifier such as "text" or
"plaintext" after the opening backticks; apply the same change to the other code
fences noted in the comment range (the blocks around lines 52–77) so the linter
accepts them and viewers render them as plain text.
In `@internal/controller/server_controller_test.go`:
- Around line 437-446: The current brittle manual parsing of contents (variables
tokenStart, tokenValueStart, tokenEnd, actualToken) using strings.Index and
IndexAny should be replaced with a more robust extraction: use a regex to locate
--discovery-token=<value> (or parse the YAML/ignition to get the systemd unit
content first) and capture the token value, or parse the YAML structure to
retrieve the unit and then extract the flag; update the test to use that
regex/capture group (or YAML unmarshalling) instead of the Index/IndexAny logic
so variations in quoting/whitespace won’t break the extraction.
- Around line 271-289: The test duplicates code to fetch the
"discovery-token-signing-secret" and generate a signed discovery token; extract
that logic into a helper function (e.g., getSignedDiscoveryToken(ctx
context.Context, k8sClient client.Client, ns string, systemUUID string) string)
placed near the top of the test file or a test utilities file, have it perform
the Eventually k8sClient.Get into a v1.Secret, assert signingKey length and
GenerateSignedDiscoveryToken (metaltoken.GenerateSignedDiscoveryToken), return
the token, and then replace the inline blocks in the tests (the creation of
signingSecret, signingKey check, metaltoken.GenerateSignedDiscoveryToken call,
and probe.NewAgent usage sites) with calls to getSignedDiscoveryToken to reduce
duplication and keep assertions intact.
In `@internal/token/token_test.go`:
- Around line 163-173: The test description is misleading about timing; update
the It description under Context("Token Security") to reflect that it verifies
signature correctness rather than constant-time behavior: change the It("should
use constant-time comparison (HMAC)") text to something like It("verifies signed
discovery token using HMAC signature") and/or update the comment above the test;
keep the test body using GenerateSignedDiscoveryToken and
VerifySignedDiscoveryToken unchanged so it still asserts successful generation
and valid verification.
- Around line 67-76: Test relies on wall-clock progress and can flake; update
the code or test to control time deterministically: either add a time provider
to GenerateSignedDiscoveryToken (e.g., inject a nowFunc or accept a timestamp
parameter) and in the test call GenerateSignedDiscoveryToken with two different
timestamps to assert tokens differ, or change the test to assert tokens may be
equal within the same second and avoid Eventually. Locate
GenerateSignedDiscoveryToken and the token_test.go case and implement the
time-injection approach (introduce nowFunc or timestamp arg) or alter the test
to compare tokens produced with explicit different times instead of using
Eventually.
In `@internal/token/token.go`:
- Around line 86-97: Replace the manual loop that searches for the "||"
delimiter on the payload with strings.Index(payload, "||") (or
strings.SplitN(payload, "||", 2)) to simplify and clarify parsing; update the
code paths that use firstDelim (the existing early-return when delimiter not
found and subsequent substring logic) to use the Index/Split result, and add
"strings" to the imports. This keeps the same behavior (returning "", 0, false,
nil when not found) but removes the manual byte-by-byte scan and improves
readability around the payload delimiter handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6424543f-9838-4804-8743-f03cb9c7032c
📒 Files selected for processing (13)
cmd/main.gocmd/metalprobe/main.godocs/discovery-token-security.mdinternal/api/registry/registry.gointernal/controller/server_controller.gointernal/controller/server_controller_test.gointernal/controller/suite_test.gointernal/probe/probe.gointernal/probe/probe_suite_test.gointernal/registry/registry_suite_test.gointernal/registry/server.gointernal/token/token.gointernal/token/token_test.go
| func (s *Server) validateDiscoveryToken(receivedToken string) (string, bool) { | ||
| // Skip validation in unit test mode (no k8s client) | ||
| if s.k8sClient == nil { | ||
| s.log.V(1).Info("Skipping token validation (no K8s client - unit test mode)") | ||
| // In test mode, extract systemUUID from token if possible, otherwise return empty | ||
| // For now, just allow all requests in test mode | ||
| return "", true | ||
| } |
There was a problem hiding this comment.
Bypassing authentication in test mode is risky.
When k8sClient == nil, all token validation is skipped and requests are allowed. While this is convenient for unit tests, it creates a risk if someone accidentally deploys with a nil client or misconfigures the server.
Consider logging a warning at startup if running in test mode, or requiring an explicit flag to enable test mode rather than inferring it from client presence.
📝 Suggested improvement
func NewServer(logger logr.Logger, addr string, k8sClient client.Client, signingSecretName, signingSecretNamespace string) *Server {
+ if k8sClient == nil {
+ logger.Info("WARNING: Running without K8s client - token validation DISABLED (test mode only)")
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/registry/server.go` around lines 89 - 96, The current
validateDiscoveryToken function blindly allows requests when s.k8sClient == nil
which is risky; change the logic to require an explicit test-mode flag instead
of inferring from s.k8sClient by adding a Server field like testMode (bool) and
only bypass validation when testMode is true, and also add a startup-time log
warning if testMode is enabled; update validateDiscoveryToken to check
s.testMode (and still log that validation is skipped) and ensure any
creation/initialization paths set testMode explicitly rather than relying on a
nil k8sClient.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/discovery-token-security.md (1)
339-342:⚠️ Potential issue | 🟠 MajorManual secret creation command still produces incorrect key length.
As previously flagged,
openssl rand -base64 32generates 32 random bytes and base64-encodes them to ~44 characters. When stored via--from-literal, kubectl base64-encodes this again, so the decodedsigning-keywill be 44 bytes, not the required 32 bytes.🔧 Corrected command
-# Or manually create the secret -kubectl create secret generic discovery-token-signing-secret \ - --from-literal=signing-key=$(openssl rand -base64 32) \ - -n metal-operator-system +# Or manually create the secret (generate 32 raw bytes) +kubectl create secret generic discovery-token-signing-secret \ + --from-file=signing-key=<(openssl rand 32) \ + -n metal-operator-system🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/discovery-token-security.md` around lines 339 - 342, The current command uses openssl rand -base64 32 with --from-literal which yields a 44-byte base64 string that kubectl base64-encodes again, producing an incorrect 44-byte decoded signing-key; instead generate raw 32 bytes to a file and create the secret from that file so Kubernetes stores exactly 32 bytes: run openssl rand 32 > signing-key.bin and then kubectl create secret generic discovery-token-signing-secret --from-file=signing-key=signing-key.bin -n metal-operator-system (then securely remove signing-key.bin); update the documented command that references openssl rand -base64 32, --from-literal, and signing-key accordingly.
🧹 Nitpick comments (2)
internal/token/token_test.go (1)
69-78: Consider usingtime.Sleepor handling the error in Eventually.The test swallows the error from
GenerateSignedDiscoveryTokenon line 75. If token generation fails, the test would hang until timeout rather than fail fast with a clear error.♻️ Proposed fix to handle error explicitly
It("should generate different tokens at different times", func() { token1, err := GenerateSignedDiscoveryToken(signingSecret, "test-uuid") Expect(err).NotTo(HaveOccurred()) - // Small delay to ensure different timestamp - Eventually(func() string { - token2, _ := GenerateSignedDiscoveryToken(signingSecret, "test-uuid") - return token2 - }).ShouldNot(Equal(token1)) + // Wait to ensure different timestamp (JWT uses second precision) + time.Sleep(1100 * time.Millisecond) + + token2, err := GenerateSignedDiscoveryToken(signingSecret, "test-uuid") + Expect(err).NotTo(HaveOccurred()) + Expect(token2).NotTo(Equal(token1)) })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/token/token_test.go` around lines 69 - 78, The test swallows the error from GenerateSignedDiscoveryToken inside Eventually causing silent failures; update the "should generate different tokens at different times" test to handle errors explicitly — either insert a short time.Sleep between calls and call GenerateSignedDiscoveryToken twice with Expect(err).NotTo(HaveOccurred()) before comparing token1 vs token2, or change the Eventually invocation to call a helper that returns (string, error) and assert on the error immediately; reference GenerateSignedDiscoveryToken, token1/token2 and signingSecret when making the change so the test fails fast on token generation errors instead of hanging.docs/discovery-token-security.md (1)
13-15: Add language specifiers to fenced code blocks for better rendering.The markdown linter flags code blocks without language specifiers. For non-code text blocks (token format examples and ASCII diagrams), use the
textspecifier.📐 Proposed fix
-``` +```text token = header.payload.signatureJWT Structure:
-
+text
..And for the ASCII diagram around line 68:
-``` +```text ┌──────────────┐ ┌──────────┐ ┌──────────┐ │ Controller │ │ Registry │ │ Server │ └──────┬───────┘ └────┬─────┘ └────┬─────┘Also applies to: 19-21, 68-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/discovery-token-security.md` around lines 13 - 15, The fenced code blocks in docs/discovery-token-security.md (examples like the line "token = header.payload.signature", the "JWT Structure" block showing "<base64-header>.<base64-payload>.<base64-signature>" and the ASCII diagram starting with "┌──────────────┐") lack language specifiers and fail the markdown linter; update each such fenced block (including the blocks around lines 19-21 and 68-94) to use the text language specifier (i.e., replace ``` with ```text for the token example, the JWT Structure example, and the ASCII diagram blocks) so they render correctly and satisfy the linter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/discovery-token-security.md`:
- Around line 441-447: The "24-hour token lifetime (increased from 1 hour)"
statement under the "Not Needed (Now Implemented)" section is inconsistent with
the rest of the document and code examples that use a 1-hour expiry; update that
statement to reflect a 1-hour token lifetime (or alternatively update the
code/examples and the "Why 1 hour?" section if you intend to change to 24
hours). Specifically, edit the text under the "Not Needed (Now Implemented)"
heading to say "1-hour token lifetime" (and remove the "(increased from 1 hour)"
clause) and verify the JWT example and the "Why 1 hour?" section wording match
this value so the document is consistent.
- Around line 28-36: In the "Claims (Payload):" section remove the "-
**secret**: 32-byte (256-bit) shared secret..." bullet since the signing secret
is not a JWT claim; keep only "- **sub**", "- **iat**", and "- **exp**" as the
transmitted claims, and instead add a short note near the "Signature:
HMAC-SHA256(header + payload, secret)" line clarifying that the secret is used
only to compute/verify the HMAC and is not included in the token payload.
---
Duplicate comments:
In `@docs/discovery-token-security.md`:
- Around line 339-342: The current command uses openssl rand -base64 32 with
--from-literal which yields a 44-byte base64 string that kubectl base64-encodes
again, producing an incorrect 44-byte decoded signing-key; instead generate raw
32 bytes to a file and create the secret from that file so Kubernetes stores
exactly 32 bytes: run openssl rand 32 > signing-key.bin and then kubectl create
secret generic discovery-token-signing-secret
--from-file=signing-key=signing-key.bin -n metal-operator-system (then securely
remove signing-key.bin); update the documented command that references openssl
rand -base64 32, --from-literal, and signing-key accordingly.
---
Nitpick comments:
In `@docs/discovery-token-security.md`:
- Around line 13-15: The fenced code blocks in docs/discovery-token-security.md
(examples like the line "token = header.payload.signature", the "JWT Structure"
block showing "<base64-header>.<base64-payload>.<base64-signature>" and the
ASCII diagram starting with "┌──────────────┐") lack language specifiers and
fail the markdown linter; update each such fenced block (including the blocks
around lines 19-21 and 68-94) to use the text language specifier (i.e., replace
``` with ```text for the token example, the JWT Structure example, and the ASCII
diagram blocks) so they render correctly and satisfy the linter.
In `@internal/token/token_test.go`:
- Around line 69-78: The test swallows the error from
GenerateSignedDiscoveryToken inside Eventually causing silent failures; update
the "should generate different tokens at different times" test to handle errors
explicitly — either insert a short time.Sleep between calls and call
GenerateSignedDiscoveryToken twice with Expect(err).NotTo(HaveOccurred()) before
comparing token1 vs token2, or change the Eventually invocation to call a helper
that returns (string, error) and assert on the error immediately; reference
GenerateSignedDiscoveryToken, token1/token2 and signingSecret when making the
change so the test fails fast on token generation errors instead of hanging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 34fd7cba-42bb-49b5-a86a-77dfe6d83e10
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
docs/discovery-token-security.mdgo.modinternal/token/token.gointernal/token/token_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/token/token.go
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/registry/server.go (1)
310-319:⚠️ Potential issue | 🔴 CriticalNil pointer dereference when k8sClient is nil and payload.Booted is true.
In test mode (
k8sClient == nil), token validation passes at Lines 294-299. Ifpayload.Bootedis true, the code reaches Line 315 which callss.k8sClient.List(), causing a panic.🐛 Proposed fix to guard against nil client
s.log.Info("Received boot state for system", "SystemUUID", payload.SystemUUID, "BootState", payload.Booted) if !payload.Booted { w.WriteHeader(http.StatusOK) return } + // In test mode without k8s client, we cannot update ServerBootConfiguration + if s.k8sClient == nil { + s.log.Info("Running in test mode, skipping ServerBootConfiguration update") + w.WriteHeader(http.StatusOK) + return + } var servers metalv1alpha1.ServerList if err := s.k8sClient.List(r.Context(), &servers, client.MatchingFields{"spec.systemUUID": payload.SystemUUID}); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/registry/server.go` around lines 310 - 319, The handler panics when payload.Booted is true because s.k8sClient can be nil in test mode; before calling s.k8sClient.List(...) check for s.k8sClient == nil and short-circuit (e.g., write a success response and return or return a controlled error) to avoid dereferencing nil. Update the code path that uses s.k8sClient.List, referencing s.k8sClient and the payload.Booted check, to handle the nil client case (log a clear message and respond without calling List).
♻️ Duplicate comments (1)
internal/controller/server_controller.go (1)
730-736:⚠️ Potential issue | 🟠 MajorInvalid secret still causes permanent failure without auto-recovery.
When the signing secret exists but has an invalid key (wrong length or missing), this path returns an error without attempting to regenerate. This will cause permanent reconciliation failures until manual intervention, while the
AlreadyExistspath (Lines 778-790) does attempt recovery.Consider applying the same regeneration logic here for consistency:
🐛 Proposed fix to add auto-recovery
if err == nil { // Secret exists, return the signing key if signingKey, ok := secret.Data[metaltoken.DiscoveryTokenSigningSecretKey]; ok && len(signingKey) == 32 { return signingKey, nil } - // Secret exists but is invalid, regenerate - return nil, fmt.Errorf("existing signing secret is invalid") + // Secret exists but is invalid, update it with a valid key + ctrl.LoggerFrom(ctx).Info("Existing signing secret is invalid, regenerating", "name", secretName) + signingKey, err := metaltoken.GenerateSigningSecret() + if err != nil { + return nil, fmt.Errorf("failed to generate signing secret: %w", err) + } + secret.Data = map[string][]byte{metaltoken.DiscoveryTokenSigningSecretKey: signingKey} + if err := r.Update(ctx, secret); err != nil { + return nil, fmt.Errorf("failed to update signing secret: %w", err) + } + return signingKey, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/server_controller.go` around lines 730 - 736, The current branch inside the secret-fetch logic returns an error when secret.Data[metaltoken.DiscoveryTokenSigningSecretKey] is missing or not 32 bytes, causing permanent reconciliation failure; update the block in the function handling secret retrieval so that if signingKey is missing/invalid you treat it like a corrupted secret and trigger the same regeneration/recreate logic used in the AlreadyExists path (the code that handles creating a new signing secret), i.e., delete or overwrite the invalid secret and call the existing regeneration routine to create a valid 32-byte signing key rather than returning fmt.Errorf("existing signing secret is invalid").
🧹 Nitpick comments (1)
internal/registry/server.go (1)
144-151: Test mode bypass logs at V(1) which may not be visible.When running without a K8s client, authentication is completely bypassed. The warning is logged at verbosity level 1, which may not be visible in default logging configurations. Consider logging at Info level (V(0)) to ensure operators notice this security-sensitive configuration.
📝 Suggested improvement
// Skip validation in test mode (no k8s client) if s.k8sClient == nil { - s.log.V(1).Info("Running in TEST MODE without K8s client - skipping secret loading") + s.log.Info("WARNING: Running in TEST MODE without K8s client - token validation DISABLED") // In test mode, extract systemUUID from token if possible, otherwise return empty // For now, just allow all requests in test mode return "", true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/registry/server.go` around lines 144 - 151, The test-mode bypass in validateDiscoveryToken currently logs the security-sensitive message at verbosity V(1) which may be hidden; update the logging call on the s.k8sClient==nil branch (the s.log.V(1).Info(...) invocation inside validateDiscoveryToken) to use the default/info level (e.g. s.log.Info(...) or s.log.V(0).Info(...)) and make the message explicit (mention "TEST MODE - authentication bypassed, no K8s client") so operators will see it in normal log configurations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/probe/probe.go`:
- Around line 223-226: The log call is passing a nil `err` when the HTTP request
returned a non-success status; replace that by creating a real error describing
the response (e.g. using fmt.Errorf to include resp.StatusCode and optionally
the response body) and pass that error to `a.log.Error`, or alternatively call
`a.log.Error` without an error parameter and include the status and
`a.RegistryURL` in the message; target the block that checks `resp.StatusCode`
(the `if resp.StatusCode != http.StatusOK && resp.StatusCode !=
http.StatusCreated { ... }`), update the logging to use the newly constructed
error (or message) instead of the nil `err` so the log is informative and
non-nil.
---
Outside diff comments:
In `@internal/registry/server.go`:
- Around line 310-319: The handler panics when payload.Booted is true because
s.k8sClient can be nil in test mode; before calling s.k8sClient.List(...) check
for s.k8sClient == nil and short-circuit (e.g., write a success response and
return or return a controlled error) to avoid dereferencing nil. Update the code
path that uses s.k8sClient.List, referencing s.k8sClient and the payload.Booted
check, to handle the nil client case (log a clear message and respond without
calling List).
---
Duplicate comments:
In `@internal/controller/server_controller.go`:
- Around line 730-736: The current branch inside the secret-fetch logic returns
an error when secret.Data[metaltoken.DiscoveryTokenSigningSecretKey] is missing
or not 32 bytes, causing permanent reconciliation failure; update the block in
the function handling secret retrieval so that if signingKey is missing/invalid
you treat it like a corrupted secret and trigger the same regeneration/recreate
logic used in the AlreadyExists path (the code that handles creating a new
signing secret), i.e., delete or overwrite the invalid secret and call the
existing regeneration routine to create a valid 32-byte signing key rather than
returning fmt.Errorf("existing signing secret is invalid").
---
Nitpick comments:
In `@internal/registry/server.go`:
- Around line 144-151: The test-mode bypass in validateDiscoveryToken currently
logs the security-sensitive message at verbosity V(1) which may be hidden;
update the logging call on the s.k8sClient==nil branch (the s.log.V(1).Info(...)
invocation inside validateDiscoveryToken) to use the default/info level (e.g.
s.log.Info(...) or s.log.V(0).Info(...)) and make the message explicit (mention
"TEST MODE - authentication bypassed, no K8s client") so operators will see it
in normal log configurations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2c4ad7fa-b741-4419-82bd-b7a396b8689d
📒 Files selected for processing (8)
cmd/main.godocs/discovery-token-security.mdinternal/controller/server_controller.gointernal/controller/server_controller_test.gointernal/probe/probe.gointernal/registry/server.gointernal/token/constants.gointernal/token/token_test.go
✅ Files skipped from review due to trivial changes (2)
- docs/discovery-token-security.md
- internal/token/constants.go
| if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated { | ||
| a.log.Error(err, "failed to register server", "url", a.RegistryURL) | ||
| a.log.Error(err, "failed to register server", "url", a.RegistryURL, "statusCode", resp.StatusCode) | ||
| return false, nil | ||
| } |
There was a problem hiding this comment.
Logging nil error on non-success HTTP response.
At Line 224, err is nil at this point (the HTTP request succeeded but returned a non-success status code), yet a.log.Error(err, ...) is called. This logs a nil error which is misleading.
🐛 Proposed fix
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
- a.log.Error(err, "failed to register server", "url", a.RegistryURL, "statusCode", resp.StatusCode)
+ a.log.Info("Failed to register server", "url", a.RegistryURL, "statusCode", resp.StatusCode)
return false, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated { | |
| a.log.Error(err, "failed to register server", "url", a.RegistryURL) | |
| a.log.Error(err, "failed to register server", "url", a.RegistryURL, "statusCode", resp.StatusCode) | |
| return false, nil | |
| } | |
| if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated { | |
| a.log.Info("Failed to register server", "url", a.RegistryURL, "statusCode", resp.StatusCode) | |
| return false, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/probe/probe.go` around lines 223 - 226, The log call is passing a
nil `err` when the HTTP request returned a non-success status; replace that by
creating a real error describing the response (e.g. using fmt.Errorf to include
resp.StatusCode and optionally the response body) and pass that error to
`a.log.Error`, or alternatively call `a.log.Error` without an error parameter
and include the status and `a.RegistryURL` in the message; target the block that
checks `resp.StatusCode` (the `if resp.StatusCode != http.StatusOK &&
resp.StatusCode != http.StatusCreated { ... }`), update the logging to use the
newly constructed error (or message) instead of the nil `err` so the log is
informative and non-nil.
Proposed Changes
Fixes #
Summary by CodeRabbit
New Features
Security / Validation
Documentation
Tests