Parameterize certificate get and store operations.#283
Parameterize certificate get and store operations.#283juriskrumins wants to merge 1 commit intomedik8s:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: juriskrumins The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @juriskrumins. Thanks for your PR. I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughThis pull request adds a configurable timeout parameter for certificate storage API operations, making the previously hardcoded 10-second timeout adjustable through the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip We've launched Issue Planner and it is currently in beta. Please try it out and share your feedback on Discord! Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
controllers/tests/config/suite_test.go (1)
102-109:⚠️ Potential issue | 🟠 Major
SecretCertStorageApiTimeoutnot set on the test reconciler.The
SelfNodeRemediationConfigReconcilerhere doesn't setSecretCertStorageApiTimeout, so it defaults to0. WhensyncCertsruns during reconciliation, it will create aSecretCertStoragewith a zero timeout, which could cause test flakiness or unexpected behavior.🔧 Suggested fix
err = (&controllers.SelfNodeRemediationConfigReconciler{ Client: k8sManager.GetClient(), Log: ctrl.Log.WithName("controllers").WithName("self-node-remediation-config-controller"), InstallFileFolder: "../../../install/", Scheme: scheme.Scheme, Namespace: shared.Namespace, RebootDurationCalculator: shared.MockRebootDurationCalculator{}, + SecretCertStorageApiTimeout: 10 * time.Second, }).SetupWithManager(k8sManager)api/v1alpha1/selfnoderemediationconfig_webhook_test.go (2)
337-352:⚠️ Potential issue | 🔴 CriticalMissing
certStorageApiTimeoutcase insetFieldValue— the test field override is silently ignored.
setFieldValuehas nocase certStorageApiTimeout:branch, socreateSelfNodeRemediationConfigCRWithFieldValue(certStorageApiTimeout, ...)never applies the test value. The CR retains the default (10s), and the "invalid value" test for this field will not actually test an invalid value.🐛 Proposed fix
case peerUpdateInterval: snrc.Spec.PeerUpdateInterval = timeValue + case certStorageApiTimeout: + snrc.Spec.CertStorageApiTimeout = timeValue }
275-309:⚠️ Potential issue | 🔴 Critical
testValidCRdoesn't setCertStorageApiTimeout— nil pointer dereference invalidateTimes.The CR is constructed as a bare struct (line 276) and only existing Duration fields are explicitly set (lines 281–286).
CertStorageApiTimeoutremainsnil. WhenvalidateTimesaccessesspec.CertStorageApiTimeout.Duration(webhook.go line 147), it will panic.🐛 Proposed fix
snrc.Spec.PeerUpdateInterval = &metav1.Duration{Duration: 10 * time.Second} + snrc.Spec.CertStorageApiTimeout = &metav1.Duration{Duration: 10 * time.Second} snrc.Spec.CustomDsTolerations = []v1.Toleration{{Key: "validValue", Effect: v1.TaintEffectNoSchedule}, {}, {Operator: v1.TolerationOpEqual, TolerationSeconds: pointer.Int64(-5)}, {Value: "SomeValidValue"}}main.go (1)
97-108:⚠️ Potential issue | 🔴 CriticalFix type mismatch: controller renders float64 seconds that IntVar flag cannot parse.
The controller converts
CertStorageApiTimeoutto float64 via.Seconds()and renders it directly in the template:--cert-storage-api-timeout={{.CertStorageApiTimeout}}. However,flag.IntVarcannot parse decimal values. If a user sets a fractional duration in the CRD (e.g.,500msor1.5s), the template will render0.5or1.5respectively, causing the pod to crash on startup with a flag parse error.Fix: Either convert to int64 in the controller before rendering (e.g.,
int64(snrConfig.Spec.CertStorageApiTimeout.Seconds())), or change the flag to accept a duration string format instead of an integer.
🤖 Fix all issues with AI agents
In `@api/v1alpha1/selfnoderemediationconfig_webhook_test.go`:
- Line 43: The test case uses certStorageApiTimeout with a value equal to the
minimum (10 * time.Second) so validation won't reject it; change the test input
to a value strictly below minDurCertStorageApiTimeout (for example 9 *
time.Second) so the validation in the webhook test for certStorageApiTimeout
correctly triggers rejection; update the test row where certStorageApiTimeout is
paired with minDurCertStorageApiTimeout to use a sub-minimum duration (e.g., 9 *
time.Second).
In `@controllers/selfnoderemediationconfig_controller.go`:
- Line 160: The code directly calls
snrConfig.Spec.CertStorageApiTimeout.Seconds() which can be nil; guard by
checking if snrConfig.Spec.CertStorageApiTimeout != nil before calling
Seconds(), and set data.Data["CertStorageApiTimeout"] to the default (e.g., 10)
or 0 when nil; apply the same nil-guard pattern used for other timeout fields
(the CertStorageApiTimeout field in snrConfig.Spec) to prevent panics.
In `@install/self-node-remediation-deamonset.yaml`:
- Line 42: The template renders CertStorageApiTimeout as a float (via
.Seconds()) causing values like "10.0" which flag.IntVar in main.go cannot
parse; update the DaemonSet template to emit an integer number of nanoseconds or
whole seconds to match main.go's parsing: either change the template to use
.Nanoseconds() (matching other timeouts) or apply math.Ceil on
snrConfig.Spec.CertStorageApiTimeout.Seconds() and cast to int before rendering
so the rendered --cert-storage-api-timeout value is an integer that flag.IntVar
can parse; locate uses of CertStorageApiTimeout, the DaemonSet flag rendering
line, and the parsing in main.go (flag.IntVar) to make the change consistently.
In `@main.go`:
- Around line 107-108: The flag certStorageApiTimeout (registered with
flag.IntVar) can be zero or negative which makes any context.WithTimeout using
it immediately expired; add validation after flag.Parse() to ensure
certStorageApiTimeout > 0 (e.g., if <= 0 then set a sensible minimum like 1,
return an error, or log.Fatal) and use the validated value when creating
contexts for certificate API calls (references: certStorageApiTimeout and the
code paths that call context.WithTimeout for collecting
self-node-remediation-certificates); ensure the chosen behavior is consistent
(reject invalid input or coerce to a minimum) and update any related error
messages to mention the timeout value.
🧹 Nitpick comments (2)
controllers/tests/config/suite_test.go (1)
123-123: Non-idiomatic duration expression.
time.Duration(10)*time.Secondworks but is unusual. The idiomatic form is10 * time.Second.♻️ Suggested fix
- certReader = certificates.NewSecretCertStorage(k8sClient, ctrl.Log.WithName("SecretCertStorage"), shared.Namespace, time.Duration(10)*time.Second) + certReader = certificates.NewSecretCertStorage(k8sClient, ctrl.Log.WithName("SecretCertStorage"), shared.Namespace, 10*time.Second)pkg/certificates/storage_test.go (1)
31-31: Nit:time.Duration(10)*time.Secondcan be simplified to10 * time.Second.The explicit cast is unnecessary here since
10is an untyped constant. This same pattern appears inmain.goas well (see comment there).Suggested simplification
- store := NewSecretCertStorage(k8sClient, ctrl.Log.WithName("TestSecretCertStore"), "default", time.Duration(10)*time.Second) + store := NewSecretCertStorage(k8sClient, ctrl.Log.WithName("TestSecretCertStore"), "default", 10*time.Second)
| {peerRequestTimeout, -1 * time.Minute, minDurPeerRequestTimeout}, | ||
| {apiCheckInterval, -1 * time.Second, minDurApiCheckInterval}, | ||
| {peerUpdateInterval, -10 * time.Second, minDurPeerUpdateInterval}, | ||
| {certStorageApiTimeout, 10 * time.Second, minDurCertStorageApiTimeout}, |
There was a problem hiding this comment.
Test value equals the minimum — this test case won't trigger a rejection.
The test value 10 * time.Second equals minDurCertStorageApiTimeout (also 10s). The validation only rejects values less than the minimum. This test expects rejection but will pass validation, causing a test failure.
Use a value strictly below the minimum (e.g., 9 * time.Second):
🐛 Proposed fix
- {certStorageApiTimeout, 10 * time.Second, minDurCertStorageApiTimeout},
+ {certStorageApiTimeout, 9 * time.Second, minDurCertStorageApiTimeout},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {certStorageApiTimeout, 10 * time.Second, minDurCertStorageApiTimeout}, | |
| {certStorageApiTimeout, 9 * time.Second, minDurCertStorageApiTimeout}, |
🤖 Prompt for AI Agents
In `@api/v1alpha1/selfnoderemediationconfig_webhook_test.go` at line 43, The test
case uses certStorageApiTimeout with a value equal to the minimum (10 *
time.Second) so validation won't reject it; change the test input to a value
strictly below minDurCertStorageApiTimeout (for example 9 * time.Second) so the
validation in the webhook test for certStorageApiTimeout correctly triggers
rejection; update the test row where certStorageApiTimeout is paired with
minDurCertStorageApiTimeout to use a sub-minimum duration (e.g., 9 *
time.Second).
| data.Data["MinPeersForRemediation"] = snrConfig.Spec.MinPeersForRemediation | ||
| data.Data["HostPort"] = snrConfig.Spec.HostPort | ||
| data.Data["IsSoftwareRebootEnabled"] = fmt.Sprintf("\"%t\"", snrConfig.Spec.IsSoftwareRebootEnabled) | ||
| data.Data["CertStorageApiTimeout"] = snrConfig.Spec.CertStorageApiTimeout.Seconds() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's look at the specific file and lines around the concern
cat -n controllers/selfnoderemediationconfig_controller.go | sed -n '140,170p'Repository: medik8s/self-node-remediation
Length of output: 1812
🏁 Script executed:
# Search for CertStorageApiTimeout usage across the codebase
rg -n 'CertStorageApiTimeout' --type=go -C 3Repository: medik8s/self-node-remediation
Length of output: 9510
🏁 Script executed:
# Search for webhook definitions that might set defaults
rg -n 'webhook\|defaulting\|mutating' --type=go -i | head -50Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Look for the CRD definition or struct that defines the field
fd -e yaml -e json | xargs rg -l 'CertStorageApiTimeout' 2>/dev/null | head -10Repository: medik8s/self-node-remediation
Length of output: 117
🏁 Script executed:
# Get full webhook validation logic around line 147
cat -n api/v1alpha1/selfnoderemediationconfig_webhook.go | sed -n '130,160p'Repository: medik8s/self-node-remediation
Length of output: 1367
🏁 Script executed:
# Look for where defaults might be set in the webhook
rg -n 'Default|default' api/v1alpha1/selfnoderemediationconfig_webhook.go -C 3Repository: medik8s/self-node-remediation
Length of output: 1641
🏁 Script executed:
# Check the types of other timeout fields to see if they're also optional
cat -n api/v1alpha1/selfnoderemediationconfig_types.go | sed -n '120,148p'Repository: medik8s/self-node-remediation
Length of output: 1775
🏁 Script executed:
# Look for the Default method in the webhook
rg -n 'func.*Default' api/v1alpha1/selfnoderemediationconfig_webhook.go -A 20Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Look at the full validate method to see the order of operations
cat -n api/v1alpha1/selfnoderemediationconfig_webhook.go | sed -n '80,135p'Repository: medik8s/self-node-remediation
Length of output: 2826
🏁 Script executed:
# Search for the Default method (mutating webhook)
rg -n 'func.*Default' api/v1alpha1/ --type=goRepository: medik8s/self-node-remediation
Length of output: 710
🏁 Script executed:
# Check if there's a separate defaulter for this type
fd . api/v1alpha1/ -name "*webhook*.go" -o -name "*default*.go"Repository: medik8s/self-node-remediation
Length of output: 303
🏁 Script executed:
# Look for all webhook-related files
ls -la api/v1alpha1/ | grep -i webhookRepository: medik8s/self-node-remediation
Length of output: 673
🏁 Script executed:
# Check if there's a Default method for SelfNodeRemediationConfig anywhere
rg -n 'func.*SelfNodeRemediationConfig.*Default' api/v1alpha1/ --type=go -A 5Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Look at the complete SelfNodeRemediationConfigSpec to see which fields are optional
cat -n api/v1alpha1/selfnoderemediationconfig_types.go | sed -n '55,148p'Repository: medik8s/self-node-remediation
Length of output: 5107
🏁 Script executed:
# Check if there's a mutating webhook or default handler that would apply these defaults
rg -n 'Default|Defaulting' api/v1alpha1/selfnoderemediationconfig_webhook.go -C 3Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Look for any pre-existing nil checks for these timeout fields in the codebase
rg -n 'CertStorageApiTimeout.*nil\|if.*CertStorageApiTimeout' --type=go -B 2 -A 2Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Check test cases to see if they test nil scenarios
rg -n 'nil\|CertStorageApiTimeout' api/v1alpha1/selfnoderemediationconfig_webhook_test.go -B 2 -A 2Repository: medik8s/self-node-remediation
Length of output: 55
🏁 Script executed:
# Check the actual webhook validation more carefully - look at validateTimes and how it's called
cat -n api/v1alpha1/selfnoderemediationconfig_webhook.go | sed -n '134,160p'Repository: medik8s/self-node-remediation
Length of output: 1299
🏁 Script executed:
# Check if there's any nil guard before validateTimes is called
rg -n 'validateTimes' api/v1alpha1/selfnoderemediationconfig_webhook.go -B 5 -A 5Repository: medik8s/self-node-remediation
Length of output: 1237
🏁 Script executed:
# Look for any checks before line 160 in the controller
cat -n controllers/selfnoderemediationconfig_controller.go | sed -n '125,165p'Repository: medik8s/self-node-remediation
Length of output: 2270
Add nil guard before accessing CertStorageApiTimeout.Seconds().
The field is optional (+optional) and can be nil. Although +kubebuilder:default:="10s" should apply in normal scenarios, edge cases like CRD/CR upgrades from older versions or webhook failures could leave it nil, causing a panic. This pattern affects all timeout fields (lines 149–154), but adding a nil guard here protects the latest field and sets a pattern for the others.
🤖 Prompt for AI Agents
In `@controllers/selfnoderemediationconfig_controller.go` at line 160, The code
directly calls snrConfig.Spec.CertStorageApiTimeout.Seconds() which can be nil;
guard by checking if snrConfig.Spec.CertStorageApiTimeout != nil before calling
Seconds(), and set data.Data["CertStorageApiTimeout"] to the default (e.g., 10)
or 0 when nil; apply the same nil-guard pattern used for other timeout fields
(the CertStorageApiTimeout field in snrConfig.Spec) to prevent panics.
| containers: | ||
| - args: | ||
| - --is-manager=false | ||
| - --cert-storage-api-timeout={{.CertStorageApiTimeout}} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find the daemonset file and controller code
find . -type f -name "*daemonset*" -o -name "*controller*" -o -name "main.go" | head -20Repository: medik8s/self-node-remediation
Length of output: 1289
🏁 Script executed:
# Search for CertStorageApiTimeout and cert-storage-api-timeout across the repo
rg -l "CertStorageApiTimeout|cert-storage-api-timeout" --type=go --type=yamlRepository: medik8s/self-node-remediation
Length of output: 396
🏁 Script executed:
# Search for context around cert-storage-api-timeout parameter
rg "cert-storage-api-timeout" -C5 --type=go --type=yamlRepository: medik8s/self-node-remediation
Length of output: 1604
🏁 Script executed:
# Read the daemonset YAML file to see all timeout parameters
cat install/self-node-remediation-deamonset.yamlRepository: medik8s/self-node-remediation
Length of output: 3627
🏁 Script executed:
# Read the controller code around line 160 where CertStorageApiTimeout is rendered
sed -n '140,180p' controllers/selfnoderemediationconfig_controller.goRepository: medik8s/self-node-remediation
Length of output: 1885
🏁 Script executed:
# Read main.go to see the full context of certStorageApiTimeout parsing
sed -n '1,100p' main.goRepository: medik8s/self-node-remediation
Length of output: 3436
🏁 Script executed:
# Read more of main.go to see how certStorageApiTimeout is used
sed -n '75,130p' main.goRepository: medik8s/self-node-remediation
Length of output: 2119
🏁 Script executed:
# Check the type definition of CertStorageApiTimeout in the API types
grep -n "CertStorageApiTimeout" api/v1alpha1/selfnoderemediationconfig_types.go -A2 -B2Repository: medik8s/self-node-remediation
Length of output: 231
🏁 Script executed:
# Check if CertStorageApiTimeout has a kubebuilder default
grep -n "CertStorageApiTimeout" api/v1alpha1/ -r -B5Repository: medik8s/self-node-remediation
Length of output: 4859
🏁 Script executed:
# Verify that time.Duration.Seconds() returns float64 by checking the actual usage
rg "\.Seconds\(\)" --type=go -B2 -A2Repository: medik8s/self-node-remediation
Length of output: 30122
🏁 Script executed:
# Check if there's any conversion happening elsewhere or if certStorageApiTimeout is used
grep -n "certStorageApiTimeout" main.go -A5 -B5Repository: medik8s/self-node-remediation
Length of output: 1882
🏁 Script executed:
# Verify the exact behavior of metav1.Duration with Seconds()
cat api/v1alpha1/selfnoderemediationconfig_types.go | grep -A10 "CertStorageApiTimeout"Repository: medik8s/self-node-remediation
Length of output: 503
Critical bug: --cert-storage-api-timeout flag will fail to parse at runtime.
The controller renders CertStorageApiTimeout using .Seconds(), which returns a float64 value. For the default "10s" duration, this renders to --cert-storage-api-timeout=10.0 in the DaemonSet. However, main.go parses this flag with flag.IntVar(), which cannot parse "10.0" as a valid integer. This will cause a runtime error when the DaemonSet agent starts.
Additionally, this approach is inconsistent with other timeout parameters, which use .Nanoseconds() (returning int64) and are passed as environment variables. Change CertStorageApiTimeout to use .Nanoseconds() or apply math.Ceil(snrConfig.Spec.CertStorageApiTimeout.Seconds()) to convert to int before rendering.
🤖 Prompt for AI Agents
In `@install/self-node-remediation-deamonset.yaml` at line 42, The template
renders CertStorageApiTimeout as a float (via .Seconds()) causing values like
"10.0" which flag.IntVar in main.go cannot parse; update the DaemonSet template
to emit an integer number of nanoseconds or whole seconds to match main.go's
parsing: either change the template to use .Nanoseconds() (matching other
timeouts) or apply math.Ceil on snrConfig.Spec.CertStorageApiTimeout.Seconds()
and cast to int before rendering so the rendered --cert-storage-api-timeout
value is an integer that flag.IntVar can parse; locate uses of
CertStorageApiTimeout, the DaemonSet flag rendering line, and the parsing in
main.go (flag.IntVar) to make the change consistently.
| flag.IntVar(&certStorageApiTimeout, "cert-storage-api-timeout", 10, | ||
| "Set Kubernetes timeout value while collecting self-node-remediation-certificates secret.") |
There was a problem hiding this comment.
No validation for zero or negative timeout values.
If a user passes --cert-storage-api-timeout=0 or a negative value, the resulting context.WithTimeout will create an already-expired context, causing all certificate API calls to fail immediately. Consider adding a guard.
Suggested validation
flag.IntVar(&certStorageApiTimeout, "cert-storage-api-timeout", 10,
"Set Kubernetes timeout value while collecting self-node-remediation-certificates secret.")
+// ... after flag.Parse():
+if certStorageApiTimeout <= 0 {
+ setupLog.Error(fmt.Errorf("invalid cert-storage-api-timeout: %d", certStorageApiTimeout), "cert-storage-api-timeout must be positive")
+ os.Exit(1)
+}🤖 Prompt for AI Agents
In `@main.go` around lines 107 - 108, The flag certStorageApiTimeout (registered
with flag.IntVar) can be zero or negative which makes any context.WithTimeout
using it immediately expired; add validation after flag.Parse() to ensure
certStorageApiTimeout > 0 (e.g., if <= 0 then set a sensible minimum like 1,
return an error, or log.Fatal) and use the validated value when creating
contexts for certificate API calls (references: certStorageApiTimeout and the
code paths that call context.WithTimeout for collecting
self-node-remediation-certificates); ensure the chosen behavior is consistent
(reject invalid input or coerce to a minimum) and update any related error
messages to mention the timeout value.
|
Thanks for providing a PR. I went the more direct approach of waiting on informer sync in this PR, WDYT? #282 |
|
If later reads will take more then 10s ? As far as I understand it creates cache and sync from k8s tom this cache. So cache should probably eventually expire at some point in time, no ? |
|
No, there isn't a cache expiration. It's updated on the fly when secrets are added / modified / deleted. It's just the initial read of all secrets which can be slow for many secrets. |
|
I can test it, but |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
the get function is getting the one specific secret only. However, the informer always has all of them 🤷🏼♂️ |
|
Ok, thanks for the explanation. |
|
when things go well, we will have this in a 0.12.1 bugfix release in 2-3 weeks |
|
And one more question, looking at the code once again specifically related to informers as far as I understand it's not possible to specify filter ( for example based on namespace and kind ) on which objects we'd like to sync to the cache and keep eventually updated later on. Which is from my perspective is the root cause, since in cluster with huge amount of objects we're effectively pulling in all specified objects (in this case all Secrets land in the cache). Correct me if I'm wrong, probably don't understand all this good enough. |
|
feel free to further investigate if we can change the informer. Priority was to have simple quick fix for the planned release for now... |
Why we need this PR
I've created issue, we have in our cluster - too short timeout for certificate collection -10 seconds. Works good for us when we set it something like 30-40s, but I suppose parameter with default value of 10s is fine.
Changes made
Parameterize constant -
apiTimeout = 10 * time.Secondin certificate storage.Which issue(s) this PR fixes
Fixes #281
Test plan
Summary by CodeRabbit
--cert-storage-api-timeoutflag (default: 10 seconds).