ESO-182: Updates to use different TLS Secret name and disables cluster resource reconcile based on user config #75
@bharath-b-rh: This pull request references ESO-155 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough
Rename Makefile docker-* targets to image-* and switch image/bundle/catalog commands to use $(CONTAINER_TOOL); raise operator CPU/memory requests and remove limits; initialize TLS cert pool from system store with fallback; gate secret-volume diff by volume name; add webhook TLS secret handling and namespace-driven reconciler flags.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ 1 warning, ✅ 2 passed
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bharath-b-rh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/controller/common/utils.go (1)
213-222: Fix the volume comparison logic.
The broadened condition on Line 213 now allows detection of SecretName differences when Items is nil, which is good. However, the inner loop logic has a critical flaw: it compares a desired volume against all fetched volumes without matching by volume name first.
Current logic:
for each desiredVolume:
    if desiredVolume.Secret != nil:
        for each fetchedVolume:
            compare desiredVolume.Secret vs fetchedVolume.Secret
This incorrectly compares unrelated volumes and returns true on any mismatch, causing false positives.
Apply this diff to fix the logic by matching volumes by name before comparing their secret properties:
-	for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
-		if desiredVolume.Secret != nil {
-			for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
-				if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
-					return true
-				}
-				if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
-					return true
-				}
-			}
-		}
-	}
+	for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
+		if desiredVolume.Secret != nil {
+			// Find the matching volume by name in the fetched deployment
+			found := false
+			for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
+				if desiredVolume.Name == fetchedVolume.Name {
+					found = true
+					if fetchedVolume.Secret == nil {
+						return true
+					}
+					if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
+						return true
+					}
+					if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
+						return true
+					}
+					break
+				}
+			}
+			if !found {
+				return true
+			}
+		}
+	}
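The name-matched comparison can also be sketched in isolation. The following is a minimal illustration, not the operator's code: `Volume`, `SecretSource` and `KeyToPath` are simplified, hypothetical stand-ins for the corev1 types, and `secretVolumesModified` is a name chosen here for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// KeyToPath, SecretSource and Volume are simplified, hypothetical stand-ins
// for the corev1 types; only the fields the comparison needs are modelled.
type KeyToPath struct{ Key, Path string }

type SecretSource struct {
	SecretName string
	Items      []KeyToPath
}

type Volume struct {
	Name   string
	Secret *SecretSource
}

// secretVolumesModified reports whether any secret-backed desired volume is
// missing from fetched, or differs in SecretName or Items from the fetched
// volume that has the same name. Volumes with other names are ignored.
func secretVolumesModified(desired, fetched []Volume) bool {
	byName := make(map[string]Volume, len(fetched))
	for _, v := range fetched {
		byName[v.Name] = v
	}
	for _, d := range desired {
		if d.Secret == nil {
			continue
		}
		f, ok := byName[d.Name]
		if !ok || f.Secret == nil {
			return true
		}
		if d.Secret.SecretName != f.Secret.SecretName ||
			!reflect.DeepEqual(d.Secret.Items, f.Secret.Items) {
			return true
		}
	}
	return false
}

func main() {
	desired := []Volume{{Name: "certs", Secret: &SecretSource{SecretName: "webhook-tls"}}}
	fetched := []Volume{
		{Name: "other", Secret: &SecretSource{SecretName: "unrelated"}},
		{Name: "certs", Secret: &SecretSource{SecretName: "webhook-tls"}},
	}
	fmt.Println("modified:", secretVolumesModified(desired, fetched))
}
```

Indexing the fetched volumes by name first keeps the check linear and makes the "missing volume" case explicit, which the nested-loop form only handles through the extra found flag.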
🧹 Nitpick comments (1)
Makefile (1)
201-204: Consider using CONTAINER_TOOL variable consistently.
The docker-buildx target still uses hard-coded docker commands instead of $(CONTAINER_TOOL). While buildx is Docker-specific, this inconsistency might confuse users.
Consider adding a comment explaining why docker is hard-coded here (buildx is Docker-specific), or conditionally skip this target when CONTAINER_TOOL != docker.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (9)
- Makefile (7 hunks)
- bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (2 hunks)
- cmd/external-secrets-operator/main.go (1 hunks)
- config/manager/manager.yaml (1 hunks)
- config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml (1 hunks)
- pkg/controller/common/utils.go (1 hunks)
- pkg/controller/external_secrets/certificate.go (1 hunks)
- pkg/controller/external_secrets/constants.go (1 hunks)
- pkg/controller/external_secrets/deployments.go (3 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile
[warning] 429-429: Target body for "govulnscan" exceeds allowed length of 5 (16).
(maxbodylength)
🔇 Additional comments (15)
config/manager/manager.yaml (1)
123-124: LGTM! Resource adjustments align with observations and OpenShift conventions.
The significant increase in resource requests (CPU: 10m→100m, memory: 64Mi→1Gi) and removal of resource limits follow OpenShift best practices. These values are based on actual cluster observations.
Consider monitoring resource usage in test environments to ensure these values remain appropriate under various workloads.
cmd/external-secrets-operator/main.go (1)
157-161: LGTM! System certificate pool integration is correct.
The addition of x509.SystemCertPool() properly leverages the system trust store before appending the OpenShift service CA certificate. The error handling with os.Exit(1) is appropriate for this initialization phase.
config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml (1)
8-8: LGTM! Operand delete confirmation enabled.
Changing console.openshift.io/disable-operand-delete to "false" enables the OpenShift console to prompt users for confirmation when deleting operand custom resources, providing better user experience and preventing accidental deletions.
bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (3)
221-221: LGTM! Consistent with base manifest.
This change aligns with the corresponding update in config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml, enabling operand deletion confirmation in the OpenShift console.
223-223: Timestamp update is expected.
The createdAt timestamp update is a standard bundle regeneration artifact.
750-751: LGTM! Resource configuration consistent with manager manifest.
These resource request values match the updates in config/manager/manager.yaml and follow OpenShift conventions.
Makefile (5)
182-188: LGTM! Container-tool-agnostic target naming.
Renaming docker-build→image-build and docker-push→image-push makes the Makefile more portable and aligns with the CONTAINER_TOOL variable usage.
296-307: LGTM! Improved go-install-tool with vendor support.
The addition of GOFLAGS="-mod=vendor" ensures vendored dependencies are used during tool installation, and the messaging change from "Downloading" to "Installing" is more accurate.
422-445: Excellent govulnscan improvements.
The enhanced vulnerability scanning workflow with:
- Clear progress messages
- Robust filtering of known vulnerabilities
- Better error detection and reporting
- Informative success/failure output
significantly improves the developer experience.
Note: The static analysis warning about target body length (16 > 5 lines) is a pedantic style guideline. The current implementation prioritizes clarity and robustness over brevity, which is appropriate for a critical security scanning task.
355-355: LGTM! Consistent CONTAINER_TOOL usage.
Bundle build now uses $(CONTAINER_TOOL) for container-tool-agnostic operation.
395-395: LGTM! Consistent CONTAINER_TOOL usage.
Catalog build now uses the --container-tool $(CONTAINER_TOOL) parameter.
pkg/controller/external_secrets/constants.go (1)
48-54: LGTM!
The new constants are well-named and appropriately scoped. The comment on certmanagerTLSSecretWebhook clearly explains why a different secret name is needed to avoid conflicts with the cert-controller-generated TLS secret.
pkg/controller/external_secrets/certificate.go (1)
85-88: LGTM!
The conditional logic correctly applies the new TLS secret name specifically for the webhook certificate asset, ensuring the webhook uses certmanagerTLSSecretWebhook instead of the default secret name that would conflict with the cert-controller.
pkg/controller/external_secrets/deployments.go (2)
126-126: LGTM!
The call to updateWebhookVolumeConfig is correctly placed after the container spec update and only applies to the webhook deployment.
305-330: LGTM!
The namespace-based cluster reconciler flag logic correctly implements the PR objective:
- When operatingNamespace is configured, the controller is restricted to that namespace and cluster-scoped reconciliation is disabled (both flags set to false)
- When operatingNamespace is empty, cluster-scoped reconciliation is enabled (both flags set to true)
The explanatory comment clearly documents this behavior.
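The flag derivation described in that comment can be illustrated with a short sketch. This is a hedged illustration only: the flag names below are assumptions made for the example (the review does not name the two flags), and `clusterReconcilerArgs` is a hypothetical helper, not a function from this PR.

```go
package main

import "fmt"

// clusterReconcilerArgs derives container arguments from the configured
// operating namespace. The flag names are assumptions for illustration;
// the PR only states that two cluster-scoped reconciler flags are toggled.
func clusterReconcilerArgs(operatingNamespace string) []string {
	// Cluster-scoped reconciliation is only enabled when no namespace
	// restriction has been configured.
	enabled := operatingNamespace == ""
	args := []string{
		fmt.Sprintf("--enable-cluster-store-reconciler=%t", enabled),
		fmt.Sprintf("--enable-cluster-external-secret-reconciler=%t", enabled),
	}
	if !enabled {
		// Restrict the controller to the configured namespace.
		args = append(args, "--namespace="+operatingNamespace)
	}
	return args
}

func main() {
	fmt.Println(clusterReconcilerArgs(""))
	fmt.Println(clusterReconcilerArgs("team-a"))
}
```

Deriving both flags from a single boolean keeps them from drifting apart, which matches the "both flags set to false / both flags set to true" behaviour described above.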
func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
	if isCertManagerConfigEnabled(esc) {
		const certsVolumeName = "certs"
		if deployment.Spec.Template.Spec.Volumes == nil {
			deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
				Name: certsVolumeName,
			})
		}
		for i := range deployment.Spec.Template.Spec.Volumes {
			if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
				if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
					deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
				}
				deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
			}
		}
	}
}
Fix the volume creation logic.
The function only appends a certs volume when deployment.Spec.Template.Spec.Volumes is nil (lines 408-412). If Volumes is already non-nil but doesn't contain a certs entry, the subsequent loop (lines 413-420) won't find it, and no volume will be configured.
Apply this diff to ensure the certs volume is created when it doesn't exist:
 func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
 	if isCertManagerConfigEnabled(esc) {
 		const certsVolumeName = "certs"
-		if deployment.Spec.Template.Spec.Volumes == nil {
-			deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
-				Name: certsVolumeName,
-			})
-		}
+
+		// Find if certs volume already exists
+		certsVolumeExists := false
 		for i := range deployment.Spec.Template.Spec.Volumes {
 			if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
+				certsVolumeExists = true
 				if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
 					deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
 				}
 				deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
+				break
 			}
 		}
+
+		// If certs volume doesn't exist, create it
+		if !certsVolumeExists {
+			deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
+				Name: certsVolumeName,
+				VolumeSource: corev1.VolumeSource{
+					Secret: &corev1.SecretVolumeSource{
+						SecretName: certmanagerTLSSecretWebhook,
+					},
+				},
+			})
+		}
 	}
 }
🤖 Prompt for AI Agents
In pkg/controller/external_secrets/deployments.go around lines 405 to 422, the
code only appends the "certs" volume when deployment.Spec.Template.Spec.Volumes
is nil so if Volumes is non-nil but missing a "certs" entry it never adds one;
change the logic to scan existing volumes for a "certs" entry, and if not found
append a corev1.Volume{Name: certsVolumeName} to the slice before the loop, then
in the loop ensure the Secret field is initialized (if nil) and set
Secret.SecretName = certmanagerTLSSecretWebhook.
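The find-or-append behaviour described in the prompt above can be sketched on its own. This is a minimal illustration under stated assumptions: `Volume` and `SecretSource` are simplified, hypothetical stand-ins for the corev1 types, and `ensureSecretVolume` is a name invented for the example rather than a function in the operator.

```go
package main

import "fmt"

// SecretSource and Volume are simplified, hypothetical stand-ins for
// corev1.SecretVolumeSource and corev1.Volume.
type SecretSource struct{ SecretName string }

type Volume struct {
	Name   string
	Secret *SecretSource
}

// ensureSecretVolume points the volume called name at secretName. If no
// volume with that name exists (including when volumes is nil), a new
// secret-backed volume is appended. The updated slice is returned.
func ensureSecretVolume(volumes []Volume, name, secretName string) []Volume {
	for i := range volumes {
		if volumes[i].Name != name {
			continue
		}
		if volumes[i].Secret == nil {
			volumes[i].Secret = &SecretSource{}
		}
		volumes[i].Secret.SecretName = secretName
		return volumes
	}
	return append(volumes, Volume{Name: name, Secret: &SecretSource{SecretName: secretName}})
}

func main() {
	// A non-nil slice without a "certs" entry: the case the review flags.
	volumes := []Volume{{Name: "tmp"}}
	volumes = ensureSecretVolume(volumes, "certs", "webhook-tls")
	for _, v := range volumes {
		fmt.Println(v.Name, v.Secret != nil)
	}
}
```

Because append works on a nil slice in Go, the nil and non-nil cases share one code path here; whether the extra branch is needed in the operator depends on how the static deployment asset is structured.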
This is not required, since certs is the only volume expected.
@bharath-b-rh Thank you for the clarification! If the webhook deployment asset structure ensures that Volumes is either nil or already contains the certs volume, then the current implementation is indeed sufficient. I appreciate you confirming that the logic works correctly for your use case.
This is not required, since certs is the only volume expected.
Then we can just access deployment.Spec.Template.Spec.Volumes[0] directly, right? Why the loop and nil check? To cover for any mutating webhooks outside the operator's control?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
===> Then we can just access deployment.Spec.Template.Spec.Volumes[0] directly right? Why the loop and nil check? To cover for any mutating webhooks outside the operator's control?
|
@bharath-b-rh: This pull request references ESO-182 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ca5a522 to
5527bde
Compare
|
/label px-approved |
|
@bharath-b-rh: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Actionable comments posted: 2
📜 Review details
📒 Files selected for processing (1)
Makefile(7 hunks)
| $(CONTAINER_TOOL) buildx use external-secrets-operator-builder | ||
| - $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f Dockerfile.cross . | ||
| - $(CONTAINER_TOOL) buildx rm external-secrets-operator-builder | ||
| - docker buildx create --name external-secrets-operator-builder |
The docker is used instead of
[claude-generated] Critical Issues Identified:
1. Volume Comparison Logic Bug (pkg/controller/common/utils.go:212-221)
The loop logic in deploymentSpecModified appears incorrect. You're iterating over desired volumes and for each desiredVolume.Secret, you check ALL fetchedVolume.Secret.Items without matching volumes by name first. This means you're comparing wrong volumes and could cause false positives in detecting deployment changes.
Recommendation: Match volumes by name first, then compare Secret.Items only for matching volumes.
2. SystemCertPool Error Handling (cmd/external-secrets-operator/main.go:157-160)
If x509.SystemCertPool() fails, the application exits immediately. Consider:
- Is SystemCertPool() failure expected on certain platforms/containers?
- Should there be a fallback to x509.NewCertPool() if system pool is unavailable?
- This could break existing deployments if system cert pool becomes unavailable
3. Resource Request Increases (config/manager/manager.yaml:122-123)
CPU request increased from 10m to 100m (10x) and memory from 64Mi to 1Gi (16x). This is significant and needs justification:
- What workload patterns drove these increases?
- Has this been tested in resource-constrained environments?
- Will this impact the operator's ability to run on smaller clusters?
4. Resource Limits Removal (config/manager/manager.yaml)
While OpenShift conventions recommend removing limits, verify this won't cause resource contention. Consider memory usage patterns under load and whether cluster resource quotas are in place.
5. Operand Deletion Enabled (config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml:8)
Setting disable-operand-delete: false allows users to delete operand CRs from console. Ensure:
- This is intentional and documented
- Users understand implications
- Safeguards exist for data loss prevention
- No orphaned resources remain
6. updateWebhookVolumeConfig Logic (pkg/controller/external_secrets/deployments.go:405-421)
The function assumes the volume exists and is initialized. Edge cases:
- If deployment.Spec.Template.Spec.Volumes is nil, appending fails
- If no volume with name "certs" exists, the loop completes without creating it
- Could result in nil pointer dereference or silent failure
Recommendation: Ensure the volume exists before trying to update it.
Inline code review - critical issues
pkg/controller/common/utils.go
Outdated
| return true | ||
| } | ||
| if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName { | ||
| return true |
[claude-generated] Critical: This loop logic is incorrect. You're comparing all desired volumes against all fetched volumes without matching by name first. This will cause false positives.
Fix:
for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
	if desiredVolume.Secret != nil {
		for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
			if desiredVolume.Name == fetchedVolume.Name { // Match by name first!
				// Guard against a same-named volume that is not secret-backed.
				if fetchedVolume.Secret == nil {
					return true
				}
				if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
					return true
				}
				if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
					return true
				}
			}
		}
	}
}
So here, we are relying on the fact that any of the deployments will have only one volume mounted from a secret.
I feel that, this being common code, it knows quite a bit about how the deployment assets are structured.
We can add in the Name comparison.
Also, I have 2 more questions:
- Why don't we just use controllerutil.CreateOrUpdate and avoid our custom matching?
- We start off from the specific CreateOrApplyDeploymentFromAsset and call into a common facility, HasObjectChanged, but there again we have a big switch case for each type. Why not just call deploymentSpecModified from CreateOrApplyDeploymentFromAsset? If we call into a common function that accepts a higher abstraction (i.e. Object vs Deployment), I think the common function is expected to work at that higher level of abstraction instead of inspecting the types again, unless it's some kind of framework (open-closed principle).
I had a similar comment earlier on restructuring the deployment struct in createOrApplyDeployment, but I couldn't locate it now :)
We have our own helper functions because we derive the desired objects from static manifests, which do not have the default values populated. A direct comparison at the spec level would therefore always fail and trigger an update. Instead, we specifically compare only the desired fields.
| if err != nil { | ||
| setupLog.Error(err, "failed to load system certificate pool") | ||
| os.Exit(1) | ||
| } |
[claude-generated] Critical: Hard exit on SystemCertPool() failure could break deployments. Consider adding a fallback:
certPool, err := x509.SystemCertPool()
if err != nil {
	setupLog.Info("system cert pool unavailable, using empty pool", "error", err)
	certPool = x509.NewCertPool()
}
As Claude mentioned above, is a failure to load SystemCertPool() really fatal enough that we need a hard exit?
I feel yes. At least on RHEL, which is our base image, the system CA certificates will always be present, so if a failure does occur it would be a genuine one, and I think we would not want to proceed.
| cpu: 10m | ||
| memory: 64Mi | ||
| cpu: 100m | ||
| memory: 1Gi |
[claude-generated] Critical: CPU request increased from 10m to 100m (10x) and memory from 64Mi to 1Gi (16x). Please provide:
- Performance test results showing why 1Gi memory is needed
- CPU utilization metrics from production/staging
- Impact analysis on resource-constrained clusters
In line with the Claude comment above, I am curious why 1Gi of memory is needed. Is most of it cache, or is there any other significant use of memory in this operator?
It was decided based on the observation added in this comment. With just one additional day-2 operator, memory reached 512MiB when the operator started. And I think it's safe to keep 1Gi considering large clusters.
I have not seen the memory go beyond 62MiB for the operator pod. I ran kubectl top pod in a while loop, then installed the operator and created the ExternalSecretsConfig CR on my GCP cluster.
while true; do kubectl -n external-secrets-operator top pod; done
No resources found in external-secrets-operator namespace.
No resources found in external-secrets-operator namespace.
// Installed operator from console
error: metrics not available yet
error: metrics not available yet
// Operator came up
NAME CPU(cores) MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 34m 62Mi
NAME CPU(cores) MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 34m 62Mi
...
...
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 1m 57Mi
NAME CPU(cores) MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 1m 57Mi
...
...
// Created ExternalSecretsConfig
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 9m 58Mi
NAME CPU(cores) MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn 9m 58Mi
Also, we already had a limit of 128Mi, which went through QA testing.
In addition, I tested with a 50Mi limit, which crashed with OOM as expected.
k -n external-secrets-operator get po
NAME READY STATUS RESTARTS AGE
external-secrets-operator-controller-manager-85b4888895-4pqjc 0/1 CrashLoopBackOff 2 (20s ago) 77s
k -n external-secrets-operator get po -o yaml | grep -A2 -B2 OOM
exitCode: 137
finishedAt: "2025-10-10T11:41:08Z"
reason: OOMKilled
startedAt: "2025-10-10T11:40:50Z"
name: manager
But even with a 64Mi limit there are no OOMKilled events.
Thus I feel 128Mi could be a safe resource request, allowing easier pod scheduling than 1Gi, unless I am missing something here.
| capabilities: Basic Install | ||
| categories: Security | ||
| console.openshift.io/disable-operand-delete: "true" | ||
| console.openshift.io/disable-operand-delete: "false" |
[claude-generated] Security: Enabling operand deletion (disable-operand-delete: "false") is a significant change. Ensure:
- This is documented
- Users understand implications
- Safeguards exist for data loss prevention
- Highlighted in release notes
| @@ -389,3 +401,22 @@ | |||
[claude-generated] Good: The logic for disabling cluster-scoped reconciliation when operatingNamespace is set looks correct. Consider:
- Add e2e tests to verify cluster resources are truly ignored when operatingNamespace is set
- Update API docs to explain this behavior
- What happens if cluster resources exist and then operatingNamespace is enabled? Need migration path.
Add e2e tests to verify cluster resources are truly ignored when operatingNamespace is set
Please refer to the comment for the tests. Could we add the e2e test case to our suite in a follow-up PR?
What happens if cluster resources exist and then operatingNamespace is enabled? Need migration path.
Resources outside the operatingNamespace will not be reconciled after the namespace is configured. The same applies the other way around too, which is checked in the verification.
cba3253 to
9703a95
Compare
…amespace is configured Signed-off-by: Bharath B <[email protected]>
…nabled Signed-off-by: Bharath B <[email protected]>
Signed-off-by: Bharath B <[email protected]>
Signed-off-by: Bharath B <[email protected]>
Signed-off-by: Bharath B <[email protected]>
9703a95 to
84b4046
Compare
|
/lgtm |
|
/cherrypick release-1.0 |
|
@bharath-b-rh: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Bharath B <[email protected]>
c887dab to
ca2c73f
Compare
|
/lgtm |
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (4)
- bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (2 hunks)
- cmd/external-secrets-operator/main.go (1 hunks)
- pkg/controller/common/utils.go (1 hunks)
- pkg/controller/external_secrets/constants.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/controller/common/utils.go
- bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
🔇 Additional comments (1)
cmd/external-secrets-operator/main.go (1)
159-163: LGTM! Fallback implementation addresses previous review concern.
The code now implements the suggested fallback approach: attempting to load the system certificate pool, logging a non-fatal info message on failure, and creating an empty pool to continue. Since the OpenShift CA certificate is appended afterward (Line 170), the metrics server will still function correctly even if the system pool is unavailable.
// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
// name is used to avoiding clash with the secret created by the inbuilt cert-controller component.
Fix grammar in comment.
The comment contains a grammatical error.
Apply this diff:
- // certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
- // name is used to avoiding clash with the secret created by the inbuilt cert-controller component.
+ // certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
+ // name is used to avoid clash with the secret created by the inbuilt cert-controller component.
🤖 Prompt for AI Agents
In pkg/controller/external_secrets/constants.go around lines 52 to 53, the
comment has a grammatical error; update the sentence "A different name is used
to avoiding clash with the secret created by the inbuilt cert-controller
component." to a correct form such as "A different name is used to avoid a clash
with the secret created by the inbuilt cert-controller component." (or "to avoid
clashing with the secret...") so the comment reads clearly and grammatically
correct.
|
@bharath-b-rh: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@bharath-b-rh: This pull request references ESO-182 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
|
/lgtm |
|
@bharath-b-rh: #75 failed to apply on top of branch "release-1.0": In response to this:
|
/cherrypick release-1.0 |
|
@bharath-b-rh: #75 failed to apply on top of branch "release-1.0": In response to this:

This PR has the following changes:
Updates the external-secrets core controller args to disable reconciliation of cluster custom resources when operatingNamespace is enabled. This restricts the operand from processing any custom resources created outside of the operatingNamespace, including the cluster resources.
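As a rough illustration of the namespace-driven args described above, the sketch below appends restricting flags only when an operating namespace is set. `buildControllerArgs` is a hypothetical helper, not the function in this PR, and the flag names (`--namespace`, `--enable-cluster-store-reconciler`, `--enable-cluster-external-secret-reconciler`) are assumed from the upstream external-secrets CLI; verify them against the operand version in use.

```go
package main

import (
	"fmt"
	"strings"
)

// buildControllerArgs is a hypothetical helper. It returns a copy of base
// and, when operatingNamespace is non-empty, appends flags that scope the
// controller to that namespace and turn off cluster-scoped reconcilers.
// The flag names are assumptions, not taken from this PR's diff.
func buildControllerArgs(base []string, operatingNamespace string) []string {
	args := append([]string{}, base...) // copy so the caller's slice is untouched
	if operatingNamespace == "" {
		return args
	}
	return append(args,
		"--namespace="+operatingNamespace,
		"--enable-cluster-store-reconciler=false",
		"--enable-cluster-external-secret-reconciler=false",
	)
}

func main() {
	base := []string{"--concurrent=1"}
	fmt.Println(strings.Join(buildControllerArgs(base, ""), " "))
	fmt.Println(strings.Join(buildControllerArgs(base, "team-a"), " "))
}
```

Deriving the flags from a single user setting keeps the two behaviours consistent: a namespace-scoped operand never also reconciles cluster-scoped resources.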