Skip to content

Conversation

@bharath-b-rh
Copy link
Contributor

PR has following the changes:

  • Updates the TLS secret name used by the webhook component when the cert-manager config is enabled. This is avoid conflict with the cert-controller generated TLS secret, where earlier both Secret names were same.
  • Updates the external-secrets core controller args to disable cluster custom resources reconciliation when operatingNamespace is enabled. This will restrict operand from processing any other custom resources created outside of the operatingNamespace, including the cluster resources.
  • Updates resource request config for the operator and removes resource limit config.
    • Limit config is removed based on the suggestion here
    • Request config was updated based on the observation on a cluster having just ESO as additional operator apart from the core payload using derailed/k9s.
  • Updates operator metrics server to make use of system CA certificates along with openshift service CA.
  • Updates Makefile with AI suggested changes and few improvements.
  • Updates the operand delete option in CSV, so that when user deletes the operator from console a pop-up will be used to ask whether to delete operand custom resources.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 8, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 8, 2025

@bharath-b-rh: This pull request references ESO-155 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.21.0" version, but no target version was set.

In response to this:

PR has following the changes:

  • Updates the TLS secret name used by the webhook component when the cert-manager config is enabled. This is avoid conflict with the cert-controller generated TLS secret, where earlier both Secret names were same.
  • Updates the external-secrets core controller args to disable cluster custom resources reconciliation when operatingNamespace is enabled. This will restrict operand from processing any other custom resources created outside of the operatingNamespace, including the cluster resources.
  • Updates resource request config for the operator and removes resource limit config.
  • Limit config is removed based on the suggestion here
  • Request config was updated based on the observation on a cluster having just ESO as additional operator apart from the core payload using derailed/k9s.
  • Updates operator metrics server to make use of system CA certificates along with openshift service CA.
  • Updates Makefile with AI suggested changes and few improvements.
  • Updates the operand delete option in CSV, so that when user deletes the operator from console a pop-up will be used to ask whether to delete operand custom resources.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Oct 8, 2025

Walkthrough

Rename Makefile docker-* targets to image-* and switch image/bundle/catalog commands to use $(CONTAINER_TOOL); raise operator CPU/memory requests and remove limits; initialize TLS cert pool from system store with fallback; gate secret-volume diff by volume name; add webhook TLS secret handling and namespace-driven reconciler flags.

Changes

Cohort / File(s) Summary
Build & tooling
Makefile
Renamed public targets (docker-buildimage-build, docker-pushimage-push); replaced direct docker usage with $(CONTAINER_TOOL) for image, bundle and catalog build/push and buildx steps; updated help text, temp outputs path, go install messaging (-mod=vendor), and improved govulncheck messaging/filtering/guards.
Manifests / manager config
bundle/manifests/external-secrets-operator.clusterserviceversion.yaml, config/manager/manager.yaml
Removed container resource limits; increased requests from cpu:10m100m and memory:64Mi1Gi; updated CSV createdAt timestamp.
TLS cert pool initialization
cmd/external-secrets-operator/main.go
Initialize TLS metrics cert pool via x509.SystemCertPool() with logged fallback to a new empty pool; OpenShift CA handling and TLS config retained.
Deployment secret-volume diff logic
pkg/controller/common/utils.go
In deploymentSpecModified, inner secret-volume comparisons now only run when desiredVolume.Name == fetchedVolume.Name; Secret.Items and Secret.SecretName comparisons moved inside this name-match branch.
Webhook / cert-manager & controller helpers
pkg/controller/external_secrets/constants.go, pkg/controller/external_secrets/certificate.go, pkg/controller/external_secrets/deployments.go
Added unexported certmanagerTLSSecretWebhook constant; set certificate.Spec.SecretName for webhook certificate asset; added updateWebhookVolumeConfig and updateSecretVolumeConfig helpers to ensure Secret-backed certs volumes (applied to webhook and bitwarden); container args updated to toggle cluster reconciler flags based on operatingNamespace.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "ESO-182: Updates to use different TLS Secret name and disables cluster resource reconcile based on user config" is clearly related to the changeset. It highlights two significant behavioral changes that are verifiable in the raw summary: the TLS secret naming changes for the webhook component (evident in constants.go, certificate.go, and deployments.go) and the cluster resource reconciliation logic updates (evident in deployments.go). While the title does not cover all aspects of the changeset such as resource configuration, metrics server updates, and Makefile changes, the instructions acknowledge that titles don't need to cover every detail. The title is specific, clear, and accurately describes real parts of the changeset.
Description Check ✅ Passed The description is clearly related to the changeset with specific details about the changes. The raw summary confirms most of the described changes: TLS secret naming updates (certificate.go, constants.go, deployments.go), cluster resource reconciliation logic (deployments.go), resource request and limit configuration updates (manager.yaml and CSV), metrics server system CA certificate integration (main.go), and Makefile updates. The description provides sufficient context and specificity relevant to the actual changes in the pull request, and the check criteria explicitly states that level of detail is not important as long as the description relates to the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from TrilokGeer and swghosh October 8, 2025 11:21
@openshift-ci
Copy link

openshift-ci bot commented Oct 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bharath-b-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2025
@bharath-b-rh
Copy link
Contributor Author

Resource request tests

  • Use below command to check the CPU and memory consumption by the operator.
docker run --rm -it -v $KUBECONFIG:/root/.kube/config quay.io/derailed/k9s -n external-secrets-operator
  • Below snapshot shows the consumption when it's ESO operator in cluster apart from core payload. i.e there aren't many deployment, serviceaccount, service, clusterrole, clusterrolebindings existing like in a large cluster, which would cause operator resource usage spike when filtering for cache and reconcile events.
Screenshot from 2025-10-08 14-26-05

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/controller/common/utils.go (1)

213-222: Fix the volume comparison logic.

The broadened condition on Line 213 now allows detection of SecretName differences when Items is nil, which is good. However, the inner loop logic has a critical flaw: it compares a desired volume against all fetched volumes without matching by volume name first.

Current logic:

for each desiredVolume:
  if desiredVolume.Secret != nil:
    for each fetchedVolume:
      compare desiredVolume.Secret vs fetchedVolume.Secret

This incorrectly compares unrelated volumes and returns true on any mismatch, causing false positives.

Apply this diff to fix the logic by matching volumes by name before comparing their secret properties:

-	for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
-		if desiredVolume.Secret != nil {
-			for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
-				if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
-					return true
-				}
-				if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
-					return true
-				}
-			}
-		}
-	}
+	for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
+		if desiredVolume.Secret != nil {
+			// Find the matching volume by name in the fetched deployment
+			found := false
+			for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
+				if desiredVolume.Name == fetchedVolume.Name {
+					found = true
+					if fetchedVolume.Secret == nil {
+						return true
+					}
+					if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
+						return true
+					}
+					if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
+						return true
+					}
+					break
+				}
+			}
+			if !found {
+				return true
+			}
+		}
+	}
🧹 Nitpick comments (1)
Makefile (1)

201-204: Consider using CONTAINER_TOOL variable consistently.

The docker-buildx target still uses hard-coded docker commands instead of $(CONTAINER_TOOL). While buildx is Docker-specific, this inconsistency might confuse users.

Consider adding a comment explaining why docker is hard-coded here (buildx is Docker-specific), or conditionally skip this target when CONTAINER_TOOL != docker.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between e3fe1cf and ca5a522.

📒 Files selected for processing (9)
  • Makefile (7 hunks)
  • bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (2 hunks)
  • cmd/external-secrets-operator/main.go (1 hunks)
  • config/manager/manager.yaml (1 hunks)
  • config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml (1 hunks)
  • pkg/controller/common/utils.go (1 hunks)
  • pkg/controller/external_secrets/certificate.go (1 hunks)
  • pkg/controller/external_secrets/constants.go (1 hunks)
  • pkg/controller/external_secrets/deployments.go (3 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 429-429: Target body for "govulnscan" exceeds allowed length of 5 (16).

(maxbodylength)

🔇 Additional comments (15)
config/manager/manager.yaml (1)

123-124: LGTM! Resource adjustments align with observations and OpenShift conventions.

The significant increase in resource requests (CPU: 10m→100m, memory: 64Mi→1Gi) and removal of resource limits follow OpenShift best practices. These values are based on actual cluster observations.

Consider monitoring resource usage in test environments to ensure these values remain appropriate under various workloads.

cmd/external-secrets-operator/main.go (1)

157-161: LGTM! System certificate pool integration is correct.

The addition of x509.SystemCertPool() properly leverages the system trust store before appending the OpenShift service CA certificate. The error handling with os.Exit(1) is appropriate for this initialization phase.

config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml (1)

8-8: LGTM! Operand delete confirmation enabled.

Changing console.openshift.io/disable-operand-delete to "false" enables the OpenShift console to prompt users for confirmation when deleting operand custom resources, providing better user experience and preventing accidental deletions.

bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (3)

221-221: LGTM! Consistent with base manifest.

This change aligns with the corresponding update in config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml, enabling operand deletion confirmation in the OpenShift console.


223-223: Timestamp update is expected.

The createdAt timestamp update is a standard bundle regeneration artifact.


750-751: LGTM! Resource configuration consistent with manager manifest.

These resource request values match the updates in config/manager/manager.yaml and follow OpenShift conventions.

Makefile (5)

182-188: LGTM! Container-tool-agnostic target naming.

Renaming docker-buildimage-build and docker-pushimage-push makes the Makefile more portable and aligns with the CONTAINER_TOOL variable usage.


296-307: LGTM! Improved go-install-tool with vendor support.

The addition of GOFLAGS="-mod=vendor" ensures vendored dependencies are used during tool installation, and the messaging change from "Downloading" to "Installing" is more accurate.


422-445: Excellent govulnscan improvements.

The enhanced vulnerability scanning workflow with:

  • Clear progress messages
  • Robust filtering of known vulnerabilities
  • Better error detection and reporting
  • Informative success/failure output

significantly improves the developer experience.

Note: The static analysis warning about target body length (16 > 5 lines) is a pedantic style guideline. The current implementation prioritizes clarity and robustness over brevity, which is appropriate for a critical security scanning task.


355-355: LGTM! Consistent CONTAINER_TOOL usage.

Bundle build now uses $(CONTAINER_TOOL) for container-tool-agnostic operation.


395-395: LGTM! Consistent CONTAINER_TOOL usage.

Catalog build now uses --container-tool $(CONTAINER_TOOL) parameter.

pkg/controller/external_secrets/constants.go (1)

48-54: LGTM!

The new constants are well-named and appropriately scoped. The comment on certmanagerTLSSecretWebhook clearly explains why a different secret name is needed to avoid conflicts with the cert-controller-generated TLS secret.

pkg/controller/external_secrets/certificate.go (1)

85-88: LGTM!

The conditional logic correctly applies the new TLS secret name specifically for the webhook certificate asset, ensuring the webhook uses certmanagerTLSSecretWebhook instead of the default secret name that would conflict with the cert-controller.

pkg/controller/external_secrets/deployments.go (2)

126-126: LGTM!

The call to updateWebhookVolumeConfig is correctly placed after the container spec update and only applies to the webhook deployment.


305-330: LGTM!

The namespace-based cluster reconciler flag logic correctly implements the PR objective:

  • When operatingNamespace is configured, the controller is restricted to that namespace and cluster-scoped reconciliation is disabled (both flags set to false)
  • When operatingNamespace is empty, cluster-scoped reconciliation is enabled (both flags set to true)

The explanatory comment clearly documents this behavior.

Comment on lines 405 to 443
func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
if isCertManagerConfigEnabled(esc) {
const certsVolumeName = "certs"
if deployment.Spec.Template.Spec.Volumes == nil {
deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
Name: certsVolumeName,
})
}
for i := range deployment.Spec.Template.Spec.Volumes {
if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
}
deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
}
}
}
}
Copy link

@coderabbitai coderabbitai bot Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix the volume creation logic.

The function only appends a certs volume when deployment.Spec.Template.Spec.Volumes is nil (lines 408-412). If Volumes is already non-nil but doesn't contain a certs entry, the subsequent loop (lines 413-420) won't find it, and no volume will be configured.

Apply this diff to ensure the certs volume is created when it doesn't exist:

 func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
 	if isCertManagerConfigEnabled(esc) {
 		const certsVolumeName = "certs"
-		if deployment.Spec.Template.Spec.Volumes == nil {
-			deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
-				Name: certsVolumeName,
-			})
-		}
+		
+		// Find if certs volume already exists
+		certsVolumeExists := false
 		for i := range deployment.Spec.Template.Spec.Volumes {
 			if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
+				certsVolumeExists = true
 				if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
 					deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
 				}
 				deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
+				break
 			}
 		}
+		
+		// If certs volume doesn't exist, create it
+		if !certsVolumeExists {
+			deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
+				Name: certsVolumeName,
+				VolumeSource: corev1.VolumeSource{
+					Secret: &corev1.SecretVolumeSource{
+						SecretName: certmanagerTLSSecretWebhook,
+					},
+				},
+			})
+		}
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
if isCertManagerConfigEnabled(esc) {
const certsVolumeName = "certs"
if deployment.Spec.Template.Spec.Volumes == nil {
deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
Name: certsVolumeName,
})
}
for i := range deployment.Spec.Template.Spec.Volumes {
if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
}
deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
}
}
}
}
func updateWebhookVolumeConfig(deployment *appsv1.Deployment, esc *operatorv1alpha1.ExternalSecretsConfig) {
if isCertManagerConfigEnabled(esc) {
const certsVolumeName = "certs"
// Find if certs volume already exists
certsVolumeExists := false
for i := range deployment.Spec.Template.Spec.Volumes {
if deployment.Spec.Template.Spec.Volumes[i].Name == certsVolumeName {
certsVolumeExists = true
if deployment.Spec.Template.Spec.Volumes[i].Secret == nil {
deployment.Spec.Template.Spec.Volumes[i].Secret = &corev1.SecretVolumeSource{}
}
deployment.Spec.Template.Spec.Volumes[i].Secret.SecretName = certmanagerTLSSecretWebhook
break
}
}
// If certs volume doesn't exist, create it
if !certsVolumeExists {
deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
Name: certsVolumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: certmanagerTLSSecretWebhook,
},
},
})
}
}
}
🤖 Prompt for AI Agents
In pkg/controller/external_secrets/deployments.go around lines 405 to 422, the
code only appends the "certs" volume when deployment.Spec.Template.Spec.Volumes
is nil so if Volumes is non-nil but missing a "certs" entry it never adds one;
change the logic to scan existing volumes for a "certs" entry, and if not found
append a corev1.Volume{Name: certsVolumeName} to the slice before the loop, then
in the loop ensure the Secret field is initialized (if nil) and set
Secret.SecretName = certmanagerTLSSecretWebhook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required, since certs is only the volume expected.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bharath-b-rh Thank you for the clarification! If the webhook deployment asset structure ensures that Volumes is either nil or already contains the certs volume, then the current implementation is indeed sufficient. I appreciate you confirming that the logic works correctly for your use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required, since certs is only the volume expected.

Then we can just access deployment.Spec.Template.Spec.Volumes[0] directly right? Why the loop and nil check? To cover for any mutating webhooks outside the operator's control?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

===> Then we can just access deployment.Spec.Template.Spec.Volumes[0] directly right? Why the loop and nil check? To cover for any mutating webhooks outside the operator's control?

@bharath-b-rh bharath-b-rh changed the title ESO-155: Updates to use different TLS Secret name and disables cluster resource reconcile based on user config ESO-182: Updates to use different TLS Secret name and disables cluster resource reconcile based on user config Oct 8, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 8, 2025

@bharath-b-rh: This pull request references ESO-182 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

PR has following the changes:

  • Updates the TLS secret name used by the webhook component when the cert-manager config is enabled. This is avoid conflict with the cert-controller generated TLS secret, where earlier both Secret names were same.
  • Updates the external-secrets core controller args to disable cluster custom resources reconciliation when operatingNamespace is enabled. This will restrict operand from processing any other custom resources created outside of the operatingNamespace, including the cluster resources.
  • Updates resource request config for the operator and removes resource limit config.
  • Limit config is removed based on the suggestion here
  • Request config was updated based on the observation on a cluster having just ESO as additional operator apart from the core payload using derailed/k9s.
  • Updates operator metrics server to make use of system CA certificates along with openshift service CA.
  • Updates Makefile with AI suggested changes and few improvements.
  • Updates the operand delete option in CSV, so that when user deletes the operator from console a pop-up will be used to ask whether to delete operand custom resources.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bharath-b-rh
Copy link
Contributor Author

/label px-approved
/label docs-approved
/cherrypick release-1.0

@openshift-cherrypick-robot

@bharath-b-rh: once the present PR merges, I will cherry-pick it on top of release-1.0 in a new PR and assign it to you.

In response to this:

/label px-approved
/label docs-approved
/cherrypick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels Oct 8, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ca5a522 and 5527bde.

📒 Files selected for processing (1)
  • Makefile (7 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 429-429: Target body for "govulnscan" exceeds allowed length of 5 (16).

(maxbodylength)

$(CONTAINER_TOOL) buildx use external-secrets-operator-builder
- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f Dockerfile.cross .
- $(CONTAINER_TOOL) buildx rm external-secrets-operator-builder
- docker buildx create --name external-secrets-operator-builder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker is used instead of $(CONTAINER_TOOL) because, by default $(CONTAINER_TOOL) is set to podman, and buildx is supported by docker. Hence just here docker is hardcoded.

@bharath-b-rh
Copy link
Contributor Author

Please refer comments [1] and [2] for test results.

Copy link
Contributor

@mytreya-rh mytreya-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Critical Issues Identified:

1. Volume Comparison Logic Bug (pkg/controller/common/utils.go:212-221)

The loop logic in deploymentSpecModified appears incorrect. You're iterating over desired volumes and for each desiredVolume.Secret, you check ALL fetchedVolume.Secret.Items without matching volumes by name first. This means you're comparing wrong volumes and could cause false positives in detecting deployment changes.

Recommendation: Match volumes by name first, then compare Secret.Items only for matching volumes.

2. SystemCertPool Error Handling (cmd/external-secrets-operator/main.go:157-160)

If x509.SystemCertPool() fails, the application exits immediately. Consider:

  • Is SystemCertPool() failure expected on certain platforms/containers?
  • Should there be a fallback to x509.NewCertPool() if system pool is unavailable?
  • This could break existing deployments if system cert pool becomes unavailable

3. Resource Request Increases (config/manager/manager.yaml:122-123)

CPU request increased from 10m to 100m (10x) and memory from 64Mi to 1Gi (16x). This is significant and needs justification:

  • What workload patterns drove these increases?
  • Has this been tested in resource-constrained environments?
  • Will this impact the operator's ability to run on smaller clusters?

4. Resource Limits Removal (config/manager/manager.yaml)

While OpenShift conventions recommend removing limits, verify this won't cause resource contention. Consider memory usage patterns under load and whether cluster resource quotas are in place.

5. Operand Deletion Enabled (config/manifests/bases/external-secrets-operator.clusterserviceversion.yaml:8)

Setting disable-operand-delete: false allows users to delete operand CRs from console. Ensure:

  • This is intentional and documented
  • Users understand implications
  • Safeguards exist for data loss prevention
  • No orphaned resources remain

6. updateWebhookVolumeConfig Logic (pkg/controller/external_secrets/deployments.go:405-421)

The function assumes the volume exists and is initialized. Edge cases:

  • If deployment.Spec.Template.Spec.Volumes is nil, appending fails
  • If no volume with name "certs" exists, the loop completes without creating it
  • Could result in nil pointer dereference or silent failure

Recommendation: Ensure the volume exists before trying to update it.

Copy link
Contributor

@mytreya-rh mytreya-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline code review - critical issues

return true
}
if desiredVolume.Secret.SecretName != fetchedVolume.Secret.SecretName {
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Critical: This loop logic is incorrect. You're comparing all desired volumes against all fetched volumes without matching by name first. This will cause false positives.

Fix:

for _, desiredVolume := range desired.Spec.Template.Spec.Volumes {
    if desiredVolume.Secret != nil {
        for _, fetchedVolume := range fetched.Spec.Template.Spec.Volumes {
            if desiredVolume.Name == fetchedVolume.Name {  // Match by name first!
                if !reflect.DeepEqual(desiredVolume.Secret.Items, fetchedVolume.Secret.Items) {
                    return true
                }
                if !reflect.DeepEqual(desiredVolume.Secret.SecretName, fetchedVolume.Secret.SecretName) {
                    return true
                }
            }
        }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here, we are relying on the fact that in any of the deployments there will only be one volume that will be mounted from secrets.
i feel this being common code, its quite a bit about how the deployment assets are structured.
We can add in the Name comparison.
Also i have 2 more questions:

  1. Why don't we just use controllerutil.CreateOrUpdate and avoid our custom matching?
  2. We are starting of with specific CreateOrApplyDeploymentFromAsset to a common facility HasObjectChanged, but there again we have a big switch case for each type. Why not just call deploymentSpecModified from CreateOrApplyDeploymentFromAsset? If we call into a common function that accepts a higher abstraction (i.e. Object vs Deployment), i think common function is expected to work at that higher level of abstraction instead of inspecting the types again unless its some kind of a framework (OpenClosedPrinciple).
    Similar comment i had earlier on restructuring deployment struct in createOrApplyDeployment but i couldn't locate it now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are having our own helper functions because, we are using the static manifests to derive the desired objects, which will not have the default values populated in each of these objects. And the direct comparison at the spec will always fail and cause update to happen. Instead we are specifically looking for the desired fields.

if err != nil {
setupLog.Error(err, "failed to load system certificate pool")
os.Exit(1)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Critical: Hard exit on SystemCertPool() failure could break deployments. Consider adding a fallback:

certPool, err := x509.SystemCertPool()
if err != nil {
    setupLog.Info("system cert pool unavailable, using empty pool", "error", err)
    certPool = x509.NewCertPool()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like claude mentioned above, is it really a fatal error if SystemCertPool() cannot be fetched that we need to do a hard exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel yes, atleast in RHEL which is our base image, the system CA certificates will always be present, and if at all failure occurs it would be a genuine failure and we don't want proceed I think.

cpu: 10m
memory: 64Mi
cpu: 100m
memory: 1Gi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Critical: CPU request increased from 10m to 100m (10x) and memory from 64Mi to 1Gi (16x). Please provide:

  • Performance test results showing why 1Gi memory is needed
  • CPU utilization metrics from production/staging
  • Impact analysis on resource-constrained clusters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline with above claude comment, curious to know 1Gi memory need? Most of it is cache or is there any other significant use of memory in this operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was decided based on the observation added in this comment. With just one additional day-2 operator, the memory reached 512Mib when the operator started. And I think it's safe to keep 1Gi considering the large clusters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have not seen the memory go beyond 62MiB for the operator POD.. i started a k top pod in while loop and then installed the operator and created the ExternalSecretsConfig CR on my GCP cluster.

while true; do kubectl -n external-secrets-operator top pod; done
No resources found in external-secrets-operator namespace.
No resources found in external-secrets-operator namespace.
// Installed operator from console
error: metrics not available yet
error: metrics not available yet
// Operator came up
NAME                                                            CPU(cores)   MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   34m          62Mi
NAME                                                            CPU(cores)   MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   34m          62Mi
...
...
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   1m           57Mi
NAME                                                            CPU(cores)   MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   1m           57Mi
...
...
// Created ExternalSecretsConfig
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   9m           58Mi
NAME                                                            CPU(cores)   MEMORY(bytes)
external-secrets-operator-controller-manager-5685dc9cf7-kghkn   9m           58Mi

Also, we already had limit as 128Mi which went through QA testing.
In addition, i tested with 50Mi limit which as expected crashed with OOM.

k -n external-secrets-operator get po
NAME                                                            READY   STATUS             RESTARTS      AGE
external-secrets-operator-controller-manager-85b4888895-4pqjc   0/1     CrashLoopBackOff   2 (20s ago)   77s
k -n external-secrets-operator get po -o yaml | grep -A2 -B2 OOM
          exitCode: 137
          finishedAt: "2025-10-10T11:41:08Z"
          reason: OOMKilled
          startedAt: "2025-10-10T11:40:50Z"
      name: manager

But even with 64Mi limit there are no OOMKilled.
Thus i feel 128Mi could be a safe resource request allowing for easier POD scheduling than to have it as 1Gi, unless i am missing something here.

capabilities: Basic Install
categories: Security
console.openshift.io/disable-operand-delete: "true"
console.openshift.io/disable-operand-delete: "false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Security: Enabling operand deletion (disable-operand-delete: "false") is a significant change. Ensure:

  1. This is documented
  2. Users understand implications
  3. Safeguards exist for data loss prevention
  4. Highlighted in release notes

@@ -389,3 +401,22 @@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-generated] Good: The logic for disabling cluster-scoped reconciliation when operatingNamespace is set looks correct. Consider:

  1. Add e2e tests to verify cluster resources are truly ignored when operatingNamespace is set
  2. Update API docs to explain this behavior
  3. What happens if cluster resources exist and then operatingNamespace is enabled? Need migration path.

Copy link
Contributor Author

@bharath-b-rh bharath-b-rh Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add e2e tests to verify cluster resources are truly ignored when operatingNamespace is set

Please refer comment for the tests. Could we add e2e test case in our suite in a follow up PR?

What happens if cluster resources exist and then operatingNamespace is enabled? Need migration path.

The resources outside of the operatingNamespace will not be reconciled after namespace is configured. And the same applies for other way around too, which is checked in verification

@mytreya-rh
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2025
@bharath-b-rh
Copy link
Contributor Author

/cherrypick release-1.0

@openshift-cherrypick-robot

@bharath-b-rh: once the present PR merges, I will cherry-pick it on top of release-1.0 in a new PR and assign it to you.

In response to this:

/cherrypick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
@mytreya-rh
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ca2c73f and d82cfc0.

📒 Files selected for processing (4)
  • bundle/manifests/external-secrets-operator.clusterserviceversion.yaml (2 hunks)
  • cmd/external-secrets-operator/main.go (1 hunks)
  • pkg/controller/common/utils.go (1 hunks)
  • pkg/controller/external_secrets/constants.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/controller/common/utils.go
  • bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
🔇 Additional comments (1)
cmd/external-secrets-operator/main.go (1)

159-163: LGTM! Fallback implementation addresses previous review concern.

The code now implements the suggested fallback approach: attempting to load the system certificate pool, logging a non-fatal info message on failure, and creating an empty pool to continue. Since the OpenShift CA certificate is appended afterward (Line 170), the metrics server will still function correctly even if the system pool is unavailable.

Comment on lines +52 to +53
// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
// name is used to avoiding clash with the secret created by the inbuilt cert-controller component.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix grammar in comment.

The comment contains a grammatical error.

Apply this diff:

-	// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
-	// name is used to avoiding clash with the secret created by the inbuilt cert-controller component.
+	// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
+	// name is used to avoid clash with the secret created by the inbuilt cert-controller component.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
// name is used to avoiding clash with the secret created by the inbuilt cert-controller component.
// certmanagerTLSSecretWebhook is the TLS secret created by cert-manager for the webhook component. A different
// name is used to avoid clash with the secret created by the inbuilt cert-controller component.
🤖 Prompt for AI Agents
In pkg/controller/external_secrets/constants.go around lines 52 to 53, the
comment has a grammatical error; update the sentence "A different name is used
to avoiding clash with the secret created by the inbuilt cert-controller
component." to a correct form such as "A different name is used to avoid a clash
with the secret created by the inbuilt cert-controller component." (or "to avoid
clashing with the secret...") so the comment reads clearly and grammatically
correct.

@openshift-ci
Copy link

openshift-ci bot commented Oct 18, 2025

@bharath-b-rh: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@emmajiafan
Copy link

emmajiafan commented Oct 20, 2025

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 20, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 20, 2025

@bharath-b-rh: This pull request references ESO-182 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

PR has following the changes:

  • Updates the TLS secret name used by the webhook component when the cert-manager config is enabled. This is avoid conflict with the cert-controller generated TLS secret, where earlier both Secret names were same.
  • Updates the external-secrets core controller args to disable cluster custom resources reconciliation when operatingNamespace is enabled. This will restrict operand from processing any other custom resources created outside of the operatingNamespace, including the cluster resources.
  • Updates resource request config for the operator and removes resource limit config.
  • Limit config is removed based on the suggestion here
  • Request config was updated based on the observation on a cluster having just ESO as additional operator apart from the core payload using derailed/k9s.
  • Updates operator metrics server to make use of system CA certificates along with openshift service CA.
  • Updates Makefile with AI suggested changes and few improvements.
  • Updates the operand delete option in CSV, so that when user deletes the operator from console a pop-up will be used to ask whether to delete operand custom resources.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mytreya-rh
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 05a312b into openshift:main Oct 20, 2025
9 checks passed
@openshift-cherrypick-robot

@bharath-b-rh: #75 failed to apply on top of branch "release-1.0":

Applying: ESO-155: Sets cluster resource reconcile arg to false when operatingNamespace is configured
Applying: ESO-155: Updates to use different TLS Secret name when cert-manager enabled
Applying: ESO-155: Updates resources config for operator pod
Using index info to reconstruct a base tree...
M	bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
Falling back to patching base and 3-way merge...
Auto-merging bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
CONFLICT (content): Merge conflict in bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0003 ESO-155: Updates resources config for operator pod

In response to this:

/cherrypick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bharath-b-rh bharath-b-rh deleted the eso-155-3 branch October 20, 2025 07:57
@bharath-b-rh
Copy link
Contributor Author

/cherrypick release-1.0

@openshift-cherrypick-robot

@bharath-b-rh: #75 failed to apply on top of branch "release-1.0":

Applying: ESO-155: Sets cluster resource reconcile arg to false when operatingNamespace is configured
Applying: ESO-155: Updates to use different TLS Secret name when cert-manager enabled
Applying: ESO-155: Updates resources config for operator pod
Using index info to reconstruct a base tree...
M	bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
Falling back to patching base and 3-way merge...
Auto-merging bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
CONFLICT (content): Merge conflict in bundle/manifests/external-secrets-operator.clusterserviceversion.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0003 ESO-155: Updates resources config for operator pod

In response to this:

/cherrypick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants