anik120 commented Sep 23, 2025

Description of the change:

Implements native metrics authentication and authorization for OLM and catalog operators using controller-runtime
filters. Adds TLS support with automatic certificate management via cert-manager, replacing unprotected HTTP metrics
endpoints with authenticated HTTPS endpoints on port 8443.

Motivation for the change:

Current metrics endpoints are unprotected and accessible to anyone with cluster access, creating potential security
risks. This change secures metrics access by requiring proper Kubernetes RBAC authentication and authorization,
following the same pattern used by operator-controller for production deployments.

Architectural changes:

  • Integrates controller-runtime's WithAuthenticationAndAuthorization filter for metrics endpoints
  • Adds cert-manager integration for automatic TLS certificate lifecycle management
  • Implements dynamic certificate watching and reloading using existing filemonitor package
  • Disables HTTP/2 to mitigate known CVEs, enforcing HTTP/1.1 only
  • Updates both operators to use HTTPS (port 8443) with client certificate authentication
  • Maintains fallback to unprotected metrics when TLS is disabled for development scenarios

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

anik120 requested a review from joelanford on September 23, 2025 19:15
anik120 requested review from tmshort and removed request for perdasilva and ankitathomas on September 23, 2025 19:15
anik120 force-pushed the native-metrics-authnz branch 3 times, most recently from a62b02f to f63b885, on September 23, 2025 20:28
tmshort commented Sep 23, 2025

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 23, 2025
anik120 commented Sep 23, 2025

Looks like I have to change the metrics e2e tests: the current ones don't authenticate themselves, which is why they're failing. That's a great sign that the changes are working. I'm working on the e2e test modifications now.
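For illustration, an authenticated scrape request might look like the sketch below. It assumes the endpoint accepts a ServiceAccount bearer token, which is how controller-runtime documents its authentication filter (via TokenReviews); the actual e2e test changes are not shown in this conversation. `newMetricsRequest`, the base URL, and the token value are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// newMetricsRequest builds a GET request for a metrics endpoint and attaches
// a bearer token. The URL layout and the token source are assumptions made
// for this sketch.
func newMetricsRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/metrics", nil)
	if err != nil {
		return nil, err
	}
	// Without this header, an endpoint that requires authentication would
	// reject the request.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newMetricsRequest("https://localhost:8443", "example-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())                // https://localhost:8443/metrics
	fmt.Println(req.Header.Get("Authorization")) // Bearer example-token
}
```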

anik120 force-pushed the native-metrics-authnz branch from f63b885 to 7704b39 on September 24, 2025 13:44
openshift-ci bot removed the lgtm label on Sep 24, 2025
openshift-ci bot commented Sep 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from tmshort. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

anik120 force-pushed the native-metrics-authnz branch from 7704b39 to 9bac784 on September 24, 2025 13:51
anik120 force-pushed the native-metrics-authnz branch from 9bac784 to 63ab287 on September 24, 2025 14:32
  KIND_CLUSTER_NAME="kind-olmv0-${i}" \
  KIND_CREATE_OPTS="--kubeconfig=${E2E_KUBECONFIG_ROOT}/kubeconfig-${i}" \
- HELM_INSTALL_OPTS="--kubeconfig ${E2E_KUBECONFIG_ROOT}/kubeconfig-${i}" \
+ HELM_INSTALL_OPTS="--kubeconfig ${E2E_KUBECONFIG_ROOT}/kubeconfig-${i} --set certManager.enabled=false" \
anik120 commented Sep 24, 2025

Disabling cert-manager for the e2e install turned out to be the easiest way to keep all of our existing metrics tests running, since those tests are about the metrics emitted (e.g. "creating a subscription emits these metrics") and not about the security of the endpoints.

path: /healthz
port: {{ .Values.olm.service.internalPort }}
scheme: {{ if .Values.olm.tlsSecret }}HTTPS{{ else }}HTTP{{end}}
port: {{ if .Values.certManager.enabled }}{{ .Values.olm.service.internalPortHttps }}{{ else }}{{ .Values.olm.service.internalPort }}{{ end }}

anik120 commented

This means the templates had to be updated to configure different endpoints depending on whether cert-manager is enabled.
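Helm templates use Go's template syntax, so the port selection quoted in the hunk above can be tried out with the standard `text/template` package. This is only a sketch: `renderPort` is a hypothetical helper, and the values are assumptions (8443 is taken from the PR description as the HTTPS port, and 8080 is a placeholder for the plain HTTP port, which this conversation does not state).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// portTemplate is the port expression quoted from the template change.
const portTemplate = `{{ if .Values.certManager.enabled }}{{ .Values.olm.service.internalPortHttps }}{{ else }}{{ .Values.olm.service.internalPort }}{{ end }}`

// renderPort evaluates the expression against a minimal stand-in for the
// chart's values. The port numbers are assumptions made for this sketch.
func renderPort(certManagerEnabled bool) (string, error) {
	values := map[string]any{
		"Values": map[string]any{
			"certManager": map[string]any{"enabled": certManagerEnabled},
			"olm": map[string]any{
				"service": map[string]any{
					"internalPort":      8080, // assumed plain HTTP port
					"internalPortHttps": 8443, // HTTPS port from the PR description
				},
			},
		},
	}
	tmpl, err := template.New("port").Parse(portTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, values); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	for _, enabled := range []bool{true, false} {
		port, err := renderPort(enabled)
		if err != nil {
			panic(err)
		}
		fmt.Printf("certManager.enabled=%v -> port %s\n", enabled, port)
	}
}
```

With these assumed values, enabling cert-manager selects 8443 and disabling it selects 8080, which is the switch the e2e install relies on when it passes `--set certManager.enabled=false`.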

e2e-local: e2e-build kind-create e2e-local-deploy e2e

.PHONY: e2e-local-deploy
e2e-local-deploy: $(KIND) $(HELM) #HELP Deploy OLM for e2e testing (without cert-manager)

anik120 commented

I also had to add a new deploy target that deploys OLM without cert-manager for e2e testing.

tmshort commented Sep 24, 2025

/lgtm
