MON-4479: use TLS profile to configure CMO server by simonpasquier · Pull Request #2841 · openshift/cluster-monitoring-operator

simonpasquier · 2026-03-06T14:38:43Z

This PR implements both reading the TLS configuration from the APIServer configuration and stopping the process when a change is detected. It ensures that CMO always uses the up-to-date configuration.

Summary by CodeRabbit

Refactor
- Startup flow separated into prepare vs run; TLS settings are injected before serving and configuration errors surface earlier.
- Operator now watches API server and related config resources and will stop/restart when TLS or config changes are detected.
Chores
- API server TLS details are logged and applied to server TLS configuration.
- Improved lifecycle and graceful shutdown handling.

openshift-ci-robot · 2026-03-09T09:04:30Z

@simonpasquier: This pull request references MON-4479 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-03-09T15:09:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Main builds a wired operator client, reads the APIServer TLS config, passes the TLS settings into server.NewServer, and calls Server.Prepare(ctx) before Run. It also injects the APIServer config and a cancel callback into the operator, which watches the APIServer config and related resources and triggers a graceful restart when they change.
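
To make "passes the TLS settings into the server" concrete, here is a minimal sketch that turns a minimum version name and a list of cipher names into a standard-library `tls.Config`. It is not the code from this PR, which configures the generic API server's ServingInfo instead. The helper name `buildTLSConfig`, the `VersionTLS1x` spelling of version names, and the use of IANA cipher names are assumptions made for illustration.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// tlsVersions maps version names to crypto/tls constants. The "VersionTLS1x"
// spelling is an assumption about how the profile expresses versions.
var tlsVersions = map[string]uint16{
	"VersionTLS10": tls.VersionTLS10,
	"VersionTLS11": tls.VersionTLS11,
	"VersionTLS12": tls.VersionTLS12,
	"VersionTLS13": tls.VersionTLS13,
}

// buildTLSConfig is a hypothetical helper, not part of CMO. It rejects
// unknown version or cipher names instead of silently ignoring them.
func buildTLSConfig(minVersion string, ciphers []string) (*tls.Config, error) {
	version, ok := tlsVersions[minVersion]
	if !ok {
		return nil, fmt.Errorf("unknown TLS version %q", minVersion)
	}

	// Index the cipher suites that crypto/tls considers secure by name.
	byName := map[string]uint16{}
	for _, cs := range tls.CipherSuites() {
		byName[cs.Name] = cs.ID
	}

	ids := make([]uint16, 0, len(ciphers))
	for _, name := range ciphers {
		id, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("unknown cipher suite %q", name)
		}
		ids = append(ids, id)
	}

	return &tls.Config{MinVersion: version, CipherSuites: ids}, nil
}

func main() {
	cfg, err := buildTLSConfig("VersionTLS12", []string{"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("min=%#x suites=%d\n", cfg.MinVersion, len(cfg.CipherSuites))
}
```

Failing on an unknown name at construction time matches the summary's point that configuration errors should surface before the server starts serving.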

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Main / client bootstrap `cmd/operator/main.go` | Adds unexported `newClient(...)` to construct a wired `*client.Client`; calls `GetAPIServerConfig(ctx)`, wraps it via `manifests.NewAPIServerConfig`, logs TLS settings, passes API server config and a `cancel` callback into `cmo.New`, and supplies TLS params into `server.NewServer`; invokes `srv.Prepare(ctx)` before `Run`. |
| Client API `pkg/client/client.go` | Simplifies `GetAPIServerConfig(ctx)` signature (removes name param) and always fetches APIServer named `"cluster"`. Adds `OpenShiftConfigClientset()` accessor. |
| Operator constructor & watchers `pkg/operator/operator.go` | `New` now accepts `client.Client`, `manifests.APIServerConfig`, and a `cancel func()`; operator stores `apiServerConfig` and `cancel`, removes `lastKnownApiServerConfig`, switches to OpenShift config informers, adds watchers for APIServer, ConfigMaps, Secrets, Infrastructure, Console, etc., and triggers `cancel()` when APIServer TLS profile changes. |
| APIServer config utilities `pkg/manifests/apiserver_config.go` | Adds exported `Equal(other *APIServerConfig) bool` that compares `MinTLSVersion()` and `TLSCiphers()` using `slices.Equal`. |
| Server lifecycle & TLS wiring `pkg/server/server.go` | `Server` gains `minTLSVersion`, `cipherSuites`, and `srv *genericapiserver.GenericAPIServer` fields; `NewServer` signature extended to accept TLS params; introduces `Prepare(ctx) error` to configure ServingInfo (certs, client CA, MinTLSVersion, CipherSuites) and register webhook handlers; `Run` now uses the prepared `s.srv`. |
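
The `Equal` helper in the table can be pictured with the sketch below. The `tlsProfile` type is a hypothetical stand-in for `manifests.APIServerConfig`, and the nil handling is an addition for illustration rather than something the table says the PR implements.

```go
package main

import (
	"fmt"
	"slices"
)

// tlsProfile is a hypothetical stand-in for manifests.APIServerConfig,
// reduced to the two values the comparison looks at.
type tlsProfile struct {
	minTLSVersion string
	ciphers       []string
}

// Equal reports whether both profiles carry the same minimum TLS version and
// the same ciphers in the same order. Two nil profiles are equal; a nil and a
// non-nil profile are not.
func (p *tlsProfile) Equal(other *tlsProfile) bool {
	if p == nil || other == nil {
		return p == other
	}
	return p.minTLSVersion == other.minTLSVersion &&
		slices.Equal(p.ciphers, other.ciphers)
}

func main() {
	a := &tlsProfile{"VersionTLS12", []string{"TLS_AES_128_GCM_SHA256"}}
	b := &tlsProfile{"VersionTLS12", []string{"TLS_AES_128_GCM_SHA256"}}
	c := &tlsProfile{"VersionTLS13", []string{"TLS_AES_128_GCM_SHA256"}}

	fmt.Println("a == b:", a.Equal(b))
	fmt.Println("a == c:", a.Equal(c))
	fmt.Println("a == nil:", a.Equal(nil))
}
```

Note that `slices.Equal` is order-sensitive, so a reordered cipher list counts as a change under this sketch.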

Sequence Diagram(s)

sequenceDiagram
    actor Main as Main
    participant Client as OperatorClient
    participant APIServer as OpenShiftAPIServer
    participant Server as OperatorServer
    participant Operator as CMOOperator
    participant Watcher as Informers/Watchers

    Main->>Client: newClient(ctx, restConfig, ...)
    Client->>APIServer: create clients & recorder
    Main->>Client: GetAPIServerConfig(ctx)
    APIServer-->>Client: APIServer object
    Client-->>Main: *client.Client / APIServerConfig

    Main->>Server: NewServer(..., minTLSVersion, tlsCiphers)
    Main->>Server: Prepare(ctx)
    Server-->>Main: prepared srv stored

    Main->>Operator: cmo.New(client, apiServerConfig, cancel)
    Operator->>Watcher: start informers/watch(APIServer, ConfigMaps, Secrets, ...)
    Main->>Operator: Run(ctx)

    Watcher->>Operator: APIServer change event
    Operator->>Operator: compare via APIServerConfig.Equal()
    alt config changed
        Operator->>Main: call cancel()
        Main->>Operator: graceful restart/shutdown
    end

    Main->>Server: Run(ctx)
    Server->>APIServer: serve with configured TLS settings

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main objective: configuring the CMO server to use TLS profile settings from the API server. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only implementation files, not Ginkgo test files, so test naming requirements are not applicable. |
| Test Structure And Quality | ✅ Passed | No Ginkgo test files were modified or added in this PR; only non-test source files were changed. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Jeffail/gabs/v2@v2.6.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/alecthomas/units@v0.0.0-20240927000941-0f3dac36c52b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/blang/semver/v4@v4.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-openapi/strfmt@v0.24.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/uuid@v1.6.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/imdario/mergo@v0.3.16: is explicitly

... [truncated 21195 characters] ...

les.txt\n\tsigs.k8s.io/apiserver-network-proxy/konnectivity-client@v0.31.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/kube-storage-version-migrator@v0.0.6-0.20230721195810-5c8923c5ff96: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/ginkgo/v2: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/server/server.go (1)

52-65: ⚠️ Potential issue | 🔴 Critical

Critical: TLS configuration parameters are not assigned to the struct.

The minTLSVersion and cipherSuites parameters are accepted by NewServer but never stored in the returned Server struct. This means the TLS configuration will be empty when Prepare() uses s.minTLSVersion and s.cipherSuites.

🐛 Proposed fix

 func NewServer(name string, config *rest.Config, kubeConfig, certFile, keyFile string, minTLSVersion string, cipherSuites []string) (*Server, error) {
 	kubeClient, err := kubernetes.NewForConfig(config)
 	if err != nil {
 		return nil, err
 	}

 	return &Server{
-		name:       name,
-		kubeClient: kubeClient,
-		kubeConfig: kubeConfig,
-		certFile:   certFile,
-		keyFile:    keyFile,
+		name:          name,
+		kubeClient:    kubeClient,
+		kubeConfig:    kubeConfig,
+		certFile:      certFile,
+		keyFile:       keyFile,
+		minTLSVersion: minTLSVersion,
+		cipherSuites:  cipherSuites,
 	}, nil
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/server/server.go` around lines 52 - 65, NewServer currently accepts
minTLSVersion and cipherSuites but does not assign them to the Server it
returns, so Prepare() will see empty s.minTLSVersion and s.cipherSuites; update
NewServer to set the Server fields (e.g., minTLSVersion: minTLSVersion,
cipherSuites: cipherSuites) when constructing the &Server{}, and if those fields
do not exist on the Server struct add them (matching types used by Prepare()),
ensuring Prepare() uses the populated s.minTLSVersion and s.cipherSuites.

🧹 Nitpick comments (2)

pkg/client/client.go (1)
1967-1973: Consider consolidating duplicate accessor methods.

There are now two methods returning the same field:

EventRecorder() (Line 279)

EventsRecorder() (Line 1971)

Both return c.eventRecorder. This creates potential confusion for maintainers. Consider using a single consistent name or documenting why both are needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client/client.go` around lines 1967 - 1973, There are duplicate accessors
returning the same field c.eventRecorder (EventRecorder and EventsRecorder);
consolidate to a single canonical method name (pick one, e.g., EventRecorder) by
removing the redundant method (or make it a thin deprecated wrapper that calls
the canonical method), update all call sites to use the canonical name, and add
a small comment on the remaining method explaining its purpose; ensure the
unique symbols EventRecorder, EventsRecorder, and c.eventRecorder are used to
locate and update the code and callers.
pkg/server/server.go (1)
135-142: Verify Prepare() is called before Run().

Run() now assumes s.srv is initialized by a prior call to Prepare(). If Run() is called without Prepare(), this will panic with a nil pointer dereference on s.srv.PrepareRun().

Consider adding a guard or documenting this requirement clearly.
🛡️ Proposed defensive check
 func (s *Server) Run(ctx context.Context) error {
+	if s.srv == nil {
+		return fmt.Errorf("server not prepared: call Prepare() before Run()")
+	}
 	if err := s.srv.PrepareRun().RunWithContext(ctx); ctx.Err() == nil {
 		return err
 	}

 	klog.Info("server exited")
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/server/server.go` around lines 135 - 142, The Run method assumes s.srv is
initialized and will panic if Prepare() was not called; add a defensive guard at
the start of Server.Run that verifies s.srv is non-nil (or call s.Prepare() if
that is the intended behavior) before invoking s.srv.PrepareRun(), and return a
clear error if initialization is missing; update Server.Run to check s.srv
(and/or s.srv.PrepareRun() availability) and return a descriptive error instead
of dereferencing a nil pointer.
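
The Prepare-before-Run contract discussed above can be tried in isolation with a small standard-library sketch. The `server` type and its `serve` field are hypothetical stand-ins for `Server` and its `srv` field; only the shape of the guard is the point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var errNotPrepared = errors.New("server not prepared: call Prepare() before Run()")

// server is a hypothetical stand-in for the real Server type. The serve field
// plays the role of the prepared generic API server.
type server struct {
	serve func(ctx context.Context) error
}

// Prepare installs the serving function. In this sketch it just blocks until
// the context is cancelled.
func (s *server) Prepare(_ context.Context) error {
	s.serve = func(ctx context.Context) error {
		<-ctx.Done()
		return nil
	}
	return nil
}

// Run returns a descriptive error instead of dereferencing a nil field when
// Prepare has not been called.
func (s *server) Run(ctx context.Context) error {
	if s.serve == nil {
		return errNotPrepared
	}
	return s.serve(ctx)
}

// demo returns the result of Run before and after Prepare.
func demo() (before, after error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // already cancelled, so the prepared Run returns immediately

	s := &server{}
	before = s.Run(ctx)
	if err := s.Prepare(ctx); err != nil {
		return before, err
	}
	after = s.Run(ctx)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("before Prepare:", before)
	fmt.Println("after Prepare:", after)
}
```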

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/operator/main.go`:
- Around line 270-286: The Server returned by NewServer is missing assignment of
the TLS fields so Prepare() sees empty values; update the struct literal
returned by NewServer (the &Server{...} in server.NewServer) to include
minTLSVersion: minTLSVersion and cipherSuites: cipherSuites (matching the
function parameters) so the Server struct's fields are populated before
Prepare() is called.

In `@pkg/manifests/apiserver_config.go`:
- Around line 42-47: APIServerConfig.Equal currently calls methods on the
receiver and on the other pointer without nil checks and can panic; update
APIServerConfig.Equal to handle nil safely by returning true if both receiver
and other are nil, false if only one is nil, and otherwise comparing
c.MinTLSVersion() and slices.Equal(c.TLSCiphers(), other.TLSCiphers()) as
before; reference the Equal method on APIServerConfig and the helper methods
MinTLSVersion and TLSCiphers when locating the change.

In `@pkg/operator/operator.go`:
- Around line 632-640: The handler closes o.done directly which can panic if
called twice; add a sync.Once field (e.g., doneOnce) to the Operator struct and
replace close(o.done) with doneOnce.Do(func() { close(o.done) }) inside the
event handling path (the switch handling *configv1.APIServer in the method that
closes o.done) so the channel is closed exactly once.
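
As an illustration of the `sync.Once` suggestion, the following sketch closes a channel from a method that may be called more than once. The `stopper` type is hypothetical; the field names `done` and `doneOnce` follow the comment above.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical reduction of the operator to the two fields the
// review comment mentions.
type stopper struct {
	done     chan struct{}
	doneOnce sync.Once
}

func newStopper() *stopper {
	return &stopper{done: make(chan struct{})}
}

// stop closes done exactly once, however many times it is called. A bare
// close(s.done) would panic on the second call.
func (s *stopper) stop() {
	s.doneOnce.Do(func() { close(s.done) })
}

// stopped reports whether done has been closed, without blocking.
func (s *stopper) stopped() bool {
	select {
	case <-s.done:
		return true
	default:
		return false
	}
}

func main() {
	s := newStopper()
	s.stop()
	s.stop() // safe: the second call is a no-op
	fmt.Println("stopped:", s.stopped())
}
```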

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4184a2ff-26a7-4dbc-bdcf-0a7acbd40b4a

📥 Commits

Reviewing files that changed from the base of the PR and between 61407b0 and 76df984.

📒 Files selected for processing (5)

cmd/operator/main.go
pkg/client/client.go
pkg/manifests/apiserver_config.go
pkg/operator/operator.go
pkg/server/server.go

cmd/operator/main.go

pkg/manifests/apiserver_config.go

pkg/operator/operator.go

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/operator/operator.go (1)
351-355: ⚠️ Potential issue | 🟠 Major

Handle the initial APIServer object too.

CMO snapshots the TLS profile in cmd/operator/main.go before this informer starts, but this handler only reacts to later updates. If the profile changes in that startup window, the process can keep the stale TLS settings until another APIServer event arrives.
🔧 Suggested fix
 	_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: o.handleEvent,
 		UpdateFunc: func(_, newObj interface{}) {
 			o.handleEvent(newObj)
 		},
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operator.go` around lines 351 - 355, The handler currently only
registers an UpdateFunc calling o.handleEvent, so the initial APIServer snapshot
can be missed; ensure the initial APIServer is processed at startup by either
adding an AddFunc to the cache.ResourceEventHandlerFuncs that calls
o.handleEvent for created objects or, after the informer is synced, retrieve the
current APIServer from the informer's store/lister and call o.handleEvent on
that object (use the same informer variable and o.handleEvent to locate where to
invoke this). Ensure this runs once after informer sync so startup TLS profile
changes are handled.
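
The startup window described in this comment can be reproduced with a toy model. Everything here is a hypothetical stand-in: `handlerFuncs` only mirrors the shape of client-go's `cache.ResourceEventHandlerFuncs`, `fakeInformer` replays an initial list as add events, and TLS profiles are reduced to plain strings.

```go
package main

import "fmt"

// handlerFuncs is a simplified, hypothetical mirror of client-go's
// cache.ResourceEventHandlerFuncs.
type handlerFuncs struct {
	AddFunc    func(obj any)
	UpdateFunc func(oldObj, newObj any)
}

// fakeInformer delivers the initial LIST as add events, then later updates.
type fakeInformer struct {
	handlers handlerFuncs
}

func (f *fakeInformer) replay(initial, updates []any) {
	for _, obj := range initial {
		if f.handlers.AddFunc != nil {
			f.handlers.AddFunc(obj)
		}
	}
	for _, obj := range updates {
		if f.handlers.UpdateFunc != nil {
			f.handlers.UpdateFunc(nil, obj)
		}
	}
}

// driftDetected reports whether any delivered profile differs from the
// snapshot taken before the informer started.
func driftDetected(snapshot string, withAdd bool, initial, updates []any) bool {
	drift := false
	handle := func(obj any) {
		if profile, ok := obj.(string); ok && profile != snapshot {
			drift = true
		}
	}

	h := handlerFuncs{UpdateFunc: func(_, newObj any) { handle(newObj) }}
	if withAdd {
		h.AddFunc = handle
	}

	(&fakeInformer{handlers: h}).replay(initial, updates)
	return drift
}

func main() {
	// The profile changed between the snapshot and the informer's first LIST.
	initial := []any{"Modern"}
	fmt.Println("UpdateFunc only:", driftDetected("Intermediate", false, initial, nil))
	fmt.Println("AddFunc + UpdateFunc:", driftDetected("Intermediate", true, initial, nil))
}
```

With only `UpdateFunc` registered, the changed object delivered during the initial list goes unnoticed until a later update arrives.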

🧹 Nitpick comments (1)

pkg/client/client.go (1)
1971-1973: Avoid a second accessor for the same recorder.

EventRecorder() already exposes c.eventRecorder, so EventsRecorder() just adds another public spelling for the same dependency. Reusing the existing method keeps the client surface smaller and avoids singular/plural drift.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client/client.go` around lines 1971 - 1973, Remove the duplicate accessor
EventsRecorder and consolidate on the existing EventRecorder method: delete the
EventsRecorder function from type Client, then update any callers that reference
EventsRecorder to call EventRecorder instead (preserve the same return type
events.Recorder). Ensure imports/tests/usage compile after replacing
EventsRecorder with EventRecorder and run tests to verify.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff0f74b4-31f1-4bab-b122-c24040f7042e

📥 Commits

Reviewing files that changed from the base of the PR and between 76df984 and d482ba8.

📒 Files selected for processing (5)

cmd/operator/main.go
pkg/client/client.go
pkg/manifests/apiserver_config.go
pkg/operator/operator.go
pkg/server/server.go

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/operator/operator.go (1)
345-355: ⚠️ Potential issue | 🟠 Major

Handle the APIServer informer’s initial add event too.

cmd/operator/main.go reads APIServer/cluster before informers start, so a TLS profile change in that window only shows up during this informer's initial LIST. With only UpdateFunc registered here, CMO misses that drift and can keep serving with stale TLS settings until the next edit.
🛠️ Suggested fix
 	_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: func(obj interface{}) {
+			o.handleEvent(obj)
+		},
 		UpdateFunc: func(_, newObj interface{}) {
 			o.handleEvent(newObj)
 		},
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operator.go` around lines 345 - 355, The APIServer informer only
registers UpdateFunc, so the initial LIST (add) events are ignored and
o.handleEvent never runs for the existing APIServer object; update the informer
registration (the cache.NewSharedIndexInformer block where
informer.AddEventHandler is called) to also provide an AddFunc in the
cache.ResourceEventHandlerFuncs that calls o.handleEvent with the newObj (same
as UpdateFunc) so the initial APIServer/cluster state is processed immediately
when the informer starts.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de251b38-0a3e-4b4c-9800-a81e4dd3330c

📥 Commits

Reviewing files that changed from the base of the PR and between d482ba8 and abb8516.

📒 Files selected for processing (5)

cmd/operator/main.go
pkg/client/client.go
pkg/manifests/apiserver_config.go
pkg/operator/operator.go
pkg/server/server.go

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

simonpasquier · 2026-03-11T09:06:08Z

/retest-required

simonpasquier · 2026-03-11T12:30:09Z

/test e2e-hypershift-conformance

simonpasquier · 2026-03-11T13:26:50Z

/test e2e-hypershift-conformance

juzhao · 2026-03-12T02:24:23Z

/test e2e-hypershift-conformance

simonpasquier · 2026-03-12T09:07:08Z

/test e2e-hypershift-conformance/test e2e-hypershift-conformance

simonpasquier · 2026-03-12T09:07:10Z

/test e2e-hypershift-conformance

machine424

thanks for this!
lgtm, I have some questions/suggestions.

we had an e2e test that we had to turn into a unit test because MCO started restarting nodes to propagate TLS changes (the change became too disruptive). https://issues.redhat.com/browse/MON-3960

It'd be great to have a test for this though to make sure CMO behaves well after a change/restart, maybe that will come for free as part of openshift/origin#30801? (not a blocker of this PR)

machine424 · 2026-03-12T07:58:48Z

cmd/operator/main.go

 		return 1
 	}
+
+	if err := srv.Prepare(ctx); err != nil {


a comment on why this is separated would be helpful.
Also, maybe Prepare could just return the server?

machine424 · 2026-03-12T08:04:27Z

pkg/operator/operator.go

 	)

 	_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: func(obj interface{}) {


we should probably do the same for the other configmaps/secrets in case they get deleted by mistake, recreated and never updated. (and the informers not restarted in the meantime, not sure if they do a relist though...)

Correct. In practice I suppose that secrets/configmaps will have revision changes after recreation so it might be more of a theoretical issue.

machine424 · 2026-03-12T08:08:24Z

cmd/operator/main.go

+	}
+	apiServerConfigAdapter := manifests.NewAPIServerConfig(apiServerConfig)
+	klog.Infof(
+		"TLS configuration: minimum version=%q, ciphers=[%s]",


maybe

Suggested change

"TLS configuration: minimum version=%q, ciphers=[%s]",

"TLS settings from API server configuration: minimum version=%q, ciphers=[%s]",

to be more explicit?

machine424 · 2026-03-12T08:41:50Z

pkg/operator/operator.go

+			// Trigger a restart of the process to read the new TLS
+			// configuration.
+			klog.Info("Detected changes to the TLS profile configuration, stopping the process")
+			o.stop()


IIUC this will cancel the root ctx

cluster-monitoring-operator/cmd/operator/main.go

Line 170 in 1f0930d

ctx, cancel := context.WithCancel(context.Background())

so it'd not only stop the server (the PR title misled me).

I assume it's simpler to just restart the whole container instead of just the server?

Correct, it will make the process exit. This is what is currently recommended as it's simple and updating the TLS profile shouldn't be a frequent operation.
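
A minimal sketch of that design, assuming a single root context shared by every component: the watcher receives the root `cancel` function, so a profile change stops the server stand-in too and lets the process exit. The names `watchProfile` and `run`, and the use of strings for profiles, are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// watchProfile is a hypothetical watcher: it calls stop on the first event
// whose profile differs from the one read at startup.
func watchProfile(current string, events <-chan string, stop func()) {
	for profile := range events {
		if profile != current {
			stop()
			return
		}
	}
}

// run simulates the process lifetime and reports whether a profile change
// shut everything down.
func run(current string, events []string) bool {
	// Root context: cancelling it stops every component derived from it.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ch := make(chan string, len(events))
	for _, e := range events {
		ch <- e
	}
	close(ch)

	// Stand-in for the HTTPS server, which only knows about ctx.
	serverStopped := make(chan struct{})
	go func() {
		<-ctx.Done()
		close(serverStopped)
	}()

	// The watcher is handed the root cancel function as its stop callback.
	watchProfile(current, ch, cancel)
	if ctx.Err() == nil {
		return false // no change observed; the deferred cancel cleans up
	}
	<-serverStopped // the server goroutine observed the cancellation
	return true
}

func main() {
	fmt.Println("changed:", run("Intermediate", []string{"Intermediate", "Modern"}))
	fmt.Println("unchanged:", run("Intermediate", []string{"Intermediate"}))
}
```

After the exit, the restarted process reads the new profile at startup, which is why no in-place reload of the TLS settings is needed.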

simonpasquier · 2026-03-12T10:02:32Z

It'd be great to have a test for this though to make sure CMO behaves well after a change/restart, maybe that will come for free as part of openshift/origin#30801? (not a blocker of this PR)

yeah, I'd defer that to a generic origin test rather than something particular to CMO.

machine424 · 2026-03-12T10:13:10Z

yeah, I'd defer that to a generic origin test rather than something particular to CMO.

sure, I assume you gave this a try locally :)

/lgtm

openshift-ci-robot · 2026-03-12T10:13:59Z

@simonpasquier: This pull request references MON-4479 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


openshift-ci · 2026-03-12T10:14:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424, simonpasquier


Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [machine424,simonpasquier]


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

cmd/operator/main.go (1)
92-126: LGTM - Clean helper function for client initialization.

The newClient helper encapsulates client creation with proper error handling and event recorder setup. The fallback behavior when controller reference is unavailable is appropriate.

One minor inconsistency: the kubernetes client error (line 100) is wrapped with context, but the config client error (line 117) is returned unwrapped. Consider wrapping for consistency:
🔧 Optional: Wrap config client error for consistency
 	configClient, err := configv1client.NewForConfig(config)
 	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("creating openshift config client: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/operator/main.go` around lines 92 - 126, In newClient, the error returned
from configv1client.NewForConfig is not wrapped for context; update the error
return when creating configClient (configv1client.NewForConfig) to wrap the
original error with fmt.Errorf and a descriptive message (e.g., "creating
openshift config client: %w") so it matches the earlier kubernetes client error
handling and aids debugging.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/operator.go`:
- Line 421: The hardcoded 10*time.Minute used when creating configInformers
should be made consistent with the existing resyncPeriod constant or explicitly
documented if the shorter interval is intentional: replace the literal
10*time.Minute in the configInformers :=
configv1informers.NewSharedInformerFactory(...) call with the resyncPeriod
constant used by kubeInformersOperatorNS, or if you intentionally want faster
feature-gate/ClusterVersion detection, add a concise comment next to the
configInformers creation (referencing configInformers, resyncPeriod, and
kubeInformersOperatorNS) explaining why a shorter resync is required.

---

Nitpick comments:
In `@cmd/operator/main.go`:
- Around line 92-126: In newClient, the error returned from
configv1client.NewForConfig is not wrapped for context; update the error return
when creating configClient (configv1client.NewForConfig) to wrap the original
error with fmt.Errorf and a descriptive message (e.g., "creating openshift
config client: %w") so it matches the earlier kubernetes client error handling
and aids debugging.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3475a490-c6ee-49ed-a804-ff308f4b8655

📥 Commits

Reviewing files that changed from the base of the PR and between 304662e and 6d9331d.

📒 Files selected for processing (2)

cmd/operator/main.go
pkg/operator/operator.go

coderabbitai · 2026-03-12T10:16:40Z

pkg/operator/operator.go

 	o.informerFactories = append(o.informerFactories, kubeInformersOperatorNS)

-	configInformers := configv1informers.NewSharedInformerFactory(configClient, 10*time.Minute)
+	configInformers := configv1informers.NewSharedInformerFactory(c.OpenShiftConfigClientset(), 10*time.Minute)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if different resync periods are used intentionally across the codebase
rg -n "NewSharedInformerFactory" --type=go -A2 | head -30

Repository: openshift/cluster-monitoring-operator

Length of output: 3123

🏁 Script executed:

# First, examine the file around line 421
sed -n '415,430p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 766

🏁 Script executed:

# Search for resyncPeriod constant definition in non-vendor code
rg -n "resyncPeriod\s*:?=" --type=go --type-not=go -- 'vendor' -v

Repository: openshift/cluster-monitoring-operator

Length of output: 123

🏁 Script executed:

# Find all NewSharedInformerFactory usages in non-vendor code with context
rg -n "NewSharedInformerFactory" --type=go -B2 -A2 -- 'vendor' -v

Repository: openshift/cluster-monitoring-operator

Length of output: 8754

🏁 Script executed:

# Search for resyncPeriod definition in non-vendor code
rg -n "resyncPeriod\s*:?=" --type=go | grep -v vendor

Repository: openshift/cluster-monitoring-operator

Length of output: 217

🏁 Script executed:

# Also check around the context for resyncPeriod
sed -n '400,440p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 1648

🏁 Script executed:

# Search for all NewSharedInformerFactory usages in non-vendor code
rg -n "NewSharedInformerFactory" --type=go -B1 -A1 | grep -v vendor

Repository: openshift/cluster-monitoring-operator

Length of output: 560

🏁 Script executed:

# Check extended context around configInformers for any comments
sed -n '405,445p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 1555

🏁 Script executed:

# Search for any references to configInformers or featuregates that might explain the timing
rg -n "10.*time.Minute|config.*Informer" pkg/operator/operator.go -B2 -A2

Repository: openshift/cluster-monitoring-operator

Length of output: 657

The hardcoded 10-minute resync period for configInformers differs from the 15-minute resyncPeriod constant used by other informers, and this difference lacks documentation.

The configInformers instance is specifically used for watching feature gates and cluster versions (which can trigger container restarts). If this shorter period is intentional for faster feature gate detection, add a comment to explain it. Otherwise, use the resyncPeriod constant for consistency with kubeInformersOperatorNS.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/operator/operator.go` at line 421, The hardcoded 10*time.Minute used when creating configInformers should be made consistent with the existing resyncPeriod constant or explicitly documented if the shorter interval is intentional: replace the literal 10*time.Minute in the configInformers := configv1informers.NewSharedInformerFactory(...) call with the resyncPeriod constant used by kubeInformersOperatorNS, or if you intentionally want faster feature-gate/ClusterVersion detection, add a concise comment next to the configInformers creation (referencing configInformers, resyncPeriod, and kubeInformersOperatorNS) explaining why a shorter resync is required.

openshift-ci · 2026-03-12T13:48:29Z

@simonpasquier: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-conformance | `6d9331d` | link | true | `/test e2e-hypershift-conformance` |
| ci/prow/versions | `6d9331d` | link | false | `/test versions` |


simonpasquier · 2026-03-12T13:57:07Z

/hold

As discussed with @machine424, the e2e verification is hard (if not impossible) to do with our test framework because it is too disruptive. I'll let @juzhao verify that CMO reconfigures itself as expected whenever the TLS profile is updated.

simonpasquier · 2026-03-12T13:57:15Z

/skip

openshift-ci bot added the do-not-merge/work-in-progress label Mar 6, 2026

openshift-ci bot requested review from rexagod and slashpai March 6, 2026 14:44

openshift-ci bot added the approved label Mar 6, 2026

simonpasquier force-pushed the read-tls-profile branch from 3d4df6a to bf0d422 Compare March 9, 2026 08:58

simonpasquier changed the title ~~[WIP] feat: use TLS profile to configure CMO server~~ MON-4479: [WIP] feat: use TLS profile to configure CMO server Mar 9, 2026

openshift-ci-robot added the jira/valid-reference label Mar 9, 2026

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 9, 2026

simonpasquier changed the title ~~MON-4479: [WIP] feat: use TLS profile to configure CMO server~~ MON-4479: [WIP] use TLS profile to configure CMO server Mar 9, 2026

simonpasquier changed the title ~~MON-4479: [WIP] use TLS profile to configure CMO server~~ MON-4479: use TLS profile to configure CMO server Mar 9, 2026

coderabbitai bot reviewed Mar 9, 2026

cmd/operator/main.go

pkg/manifests/apiserver_config.go

pkg/operator/operator.go

simonpasquier force-pushed the read-tls-profile branch 2 times, most recently from 6bbd685 to d482ba8 Compare March 9, 2026 16:32

coderabbitai bot reviewed Mar 9, 2026

simonpasquier force-pushed the read-tls-profile branch from d482ba8 to 12ff858 Compare March 10, 2026 08:34

feat: use TLS profile to configure CMO server

96f688c

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

simonpasquier force-pushed the read-tls-profile branch from 12ff858 to abb8516 Compare March 10, 2026 14:02

coderabbitai bot reviewed Mar 10, 2026

feat: exit process on TLS profile changes

304662e

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

simonpasquier force-pushed the read-tls-profile branch from abb8516 to 304662e Compare March 10, 2026 14:19

machine424 reviewed Mar 12, 2026

Address @machine424 comments

6d9331d

openshift-ci bot assigned machine424 Mar 12, 2026

openshift-ci bot added the lgtm label Mar 12, 2026

coderabbitai bot reviewed Mar 12, 2026

openshift-ci bot added the do-not-merge/hold label Mar 12, 2026
