
MON-4479: use TLS profile to configure CMO server#2841

Open
simonpasquier wants to merge 3 commits into openshift:main from simonpasquier:read-tls-profile

Conversation

@simonpasquier
Contributor

@simonpasquier simonpasquier commented Mar 6, 2026

This PR implements both reading the TLS configuration from the APIServer configuration and stopping the process when a change is detected. It ensures that CMO always uses the up-to-date configuration.

Summary by CodeRabbit

  • Refactor

    • Startup flow separated into prepare vs run; TLS settings are injected before serving and configuration errors surface earlier.
    • Operator now watches API server and related config resources and will stop/restart when TLS or config changes are detected.
  • Chores

    • API server TLS details are logged and applied to server TLS configuration.
    • Improved lifecycle and graceful shutdown handling.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 6, 2026
@openshift-ci openshift-ci bot requested review from rexagod and slashpai March 6, 2026 14:44
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2026
@simonpasquier simonpasquier changed the title [WIP] feat: use TLS profile to configure CMO server MON-4479: [WIP] feat: use TLS profile to configure CMO server Mar 9, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 9, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 9, 2026

@simonpasquier: This pull request references MON-4479 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 9, 2026
@simonpasquier simonpasquier changed the title MON-4479: [WIP] feat: use TLS profile to configure CMO server MON-4479: [WIP] use TLS profile to configure CMO server Mar 9, 2026
@coderabbitai

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Main builds a wired operator client, reads the APIServer TLS config, passes TLS settings into server.NewServer, calls Server.Prepare(ctx) before Run, and injects the APIServer config plus a cancel callback into the operator which watches APIServer/config and related resources to trigger graceful restart on changes.

Changes

Cohort / File(s) | Summary
Main / client bootstrap (cmd/operator/main.go) | Adds unexported newClient(...) to construct a wired *client.Client; calls GetAPIServerConfig(ctx), wraps it via manifests.NewAPIServerConfig, logs TLS settings, passes API server config and a cancel callback into cmo.New, and supplies TLS params into server.NewServer; invokes srv.Prepare(ctx) before Run.
Client API (pkg/client/client.go) | Simplifies GetAPIServerConfig(ctx) signature (removes name param) and always fetches APIServer named "cluster". Adds OpenShiftConfigClientset() accessor.
Operator constructor & watchers (pkg/operator/operator.go) | New now accepts *client.Client, *manifests.APIServerConfig, and a cancel func(); operator stores apiServerConfig and cancel, removes lastKnownApiServerConfig, switches to OpenShift config informers, adds watchers for APIServer, ConfigMaps, Secrets, Infrastructure, Console, etc., and triggers cancel() when APIServer TLS profile changes.
APIServer config utilities (pkg/manifests/apiserver_config.go) | Adds exported Equal(other *APIServerConfig) bool that compares MinTLSVersion() and TLSCiphers() using slices.Equal.
Server lifecycle & TLS wiring (pkg/server/server.go) | Server gains minTLSVersion, cipherSuites, and srv *genericapiserver.GenericAPIServer fields; NewServer signature extended to accept TLS params; introduces Prepare(ctx) error to configure ServingInfo (certs, client CA, MinTLSVersion, CipherSuites) and register webhook handlers; Run now uses the prepared s.srv.
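
The Equal helper listed for pkg/manifests/apiserver_config.go could look roughly like the sketch below. `apiServerConfig` is a hypothetical, simplified stand-in for the real type, and the nil handling is an assumption added for illustration; it is not confirmed to be part of the PR.

```go
package main

import (
	"fmt"
	"slices"
)

// apiServerConfig is a hypothetical, simplified stand-in for
// manifests.APIServerConfig, which wraps the APIServer resource.
type apiServerConfig struct {
	minTLSVersion string
	tlsCiphers    []string
}

func (c *apiServerConfig) MinTLSVersion() string { return c.minTLSVersion }
func (c *apiServerConfig) TLSCiphers() []string  { return c.tlsCiphers }

// Equal reports whether both configurations resolve to the same TLS settings.
// Assumption: two nil configurations are equal, and nil never equals non-nil.
func (c *apiServerConfig) Equal(other *apiServerConfig) bool {
	if c == nil || other == nil {
		return c == other
	}
	return c.MinTLSVersion() == other.MinTLSVersion() &&
		slices.Equal(c.TLSCiphers(), other.TLSCiphers())
}

func main() {
	a := &apiServerConfig{minTLSVersion: "VersionTLS12", tlsCiphers: []string{"TLS_AES_128_GCM_SHA256"}}
	b := &apiServerConfig{minTLSVersion: "VersionTLS13", tlsCiphers: []string{"TLS_AES_128_GCM_SHA256"}}
	fmt.Println(a.Equal(a), a.Equal(b), a.Equal(nil))
}
```

Note that slices.Equal is order-sensitive, so two cipher lists with the same entries in a different order compare as different in this sketch.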

Sequence Diagram(s)

sequenceDiagram
    actor Main as Main
    participant Client as OperatorClient
    participant APIServer as OpenShiftAPIServer
    participant Server as OperatorServer
    participant Operator as CMOOperator
    participant Watcher as Informers/Watchers

    Main->>Client: newClient(ctx, restConfig, ...)
    Client->>APIServer: create clients & recorder
    Main->>Client: GetAPIServerConfig(ctx)
    APIServer-->>Client: APIServer object
    Client-->>Main: *client.Client / APIServerConfig

    Main->>Server: NewServer(..., minTLSVersion, tlsCiphers)
    Main->>Server: Prepare(ctx)
    Server-->>Main: prepared srv stored

    Main->>Operator: cmo.New(client, apiServerConfig, cancel)
    Operator->>Watcher: start informers/watch(APIServer, ConfigMaps, Secrets, ...)
    Main->>Operator: Run(ctx)

    Watcher->>Operator: APIServer change event
    Operator->>Operator: compare via APIServerConfig.Equal()
    alt config changed
        Operator->>Main: call cancel()
        Main->>Operator: graceful restart/shutdown
    end

    Main->>Server: Run(ctx)
    Server->>APIServer: serve with configured TLS settings

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title directly and clearly summarizes the main objective: configuring the CMO server to use TLS profile settings from the API server.
Stable And Deterministic Test Names | ✅ Passed | PR modifies only implementation files, not Ginkgo test files, so test naming requirements are not applicable.
Test Structure And Quality | ✅ Passed | No Ginkgo test files were modified or added in this PR; only non-test source files were changed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Jeffail/gabs/v2@v2.6.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/alecthomas/units@v0.0.0-20240927000941-0f3dac36c52b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/blang/semver/v4@v4.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-openapi/strfmt@v0.24.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/uuid@v1.6.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/imdario/mergo@v0.3.16: is explicitly

... [truncated 21195 characters] ...

les.txt\n\tsigs.k8s.io/apiserver-network-proxy/konnectivity-client@v0.31.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/kube-storage-version-migrator@v0.0.6-0.20230721195810-5c8923c5ff96: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/ginkgo/v2: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands and usage tips.

@simonpasquier simonpasquier changed the title MON-4479: [WIP] use TLS profile to configure CMO server MON-4479: use TLS profile to configure CMO server Mar 9, 2026



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/server/server.go (1)

52-65: ⚠️ Potential issue | 🔴 Critical

Critical: TLS configuration parameters are not assigned to the struct.

The minTLSVersion and cipherSuites parameters are accepted by NewServer but never stored in the returned Server struct. This means the TLS configuration will be empty when Prepare() uses s.minTLSVersion and s.cipherSuites.

🐛 Proposed fix
 func NewServer(name string, config *rest.Config, kubeConfig, certFile, keyFile string, minTLSVersion string, cipherSuites []string) (*Server, error) {
 	kubeClient, err := kubernetes.NewForConfig(config)
 	if err != nil {
 		return nil, err
 	}

 	return &Server{
-		name:       name,
-		kubeClient: kubeClient,
-		kubeConfig: kubeConfig,
-		certFile:   certFile,
-		keyFile:    keyFile,
+		name:          name,
+		kubeClient:    kubeClient,
+		kubeConfig:    kubeConfig,
+		certFile:      certFile,
+		keyFile:       keyFile,
+		minTLSVersion: minTLSVersion,
+		cipherSuites:  cipherSuites,
 	}, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/server/server.go` around lines 52 - 65, NewServer currently accepts
minTLSVersion and cipherSuites but does not assign them to the Server it
returns, so Prepare() will see empty s.minTLSVersion and s.cipherSuites; update
NewServer to set the Server fields (e.g., minTLSVersion: minTLSVersion,
cipherSuites: cipherSuites) when constructing the &Server{}, and if those fields
do not exist on the Server struct add them (matching types used by Prepare()),
ensuring Prepare() uses the populated s.minTLSVersion and s.cipherSuites.
🧹 Nitpick comments (2)
pkg/client/client.go (1)

1967-1973: Consider consolidating duplicate accessor methods.

There are now two methods returning the same field:

  • EventRecorder() (Line 279)
  • EventsRecorder() (Line 1971)

Both return c.eventRecorder. This creates potential confusion for maintainers. Consider using a single consistent name or documenting why both are needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client/client.go` around lines 1967 - 1973, There are duplicate accessors
returning the same field c.eventRecorder (EventRecorder and EventsRecorder);
consolidate to a single canonical method name (pick one, e.g., EventRecorder) by
removing the redundant method (or make it a thin deprecated wrapper that calls
the canonical method), update all call sites to use the canonical name, and add
a small comment on the remaining method explaining its purpose; ensure the
unique symbols EventRecorder, EventsRecorder, and c.eventRecorder are used to
locate and update the code and callers.
pkg/server/server.go (1)

135-142: Verify Prepare() is called before Run().

Run() now assumes s.srv is initialized by a prior call to Prepare(). If Run() is called without Prepare(), this will panic with a nil pointer dereference on s.srv.PrepareRun().

Consider adding a guard or documenting this requirement clearly.

🛡️ Proposed defensive check
 func (s *Server) Run(ctx context.Context) error {
+	if s.srv == nil {
+		return fmt.Errorf("server not prepared: call Prepare() before Run()")
+	}
 	if err := s.srv.PrepareRun().RunWithContext(ctx); ctx.Err() == nil {
 		return err
 	}

 	klog.Info("server exited")
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/server/server.go` around lines 135 - 142, The Run method assumes s.srv is
initialized and will panic if Prepare() was not called; add a defensive guard at
the start of Server.Run that verifies s.srv is non-nil (or call s.Prepare() if
that is the intended behavior) before invoking s.srv.PrepareRun(), and return a
clear error if initialization is missing; update Server.Run to check s.srv
(and/or s.srv.PrepareRun() availability) and return a descriptive error instead
of dereferencing a nil pointer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/operator/main.go`:
- Around line 270-286: The Server returned by NewServer is missing assignment of
the TLS fields so Prepare() sees empty values; update the struct literal
returned by NewServer (the &Server{...} in server.NewServer) to include
minTLSVersion: minTLSVersion and cipherSuites: cipherSuites (matching the
function parameters) so the Server struct's fields are populated before
Prepare() is called.

In `@pkg/manifests/apiserver_config.go`:
- Around line 42-47: APIServerConfig.Equal currently calls methods on the
receiver and on the other pointer without nil checks and can panic; update
APIServerConfig.Equal to handle nil safely by returning true if both receiver
and other are nil, false if only one is nil, and otherwise comparing
c.MinTLSVersion() with other.MinTLSVersion() and c.TLSCiphers() with
other.TLSCiphers() via slices.Equal as before; reference the Equal method on
APIServerConfig and the helper methods MinTLSVersion and TLSCiphers when
locating the change.

In `@pkg/operator/operator.go`:
- Around line 632-640: The handler closes o.done directly which can panic if
called twice; add a sync.Once field (e.g., doneOnce) to the Operator struct and
replace close(o.done) with doneOnce.Do(func() { close(o.done) }) inside the
event handling path (the switch handling *configv1.APIServer in the method that
closes o.done) so the channel is closed exactly once.
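
The sync.Once suggestion for pkg/operator/operator.go can be sketched as follows. The `operator` type here is a hypothetical reduction to the two fields that matter; the real struct and its handler are not shown in this conversation.

```go
package main

import (
	"fmt"
	"sync"
)

// operator is a hypothetical reduction of the Operator struct to the fields
// involved in shutdown signaling.
type operator struct {
	done     chan struct{}
	doneOnce sync.Once
}

// stop closes done exactly once, so repeated events cannot trigger the
// "close of closed channel" panic.
func (o *operator) stop() {
	o.doneOnce.Do(func() { close(o.done) })
}

func main() {
	o := &operator{done: make(chan struct{})}
	o.stop()
	o.stop() // safe: the second call is a no-op

	select {
	case <-o.done:
		fmt.Println("done is closed")
	default:
		fmt.Println("done is still open")
	}
}
```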


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4184a2ff-26a7-4dbc-bdcf-0a7acbd40b4a

📥 Commits

Reviewing files that changed from the base of the PR and between 61407b0 and 76df984.

📒 Files selected for processing (5)
  • cmd/operator/main.go
  • pkg/client/client.go
  • pkg/manifests/apiserver_config.go
  • pkg/operator/operator.go
  • pkg/server/server.go

@simonpasquier simonpasquier force-pushed the read-tls-profile branch 2 times, most recently from 6bbd685 to d482ba8 Compare March 9, 2026 16:32


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/operator.go (1)

351-355: ⚠️ Potential issue | 🟠 Major

Handle the initial APIServer object too.

CMO snapshots the TLS profile in cmd/operator/main.go before this informer starts, but this handler only reacts to later updates. If the profile changes in that startup window, the process can keep the stale TLS settings until another APIServer event arrives.

🔧 Suggested fix
 	_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: o.handleEvent,
 		UpdateFunc: func(_, newObj interface{}) {
 			o.handleEvent(newObj)
 		},
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operator.go` around lines 351 - 355, The handler currently only
registers an UpdateFunc calling o.handleEvent, so the initial APIServer snapshot
can be missed; ensure the initial APIServer is processed at startup by either
adding an AddFunc to the cache.ResourceEventHandlerFuncs that calls
o.handleEvent for created objects or, after the informer is synced, retrieve the
current APIServer from the informer's store/lister and call o.handleEvent on
that object (use the same informer variable and o.handleEvent to locate where to
invoke this). Ensure this runs once after informer sync so startup TLS profile
changes are handled.
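
The startup window described in this finding can be reproduced without a cluster. In the sketch below, `eventHandlers` and `replay` are hypothetical stand-ins that only mimic how an informer delivers its initial LIST as add events and later edits as update events; they are not client-go types.

```go
package main

import "fmt"

// eventHandlers mirrors, in simplified form, the AddFunc/UpdateFunc pair of
// client-go's cache.ResourceEventHandlerFuncs.
type eventHandlers struct {
	AddFunc    func(obj any)
	UpdateFunc func(oldObj, newObj any)
}

// replay is a hypothetical stand-in for an informer: the initial LIST is
// delivered as an add event and later edits as update events.
func replay(h eventHandlers, initial string, edits ...string) {
	if h.AddFunc != nil {
		h.AddFunc(initial)
	}
	prev := initial
	for _, next := range edits {
		if h.UpdateFunc != nil {
			h.UpdateFunc(prev, next)
		}
		prev = next
	}
}

// driftDetected reports whether the handlers notice that the profile seen by
// the informer differs from the snapshot taken before the informer started.
func driftDetected(snapshot, initial string, withAddFunc bool) bool {
	detected := false
	check := func(obj any) {
		if profile, ok := obj.(string); ok && profile != snapshot {
			detected = true
		}
	}
	h := eventHandlers{UpdateFunc: func(_, newObj any) { check(newObj) }}
	if withAddFunc {
		h.AddFunc = check
	}
	replay(h, initial)
	return detected
}

func main() {
	// The profile changed from "Intermediate" to "Modern" during startup.
	fmt.Println("UpdateFunc only:", driftDetected("Intermediate", "Modern", false))
	fmt.Println("with AddFunc:", driftDetected("Intermediate", "Modern", true))
}
```

With only UpdateFunc registered, the changed profile goes unnoticed until a later edit; with AddFunc, the initial object is compared against the snapshot right away.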
🧹 Nitpick comments (1)
pkg/client/client.go (1)

1971-1973: Avoid a second accessor for the same recorder.

EventRecorder() already exposes c.eventRecorder, so EventsRecorder() just adds another public spelling for the same dependency. Reusing the existing method keeps the client surface smaller and avoids singular/plural drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client/client.go` around lines 1971 - 1973, Remove the duplicate accessor
EventsRecorder and consolidate on the existing EventRecorder method: delete the
EventsRecorder function from type Client, then update any callers that reference
EventsRecorder to call EventRecorder instead (preserve the same return type
events.Recorder). Ensure imports/tests/usage compile after replacing
EventsRecorder with EventRecorder and run tests to verify.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff0f74b4-31f1-4bab-b122-c24040f7042e

📥 Commits

Reviewing files that changed from the base of the PR and between 76df984 and d482ba8.

📒 Files selected for processing (5)
  • cmd/operator/main.go
  • pkg/client/client.go
  • pkg/manifests/apiserver_config.go
  • pkg/operator/operator.go
  • pkg/server/server.go

Signed-off-by: Simon Pasquier <spasquie@redhat.com>


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/operator.go (1)

345-355: ⚠️ Potential issue | 🟠 Major

Handle the APIServer informer’s initial add event too.

cmd/operator/main.go reads APIServer/cluster before informers start, so a TLS profile change in that window only shows up during this informer's initial LIST. With only UpdateFunc registered here, CMO misses that drift and can keep serving with stale TLS settings until the next edit.

🛠️ Suggested fix
 	_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: func(obj interface{}) {
+			o.handleEvent(obj)
+		},
 		UpdateFunc: func(_, newObj interface{}) {
 			o.handleEvent(newObj)
 		},
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operator.go` around lines 345 - 355, The APIServer informer only
registers UpdateFunc, so the initial LIST (add) events are ignored and
o.handleEvent never runs for the existing APIServer object; update the informer
registration (the cache.NewSharedIndexInformer block where
informer.AddEventHandler is called) to also provide an AddFunc in the
cache.ResourceEventHandlerFuncs that calls o.handleEvent with the newObj (same
as UpdateFunc) so the initial APIServer/cluster state is processed immediately
when the informer starts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de251b38-0a3e-4b4c-9800-a81e4dd3330c

📥 Commits

Reviewing files that changed from the base of the PR and between d482ba8 and abb8516.

📒 Files selected for processing (5)
  • cmd/operator/main.go
  • pkg/client/client.go
  • pkg/manifests/apiserver_config.go
  • pkg/operator/operator.go
  • pkg/server/server.go

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

@simonpasquier
Contributor Author

/retest-required

@simonpasquier
Contributor Author

/test e2e-hypershift-conformance


@juzhao
Contributor

juzhao commented Mar 12, 2026

/test e2e-hypershift-conformance

@simonpasquier
Contributor Author

/test e2e-hypershift-conformance/test e2e-hypershift-conformance

@simonpasquier
Contributor Author

/test e2e-hypershift-conformance

Contributor

@machine424 machine424 left a comment


thanks for this!
lgtm, I have some questions/suggestions.

we had an e2e test that we had to turn into a unit test because MCO started restarting nodes to propagate TLS changes (the change became too disruptive). https://issues.redhat.com/browse/MON-3960

It'd be great to have a test for this though, to make sure CMO behaves well after a change/restart. Maybe that will come for free as part of openshift/origin#30801? (not a blocker for this PR)

return 1
}

if err := srv.Prepare(ctx); err != nil {
Contributor

a comment on why this is separated would be helpful.
Also, maybe Prepare could just return the server?
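
For illustration, here is a minimal sketch of what "Prepare returns the server" could look like. It is not the PR's code: `preparedServer`, `prepare`, the address `:8443` and the two validation checks are hypothetical names and assumptions chosen for the example. The idea is that `Run` is only reachable through a value that `prepare` has already validated and configured.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"net/http"
)

// preparedServer is a hypothetical type. It can only be obtained from
// prepare, so Run cannot be called on a server that was never configured.
type preparedServer struct {
	httpServer *http.Server
}

// prepare validates its inputs and builds the server without listening yet,
// so configuration errors surface before anything is served.
func prepare(addr string, minVersion uint16) (*preparedServer, error) {
	if addr == "" {
		return nil, errors.New("listen address must not be empty")
	}
	if minVersion == 0 {
		return nil, errors.New("minimum TLS version must be set")
	}
	return &preparedServer{
		httpServer: &http.Server{
			Addr:      addr,
			TLSConfig: &tls.Config{MinVersion: minVersion},
		},
	}, nil
}

// Run serves until ctx is cancelled or the listener fails. It assumes that
// certificates have been added to TLSConfig by code not shown here.
func (s *preparedServer) Run(ctx context.Context) error {
	errCh := make(chan error, 1)
	go func() { errCh <- s.httpServer.ListenAndServeTLS("", "") }()

	select {
	case <-ctx.Done():
		return s.httpServer.Shutdown(context.Background())
	case err := <-errCh:
		return err
	}
}

func main() {
	// An invalid configuration is rejected before any listener is opened.
	if _, err := prepare("", tls.VersionTLS12); err != nil {
		fmt.Println("prepare failed early:", err)
	}

	srv, err := prepare(":8443", tls.VersionTLS12)
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("server prepared for", srv.httpServer.Addr)
}
```

With this shape the caller cannot forget the preparation step, at the cost of a slightly larger API surface than a single `Run` method.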

)

_, err = informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: func(obj interface{}) {
Contributor

we should probably do the same for the other configmaps/secrets in case they get deleted by mistake, recreated and never updated. (and the informers not restarted in the meanwhile, not sure if they do a relist though...)

Contributor Author

Correct. In practice I suppose that secrets/configmaps will have revision changes after recreation so it might be more of a theoretical issue.
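
The concern above can be modelled with a small, standard-library-only sketch. `eventHandlers`, `event` and `dispatch` are hypothetical stand-ins that only mimic the shape of client-go's `cache.ResourceEventHandlerFuncs`; they are not the real informer machinery and they do not model relists. Under that assumption, a delete followed by a recreate produces a delete and an add notification, so a handler that only sets `UpdateFunc` never reacts.

```go
package main

import "fmt"

// eventHandlers is a simplified, hypothetical stand-in for a set of informer
// callbacks. Callbacks that are nil are skipped.
type eventHandlers struct {
	AddFunc    func(name string)
	UpdateFunc func(name string)
	DeleteFunc func(name string)
}

// event is a hypothetical watch notification.
type event struct {
	kind string // "add", "update" or "delete"
	name string
}

// dispatch delivers each event to the matching callback, if one is set.
func dispatch(h eventHandlers, events []event) {
	for _, e := range events {
		switch e.kind {
		case "add":
			if h.AddFunc != nil {
				h.AddFunc(e.name)
			}
		case "update":
			if h.UpdateFunc != nil {
				h.UpdateFunc(e.name)
			}
		case "delete":
			if h.DeleteFunc != nil {
				h.DeleteFunc(e.name)
			}
		}
	}
}

// countSyncs returns how many times a sync is triggered when an object is
// deleted and then recreated, with or without an AddFunc registered.
func countSyncs(withAdd bool) int {
	syncs := 0
	trigger := func(string) { syncs++ }

	h := eventHandlers{UpdateFunc: trigger}
	if withAdd {
		h.AddFunc = trigger
	}

	// "example-config" is a made-up object name.
	dispatch(h, []event{{"delete", "example-config"}, {"add", "example-config"}})
	return syncs
}

func main() {
	fmt.Println("update-only handler:", countSyncs(false))
	fmt.Println("add+update handler: ", countSyncs(true))
}
```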

}
apiServerConfigAdapter := manifests.NewAPIServerConfig(apiServerConfig)
klog.Infof(
"TLS configuration: minimum version=%q, ciphers=[%s]",
Contributor

maybe

Suggested change
-"TLS configuration: minimum version=%q, ciphers=[%s]",
+"TLS settings from API server configuration: minimum version=%q, ciphers=[%s]",

to be more explicit?

Contributor Author

changed
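
As a rough sketch of how logged TLS settings might end up in a server's `tls.Config`: `minVersions` and `newTLSConfig` are hypothetical helpers, not CMO functions, and the example assumes that cipher suites are already given as IANA names (profiles that use OpenSSL-style names would need a conversion step that is not shown here).

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// minVersions maps version names such as "VersionTLS12" to crypto/tls
// constants. The string form is an assumption made for this example.
var minVersions = map[string]uint16{
	"VersionTLS10": tls.VersionTLS10,
	"VersionTLS11": tls.VersionTLS11,
	"VersionTLS12": tls.VersionTLS12,
	"VersionTLS13": tls.VersionTLS13,
}

// newTLSConfig builds a tls.Config from a version name and a list of IANA
// cipher suite names. Unknown versions or ciphers are reported as errors.
func newTLSConfig(minVersion string, ciphers []string) (*tls.Config, error) {
	v, ok := minVersions[minVersion]
	if !ok {
		return nil, fmt.Errorf("unknown TLS version %q", minVersion)
	}

	// Index the cipher suites implemented by crypto/tls by name.
	byName := map[string]uint16{}
	for _, cs := range tls.CipherSuites() {
		byName[cs.Name] = cs.ID
	}

	ids := make([]uint16, 0, len(ciphers))
	for _, name := range ciphers {
		id, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("unknown cipher suite %q", name)
		}
		ids = append(ids, id)
	}

	return &tls.Config{MinVersion: v, CipherSuites: ids}, nil
}

func main() {
	minVersion := "VersionTLS12"
	ciphers := []string{
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
	}
	fmt.Printf(
		"TLS settings from API server configuration: minimum version=%q, ciphers=[%s]\n",
		minVersion, strings.Join(ciphers, ","),
	)

	cfg, err := newTLSConfig(minVersion, ciphers)
	if err != nil {
		fmt.Println("invalid TLS settings:", err)
		return
	}
	fmt.Printf("applied %d cipher suites\n", len(cfg.CipherSuites))
}
```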

// Trigger a restart of the process to read the new TLS
// configuration.
klog.Info("Detected changes to the TLS profile configuration, stopping the process")
o.stop()
Contributor

IIUC this will cancel the root ctx

ctx, cancel := context.WithCancel(context.Background())
so it'd not only stop the server (the PR title misled me).

I assume it's simpler to just restart the whole container instead of just the server?

Contributor Author

Correct, it will make the process exit. This is what is currently recommended as it's simple and updating the TLS profile shouldn't be a frequent operation.
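
The stop-on-change behaviour discussed in this thread can be sketched with the standard library alone. `tlsProfile`, `watchProfile` and `run` are hypothetical names, and a channel stands in for the informer events. The sketch only shows that cancelling the root context ends everything that shares it, after which the process can exit and be restarted by its supervisor.

```go
package main

import (
	"context"
	"fmt"
)

// tlsProfile is a hypothetical, simplified stand-in for the TLS settings
// read from the APIServer configuration.
type tlsProfile struct {
	MinVersion string
	Ciphers    string
}

// watchProfile calls stop the first time it receives a profile that differs
// from the one the process started with, then returns.
func watchProfile(ctx context.Context, initial tlsProfile, updates <-chan tlsProfile, stop context.CancelFunc) {
	for {
		select {
		case <-ctx.Done():
			return
		case p := <-updates:
			if p != initial {
				fmt.Println("Detected changes to the TLS profile configuration, stopping the process")
				stop()
				return
			}
		}
	}
}

// run blocks until the root context is cancelled and returns the exit code.
func run(initial tlsProfile, updates <-chan tlsProfile) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	go watchProfile(ctx, initial, updates, cancel)

	// Every component sharing ctx (server, informers, ...) would stop here too.
	<-ctx.Done()
	return 0
}

func main() {
	initial := tlsProfile{MinVersion: "VersionTLS12"}

	updates := make(chan tlsProfile, 1)
	updates <- tlsProfile{MinVersion: "VersionTLS13"}

	fmt.Println("exit code:", run(initial, updates))
}
```

Restarting the whole process trades a brief interruption for not having to hot-reload TLS settings in every component.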

@simonpasquier
Contributor Author

It'd be great to have a test for this though to make sure CMO behaves well after a change/restart, maybe that will come for free as part of openshift/origin#30801? (not a blocker of this PR)

yeah, I'd defer that to a generic origin test rather than something particular to CMO.

@machine424
Contributor

yeah, I'd defer that to a generic origin test rather than something particular to CMO.

sure, I assume you gave this a try locally :)

/lgtm

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 12, 2026

@simonpasquier: This pull request references MON-4479 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR implements both reading the TLS configuration from the APIServer configuration and stopping the process when a change is detected. It ensures that CMO always uses the up-to-date configuration.

Summary by CodeRabbit

  • Refactor

  • Startup flow separated into prepare vs run; TLS settings are injected before serving and configuration errors surface earlier.

  • Operator now watches API server and related config resources and will stop/restart when TLS or config changes are detected.

  • Chores

  • API server TLS details are logged and applied to server TLS configuration.

  • Improved lifecycle and graceful shutdown handling.


@openshift-ci
Contributor

openshift-ci bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [machine424,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 12, 2026

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/operator/main.go (1)

92-126: LGTM - Clean helper function for client initialization.

The newClient helper encapsulates client creation with proper error handling and event recorder setup. The fallback behavior when controller reference is unavailable is appropriate.

One minor inconsistency: the kubernetes client error (line 100) is wrapped with context, but the config client error (line 117) is returned unwrapped. Consider wrapping for consistency:

🔧 Optional: Wrap config client error for consistency
 	configClient, err := configv1client.NewForConfig(config)
 	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("creating openshift config client: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/operator/main.go` around lines 92 - 126, In newClient, the error returned
from configv1client.NewForConfig is not wrapped for context; update the error
return when creating configClient (configv1client.NewForConfig) to wrap the
original error with fmt.Errorf and a descriptive message (e.g., "creating
openshift config client: %w") so it matches the earlier kubernetes client error
handling and aids debugging.
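
To show what the `%w` wrapping buys, here is a self-contained sketch. `errBadConfig` and `newConfigClient` are hypothetical and only simulate a failing constructor; this `newClient` is a stripped-down stand-in, not the helper from cmd/operator/main.go. The wrapped error carries the extra context in its message while still matching the original error through `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadConfig is a hypothetical sentinel standing in for whatever error a
// client constructor might return.
var errBadConfig = errors.New("invalid rest config")

// newConfigClient is a hypothetical constructor that always fails.
func newConfigClient() (struct{}, error) {
	return struct{}{}, errBadConfig
}

// newClient wraps the constructor error with context using %w, so callers
// can still inspect the underlying error.
func newClient() error {
	if _, err := newConfigClient(); err != nil {
		return fmt.Errorf("creating openshift config client: %w", err)
	}
	return nil
}

// wrapsBadConfig reports whether err has errBadConfig in its chain.
func wrapsBadConfig(err error) bool {
	return errors.Is(err, errBadConfig)
}

func main() {
	err := newClient()
	fmt.Println(err)
	fmt.Println("wraps original error:", wrapsBadConfig(err))
}
```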
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/operator.go`:
- Line 421: The hardcoded 10*time.Minute used when creating configInformers
should be made consistent with the existing resyncPeriod constant or explicitly
documented if the shorter interval is intentional: replace the literal
10*time.Minute in the configInformers :=
configv1informers.NewSharedInformerFactory(...) call with the resyncPeriod
constant used by kubeInformersOperatorNS, or if you intentionally want faster
feature-gate/ClusterVersion detection, add a concise comment next to the
configInformers creation (referencing configInformers, resyncPeriod, and
kubeInformersOperatorNS) explaining why a shorter resync is required.

---

Nitpick comments:
In `@cmd/operator/main.go`:
- Around line 92-126: In newClient, the error returned from
configv1client.NewForConfig is not wrapped for context; update the error return
when creating configClient (configv1client.NewForConfig) to wrap the original
error with fmt.Errorf and a descriptive message (e.g., "creating openshift
config client: %w") so it matches the earlier kubernetes client error handling
and aids debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3475a490-c6ee-49ed-a804-ff308f4b8655

📥 Commits

Reviewing files that changed from the base of the PR and between 304662e and 6d9331d.

📒 Files selected for processing (2)
  • cmd/operator/main.go
  • pkg/operator/operator.go

o.informerFactories = append(o.informerFactories, kubeInformersOperatorNS)

-configInformers := configv1informers.NewSharedInformerFactory(configClient, 10*time.Minute)
+configInformers := configv1informers.NewSharedInformerFactory(c.OpenShiftConfigClientset(), 10*time.Minute)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if different resync periods are used intentionally across the codebase
rg -n "NewSharedInformerFactory" --type=go -A2 | head -30

Repository: openshift/cluster-monitoring-operator

Length of output: 3123


🏁 Script executed:

# First, examine the file around line 421
sed -n '415,430p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 766


🏁 Script executed:

# Search for resyncPeriod constant definition in non-vendor code
rg -n "resyncPeriod\s*:?=" --type=go --type-not=go -- 'vendor' -v

Repository: openshift/cluster-monitoring-operator

Length of output: 123


🏁 Script executed:

# Find all NewSharedInformerFactory usages in non-vendor code with context
rg -n "NewSharedInformerFactory" --type=go -B2 -A2 -- 'vendor' -v

Repository: openshift/cluster-monitoring-operator

Length of output: 8754


🏁 Script executed:

# Search for resyncPeriod definition in non-vendor code
rg -n "resyncPeriod\s*:?=" --type=go | grep -v vendor

Repository: openshift/cluster-monitoring-operator

Length of output: 217


🏁 Script executed:

# Also check around the context for resyncPeriod
sed -n '400,440p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 1648


🏁 Script executed:

# Search for all NewSharedInformerFactory usages in non-vendor code
rg -n "NewSharedInformerFactory" --type=go -B1 -A1 | grep -v vendor

Repository: openshift/cluster-monitoring-operator

Length of output: 560


🏁 Script executed:

# Check extended context around configInformers for any comments
sed -n '405,445p' pkg/operator/operator.go

Repository: openshift/cluster-monitoring-operator

Length of output: 1555


🏁 Script executed:

# Search for any references to configInformers or featuregates that might explain the timing
rg -n "10.*time.Minute|config.*Informer" pkg/operator/operator.go -B2 -A2

Repository: openshift/cluster-monitoring-operator

Length of output: 657


The hardcoded 10-minute resync period for configInformers differs from the 15-minute resyncPeriod constant used by other informers, and this difference lacks documentation.

The configInformers instance is specifically used for watching feature gates and cluster versions (which can trigger container restarts). If this shorter period is intentional for faster feature gate detection, add a comment to explain it. Otherwise, use the resyncPeriod constant for consistency with kubeInformersOperatorNS.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operator.go` at line 421, The hardcoded 10*time.Minute used when
creating configInformers should be made consistent with the existing
resyncPeriod constant or explicitly documented if the shorter interval is
intentional: replace the literal 10*time.Minute in the configInformers :=
configv1informers.NewSharedInformerFactory(...) call with the resyncPeriod
constant used by kubeInformersOperatorNS, or if you intentionally want faster
feature-gate/ClusterVersion detection, add a concise comment next to the
configInformers creation (referencing configInformers, resyncPeriod, and
kubeInformersOperatorNS) explaining why a shorter resync is required.

@openshift-ci
Contributor

openshift-ci bot commented Mar 12, 2026

@simonpasquier: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-conformance | 6d9331d | link | true | /test e2e-hypershift-conformance |
| ci/prow/versions | 6d9331d | link | false | /test versions |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@simonpasquier
Contributor Author

/hold

As discussed with @machine424, the e2e verification is hard (if not impossible) to do with our test framework (too disruptive). I'll let @juzhao verify that whenever the TLS profile is updated, CMO reconfigures itself as expected.

@simonpasquier
Contributor Author

/skip

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2026

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
