
MON-4517: Introduce missing minimal monitors with "full" CP metrics#2814

Open
rexagod wants to merge 7 commits into openshift:main from rexagod:MON-4517

Conversation

@rexagod
Member

@rexagod rexagod commented Feb 10, 2026

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

  • New Features

    • Added a new "telemetry" collection profile and telemetry-specific ServiceMonitors across core monitoring components (Alertmanager, Prometheus, kube-state-metrics, node-exporter, telemeter-client, control-plane, openshift-state-metrics).
    • Introduced "minimal" collection-profile ServiceMonitors for many components to reduce metric volume and provide TLS-authenticated scrape endpoints.
  • Tests

    • Updated collection-profile tests to include the new telemetry profile in selection logic.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 10, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Contractually, minimal profile is supposed to support telemetry
operations as well. This makes sure that minimal monitors always
have the expected set of telemetry metrics as well.


Please merge #2694 first. This PR is rebased over that. Only the last commit reflects the actual changes introduced here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Feb 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2026
@rexagod rexagod force-pushed the MON-4517 branch 2 times, most recently from 22a4323 to d65d951 on February 10, 2026 17:06
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 17, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Contractually, minimal profile is supposed to support telemetry operations as well. This makes sure that minimal monitors always have the expected set of telemetry metrics as well.


Please merge #2694 first. This PR is rebased over that. Only the last commit reflects the actual changes introduced here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@simonpasquier
Contributor

/cc @simonpasquier

@openshift-ci openshift-ci bot requested a review from simonpasquier February 17, 2026 15:55
@simonpasquier
Contributor

Contractually, minimal profile is supposed to support telemetry operations as well. This makes sure that minimal monitors always
have the expected set of telemetry metrics as well.

IMHO it should be the other way around: unless defined more precisely, minimal should be the same as full (because of dashboards and alerting rules).

rexagod added a commit to rexagod/cluster-monitoring-operator that referenced this pull request Feb 24, 2026
Does not entail dynamically generating the whitelist and updating
the monitoring from it [1], nor the `minimal` monitors' revision
[2].

[1]:openshift#2694
[2]:openshift#2814
Comment on lines +9 to +10
-local run(sm, metrics, profile) =
+local run(smWithDrop, metrics, profile, shouldRemoveDrop) =
+  local sm = if shouldRemoveDrop then removeDrop(smWithDrop) else smWithDrop;
Member Author

~/bases/work/cluster-monitoring-operator/assets MON-4517 ≡
❯ for i in alertmanager cluster-monitoring-operator openshift-state-metrics prometheus-k8s telemeter-client; do diff $i/minimal-service-monitor.yaml $i/service-monitor.yaml; done
11,12c11
<     monitoring.openshift.io/collection-profile: minimal
<   name: alertmanager-main-minimal
---
>   name: alertmanager-main
8,9c8
<     monitoring.openshift.io/collection-profile: minimal
<   name: cluster-monitoring-operator-minimal
---
>   name: cluster-monitoring-operator
8,9c8
<     monitoring.openshift.io/collection-profile: minimal
<   name: openshift-state-metrics-minimal
---
>   name: openshift-state-metrics
11,12c11
<     monitoring.openshift.io/collection-profile: minimal
<   name: prometheus-k8s-minimal
---
>   name: prometheus-k8s
8,9c8
<     monitoring.openshift.io/collection-profile: minimal
<   name: telemeter-client-minimal
---
>   name: telemeter-client

Contributor

@danielmellado danielmellado Feb 25, 2026


was this a diff of an output for the libsonnet generation? xD

Member Author

Ah sorry. I pasted this here to show that nothing was changed in the newly generated minimal monitors from their full counterparts except names and the CP label.

@rexagod rexagod changed the title MON-4517: Make minimal monitors inherit telemetry metrics MON-4517: Make minimal monitors inherit ~telemetry~ full metrics Feb 25, 2026
@rexagod rexagod changed the title MON-4517: Make minimal monitors inherit ~telemetry~ full metrics MON-4517: Make minimal monitors inherit "full" metrics Feb 25, 2026
@@ -6,7 +6,8 @@
// 2. Add the profile prefix to the ServiceMonitor name
// 3. Add the profile label "monitoring.openshift.io/collection-profile: <profile>"
// 4. Add a metricRelabelings with action keep and regex equal to metrics
Contributor


I'd replace this with something such as "// 4. If metrics is non-null, add a metricRelabelings with action keep and regex equal to metrics", as the logic has changed somewhat

@rexagod rexagod force-pushed the MON-4517 branch 2 times, most recently from 1182437 to 361d6a7 on February 25, 2026 16:27
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 25, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

rexagod added a commit to rexagod/cluster-monitoring-operator that referenced this pull request Feb 25, 2026
@rexagod rexagod changed the title MON-4517: Make minimal monitors inherit "full" metrics MON-4517: Introduce missing minimal monitors with "full" CP metrics Feb 25, 2026
},

-minimal(sm, metrics): minimal(removeDrop(sm), metrics),
+minimal(sm, metrics, removeDrop=true): run(sm, metrics, profiles[0], removeDrop),
Contributor

(nit) I feel that adding more arguments makes it harder to understand what happens at the caller site. Can we have a more explicit API?

// Returns a copy of the input service monitor with the correct annotation.
serviceMonitorForMinimalProfile(sm): ...
serviceMonitorForTelemetryProfile(sm): ...


// Returns a copy of the input service monitor with the list of metrics.
keepOnlyMetrics(sm, metrics): ... 

so we would call it like:

utils.serviceMonitorForTelemetryProfile(
    utils.keepOnlyMetrics(sm, ["foo", "bar"]),
)

or

// just use the full service monitor (no customization for minimal).
utils.serviceMonitorForMinimalProfile(sm)
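A rough Go analogue of the shape being suggested (all types and names below are illustrative, not the actual CMO or jsonnet API): small, explicitly named builders composed at the call site read better than one function gated by extra positional or boolean arguments.

```go
package main

import "fmt"

// SM is a toy stand-in for a ServiceMonitor; only the fields needed to
// illustrate composition are present.
type SM struct {
	Name    string
	Labels  map[string]string
	Metrics []string
}

// keepOnlyMetrics returns a copy restricted to the given metric names.
func keepOnlyMetrics(sm SM, metrics []string) SM {
	sm.Metrics = metrics
	return sm
}

// forProfile returns a copy named and labeled for a collection profile,
// mirroring the "-<profile>" suffix and CP label seen in the generated assets.
func forProfile(sm SM, profile string) SM {
	labels := map[string]string{}
	for k, v := range sm.Labels {
		labels[k] = v
	}
	labels["monitoring.openshift.io/collection-profile"] = profile
	sm.Labels = labels
	sm.Name = sm.Name + "-" + profile
	return sm
}

func main() {
	base := SM{Name: "alertmanager-main", Labels: map[string]string{}}
	telemetry := forProfile(keepOnlyMetrics(base, []string{"up"}), "telemetry")
	fmt.Println(telemetry.Name) // alertmanager-main-telemetry
}
```

Each helper does one thing, so the caller site states intent directly and no flag needs documenting.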

if err != nil {
return fmt.Errorf("reconciling Alertmanager ServiceMonitor failed: %w", err)
for _, smam := range smams {
err = t.client.CreateOrUpdateServiceMonitor(ctx, smam)
Contributor

should we have a CreateOrUpdateServiceMonitors() function to reduce duplication?

interval: 30s
metricRelabelings:
- action: keep
regex: (ALERTS|prometheus_tsdb_head_samples_appended_total|prometheus_tsdb_head_series|scrape_samples_post_metric_relabeling|scrape_series_added|up)
Contributor

(nit) up and scrape_* metrics aren't coming from the scraped target.

rexagod added 2 commits March 11, 2026 01:51
Bases minimal monitors on the same specs as the "full" ones.

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
@coderabbitai

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0fd74c02-6732-4798-85c6-21aa0993b62b

📥 Commits

Reviewing files that changed from the base of the PR and between 3ab1f06 and 49786bf.

📒 Files selected for processing (6)
  • assets/prometheus-k8s/minimal-service-monitor.yaml
  • assets/prometheus-k8s/telemetry-service-monitor.yaml
  • pkg/tasks/alertmanager.go
  • pkg/tasks/clustermonitoringoperator.go
  • pkg/tasks/openshiftstatemetrics.go
  • pkg/tasks/prometheus.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/tasks/clustermonitoringoperator.go
  • assets/prometheus-k8s/telemetry-service-monitor.yaml
  • assets/prometheus-k8s/minimal-service-monitor.yaml

Walkthrough

Adds telemetry and minimal ServiceMonitor variants and a profile-driven generator; updates Jsonnet components to export the new monitors; extends manifests Factory/types/tests to produce grouped full/minimal/telemetry outputs; and changes client/tasks to create/update/delete ServiceMonitors in batch.

Changes

  • ServiceMonitor YAML assets (assets/alertmanager/*.yaml, assets/cluster-monitoring-operator/*.yaml, assets/control-plane/telemetry-service-monitor-kubelet.yaml, assets/kube-state-metrics/*.yaml, assets/node-exporter/*.yaml, assets/openshift-state-metrics/*.yaml, assets/prometheus-k8s/*.yaml, assets/telemeter-client/*.yaml): added new minimal and telemetry ServiceMonitor manifests for multiple components. Most manifests add TLS-enabled endpoints, the scrapeClass tls-client-certificate-auth, and metricRelabelings; the kubelet manifest contains multiple TLS endpoints with per-endpoint relabeling and filtering.
  • Jsonnet utils, generator and auth (jsonnet/utils/generate-service-monitors.libsonnet, jsonnet/utils/configure-authentication-for-monitors.libsonnet): replaced the hard-coded minimal logic with a profile-based generator: keepOnlyMetrics, serviceMonitorForMinimalProfile, serviceMonitorForTelemetryProfile, profile suffix strip/add, and improved metricRelabeling drop/keep handling.
  • Jsonnet component exports (jsonnet/components/.../*.libsonnet, e.g. alertmanager.libsonnet, cluster-monitoring-operator.libsonnet, control-plane.libsonnet, kube-state-metrics.libsonnet, node-exporter.libsonnet, openshift-state-metrics.libsonnet, prometheus.libsonnet, telemeter-client.libsonnet): imported the generator util and added exported minimalServiceMonitor and telemetryServiceMonitor (or telemetry variants) across components; refactored generation to compose keepOnlyMetrics with the profile builders, changing construction flow and signatures.
  • Go manifests factory, types, and tests (pkg/manifests/manifests.go, pkg/manifests/types.go, pkg/manifests/manifests_test.go): expanded the Factory API and helper to return grouped ServiceMonitors (full, minimal, telemetry); added the TelemetryCollectionProfile constant and updated tests and selectors to include telemetry in the collection-profile logic.
  • Go client batch API (pkg/client/client.go): added batch methods CreateOrUpdateServiceMonitors and DeleteServiceMonitors that iterate over the provided ServiceMonitors and return the first error; adjusted some error messages to include namespace/name.
  • Go task reconcilers (pkg/tasks/*.go, e.g. alertmanager.go, clustermonitoringoperator.go, controlplane.go, kubestatemetrics.go, nodeexporter.go, openshiftstatemetrics.go, prometheus.go, telemeter.go): replaced per-item reconciliation with batch operations using the new Factory multi-monitor methods and client batch APIs; updated variable names and consolidated create/update/delete calls and error messages to the pluralized "ServiceMonitors".

Sequence Diagram(s)

sequenceDiagram
    participant Operator as ClusterMonitoring Operator
    participant Factory as Manifests Factory
    participant Generator as Jsonnet Profile Generator
    participant Client as API Client

    Operator->>Factory: Request ServiceMonitors(component)
    Factory->>Factory: Load base ServiceMonitor asset
    Factory->>Generator: keepOnlyMetrics(base, metricList)
    Generator-->>Factory: filtered ServiceMonitor
    Factory->>Generator: serviceMonitorForMinimalProfile(filtered)
    Generator-->>Factory: minimal variant
    Factory->>Generator: serviceMonitorForTelemetryProfile(filtered)
    Generator-->>Factory: telemetry variant
    Factory-->>Operator: Return [full, minimal, telemetry]
    Operator->>Client: CreateOrUpdateServiceMonitors(ctx, sms)
    Client->>Client: Iterate and create/update each ServiceMonitor
    Client-->>Operator: Success / Error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: the PR title clearly describes the main change: introducing missing minimal monitors based on full control-plane metrics specifications.
  • Stable And Deterministic Test Names: the added test cases use stable, deterministic names (full_collection_profile, minimal_collection_profile, telemetry_collection_profile) built from static string literals with no dynamic values, timestamps, or generated identifiers.
  • Test Structure And Quality: the PR does not modify any Ginkgo test code; only standard Go unit tests and application code change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Jeffail/gabs/v2@v2.6.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/alecthomas/units@v0.0.0-20240927000941-0f3dac36c52b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/blang/semver/v4@v4.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-openapi/strfmt@v0.24.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/uuid@v1.6.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/imdario/mergo@v0.3.16: is explicitly

... [truncated 21195 characters] ...

les.txt\n\tsigs.k8s.io/apiserver-network-proxy/konnectivity-client@v0.31.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/kube-storage-version-migrator@v0.0.6-0.20230721195810-5c8923c5ff96: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/ginkgo/v2: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

Release Notes

  • New Features
  • Added support for telemetry collection profile across cluster monitoring components, including alertmanager, kube-state-metrics, node-exporter, prometheus, and telemeter-client.
  • Introduced minimal collection profile variants for monitoring components to provide granular metric collection options.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
jsonnet/utils/generate-service-monitors.libsonnet (1)

9-12: ⚠️ Potential issue | 🟠 Major

Filter incorrectly removes metricRelabelings without an explicit action field.

The condition std.objectHas(x, 'action') && x.action != 'drop' will exclude any metricRelabeling that doesn't have an action field. In Prometheus, when action is omitted, it defaults to replace. This means legitimate relabelings (like labeldrop, or implicit replace) without an explicit action key would be incorrectly removed.

Proposed fix
-                        metricRelabelings: [x for x in e.metricRelabelings if std.objectHas(x, 'action') && x.action != 'drop'],
+                        metricRelabelings: [x for x in e.metricRelabelings if !std.objectHas(x, 'action') || x.action != 'drop'],
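The behavioural difference is easy to state in code. Below is a small, self-contained Go illustration of the corrected predicate (the Relabeling type here is a stand-in, not the prometheus-operator struct): an entry with no explicit action defaults to replace in Prometheus and must survive the filter; only explicit drop entries are removed.

```go
package main

import "fmt"

// Relabeling mirrors only the field relevant here; in Prometheus an omitted
// action defaults to "replace", so its absence must not cause removal.
type Relabeling struct {
	Action string // "" means the Prometheus default, "replace"
	Regex  string
}

// withoutDrop keeps every relabeling except explicit drops, matching the
// corrected jsonnet predicate: !has(action) || action != "drop".
func withoutDrop(rs []Relabeling) []Relabeling {
	var out []Relabeling
	for _, r := range rs {
		if r.Action != "drop" { // "" (implicit replace), "keep", "labeldrop", ... are retained
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rs := []Relabeling{
		{Action: "drop", Regex: "go_.*"},
		{Action: "", Regex: "(.+)"}, // implicit replace: must survive
		{Action: "keep", Regex: "up"},
	}
	fmt.Println(len(withoutDrop(rs))) // prints 2
}
```

The buggy predicate (requiring the action key to exist) would have returned only the keep entry here, silently discarding the implicit-replace rule.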
🧹 Nitpick comments (1)
pkg/tasks/clustermonitoringoperator.go (1)

143-151: Error messages should reflect plural ServiceMonitors.

The method calls have been updated to handle multiple ServiceMonitors, but the error messages on lines 145 and 150 still use singular "ServiceMonitor". For consistency and clearer debugging:

💡 Suggested fix
 	smscmo, err := t.factory.ClusterMonitoringOperatorServiceMonitors()
 	if err != nil {
-		return fmt.Errorf("initializing Cluster Monitoring Operator ServiceMonitor failed: %w", err)
+		return fmt.Errorf("initializing Cluster Monitoring Operator ServiceMonitors failed: %w", err)
 	}

 	err = t.client.CreateOrUpdateServiceMonitors(ctx, smscmo)
 	if err != nil {
-		return fmt.Errorf("reconciling Cluster Monitoring Operator ServiceMonitor failed: %w", err)
+		return fmt.Errorf("reconciling Cluster Monitoring Operator ServiceMonitors failed: %w", err)
 	}
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@assets/cluster-monitoring-operator/telemetry-service-monitor.yaml`:
- Around line 12-24: The endpoint block under endpoints (the map containing
bearerTokenFile, metricRelabelings, port: https, scheme: https and tlsConfig) is
missing an explicit interval; add an interval field (e.g., interval: 30s or
another project-consistent value) to this endpoint to ensure it doesn't rely on
Prometheus defaults and matches the other ServiceMonitor entries, keeping the
scrapeClass tls-client-certificate-auth and existing metricRelabelings intact.

In `@assets/control-plane/telemetry-service-monitor-kubelet.yaml`:
- Around line 60-110: The /metrics/probes scrape (path: /metrics/probes) and the
CRI-O scrape (where replacement: crio and targetLabel: job) are using the
kubelet/cAdvisor keep regex in metricRelabelings which does not include prober_*
or crio_* families, so they will yield zero metrics; either remove those two
scrape blocks entirely or update the metricRelabelings regex (the regex under
metricRelabelings / sourceLabels: __name__) to include prober_probe_total and
the crio_* metric family names (e.g., add prober_probe_total|crio_.*) so the
intended probe and CRI-O metrics are allowed through.

In `@pkg/client/client.go`:
- Around line 1691-1698: The helper CreateOrUpdateServiceMonitors currently
returns raw errors and hides which ServiceMonitor failed; update it to wrap the
error with the failing ServiceMonitor identity (at minimum Namespace/Name) when
CreateOrUpdateServiceMonitor(ctx, sm) returns non-nil so callers can see which
monitor failed; use fmt.Errorf or errors.Wrapf to produce a message like
"reconciling ServiceMonitor <namespace>/<name>: %w" referencing the sm variable
inside CreateOrUpdateServiceMonitors.

In `@pkg/tasks/alertmanager.go`:
- Around line 226-231: The teardown currently deletes only the singular
AlertmanagerServiceMonitor while the setup creates a full set via
AlertmanagerServiceMonitors(), which leaves orphaned telemetry/minimal monitors;
update destroy() to obtain the full set by calling
t.factory.AlertmanagerServiceMonitors(), handle the error, and pass that set to
the plural deletion method (e.g., t.client.DeleteServiceMonitors(ctx, smams))
instead of deleting only AlertmanagerServiceMonitor(); ensure any existing
singular Delete call is removed or replaced and error handling/logging matches
the create path.

In `@pkg/tasks/telemeter.go`:
- Around line 190-195: The destroy() path currently deletes only the singular
TelemeterClientServiceMonitor; update destroy() to remove all monitors returned
by t.factory.TelemeterClientServiceMonitors() (the same slice used by
CreateOrUpdateServiceMonitors). Call the client-side removal for the whole slice
(e.g., use t.client.DeleteServiceMonitors(ctx, sms) if such a bulk delete
exists) or iterate over the slice and call the singular delete method for each
monitor, propagating and handling errors accordingly so all ServiceMonitors
created by TelemeterClientServiceMonitors() are cleaned up.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e2d6f304-0fa7-472f-a841-32380cbd9161

📥 Commits

Reviewing files that changed from the base of the PR and between 61407b0 and ffa9b79.

📒 Files selected for processing (35)
  • assets/alertmanager/minimal-service-monitor.yaml
  • assets/alertmanager/telemetry-service-monitor.yaml
  • assets/cluster-monitoring-operator/minimal-service-monitor.yaml
  • assets/cluster-monitoring-operator/telemetry-service-monitor.yaml
  • assets/control-plane/telemetry-service-monitor-kubelet.yaml
  • assets/kube-state-metrics/telemetry-service-monitor.yaml
  • assets/node-exporter/telemetry-service-monitor.yaml
  • assets/openshift-state-metrics/minimal-service-monitor.yaml
  • assets/openshift-state-metrics/telemetry-service-monitor.yaml
  • assets/prometheus-k8s/minimal-service-monitor.yaml
  • assets/prometheus-k8s/telemetry-service-monitor.yaml
  • assets/telemeter-client/minimal-service-monitor.yaml
  • assets/telemeter-client/telemetry-service-monitor.yaml
  • jsonnet/components/alertmanager.libsonnet
  • jsonnet/components/cluster-monitoring-operator.libsonnet
  • jsonnet/components/control-plane.libsonnet
  • jsonnet/components/kube-state-metrics.libsonnet
  • jsonnet/components/node-exporter.libsonnet
  • jsonnet/components/openshift-state-metrics.libsonnet
  • jsonnet/components/prometheus.libsonnet
  • jsonnet/components/telemeter-client.libsonnet
  • jsonnet/utils/configure-authentication-for-monitors.libsonnet
  • jsonnet/utils/generate-service-monitors.libsonnet
  • pkg/client/client.go
  • pkg/manifests/manifests.go
  • pkg/manifests/manifests_test.go
  • pkg/manifests/types.go
  • pkg/tasks/alertmanager.go
  • pkg/tasks/clustermonitoringoperator.go
  • pkg/tasks/controlplane.go
  • pkg/tasks/kubestatemetrics.go
  • pkg/tasks/nodeexporter.go
  • pkg/tasks/openshiftstatemetrics.go
  • pkg/tasks/prometheus.go
  • pkg/tasks/telemeter.go

Comment on lines +12 to +24
endpoints:
- bearerTokenFile: ""
metricRelabelings:
- action: keep
regex: (cluster_monitoring_operator_collection_profile)
sourceLabels:
- __name__
port: https
scheme: https
tlsConfig:
insecureSkipVerify: false
serverName: cluster-monitoring-operator.openshift-monitoring.svc
scrapeClass: tls-client-certificate-auth

@coderabbitai coderabbitai bot Mar 10, 2026


⚠️ Potential issue | 🟡 Minor

Missing interval field in endpoint configuration.

Other telemetry ServiceMonitors in this PR explicitly specify a scrape interval (e.g., 30s for telemeter-client, 2m for openshift-state-metrics). This endpoint lacks an interval field and will use the Prometheus default. Consider adding an explicit interval for consistency.

Proposed fix
 spec:
   endpoints:
   - bearerTokenFile: ""
+    interval: 30s
     metricRelabelings:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
endpoints:
- bearerTokenFile: ""
  interval: 30s
  metricRelabelings:
  - action: keep
    regex: (cluster_monitoring_operator_collection_profile)
    sourceLabels:
    - __name__
  port: https
  scheme: https
  tlsConfig:
    insecureSkipVerify: false
    serverName: cluster-monitoring-operator.openshift-monitoring.svc
scrapeClass: tls-client-certificate-auth
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@assets/cluster-monitoring-operator/telemetry-service-monitor.yaml` around
lines 12 - 24, The endpoint block under endpoints (the map containing
bearerTokenFile, metricRelabelings, port: https, scheme: https and tlsConfig) is
missing an explicit interval; add an interval field (e.g., interval: 30s or
another project-consistent value) to this endpoint to ensure it doesn't rely on
Prometheus defaults and matches the other ServiceMonitor entries, keeping the
scrapeClass tls-client-certificate-auth and existing metricRelabelings intact.

Member Author (@rexagod)

It'd make sense to do this in a separate PR, as this is pulled from CMO's default SM, which is not affected by this patch.

@coderabbitai coderabbitai bot

@rexagod, that makes sense — since the telemetry ServiceMonitor inherits from the default CMO ServiceMonitor which doesn't set an explicit interval, aligning them in a follow-up PR that touches the root definition would be the cleaner approach. Noted!

Comment on lines +60 to +110
- bearerTokenFile: ""
  honorLabels: true
  interval: 30s
  metricRelabelings:
  - action: keep
    regex: (apiserver_current_inflight_requests|apiserver_request_total|apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|kubelet_volume_stats_used_bytes|pv_collector_total_pv_count|selinux_warning_controller_selinux_volume_conflict|volume_manager_selinux_pod_context_mismatch_errors_total|volume_manager_selinux_pod_context_mismatch_warnings_total|volume_manager_selinux_volume_context_mismatch_errors_total|volume_manager_selinux_volume_context_mismatch_warnings_total|volume_manager_selinux_volumes_admitted_total)
    sourceLabels:
    - __name__
  path: /metrics/probes
  port: https-metrics
  relabelings:
  - action: replace
    sourceLabels:
    - __metrics_path__
    targetLabel: metrics_path
  scheme: https
  scrapeTimeout: 30s
  tlsConfig:
    caFile: /etc/prometheus/configmaps/kubelet-serving-ca-bundle/ca-bundle.crt
    insecureSkipVerify: false
- bearerTokenFile: ""
  interval: 30s
  metricRelabelings:
  - action: keep
    regex: (apiserver_current_inflight_requests|apiserver_request_total|apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|kubelet_volume_stats_used_bytes|pv_collector_total_pv_count|selinux_warning_controller_selinux_volume_conflict|volume_manager_selinux_pod_context_mismatch_errors_total|volume_manager_selinux_pod_context_mismatch_warnings_total|volume_manager_selinux_volume_context_mismatch_errors_total|volume_manager_selinux_volume_context_mismatch_warnings_total|volume_manager_selinux_volumes_admitted_total)
    sourceLabels:
    - __name__
  port: https-metrics
  relabelings:
  - action: keep
    regex: (linux|)
    sourceLabels:
    - __meta_kubernetes_node_label_kubernetes_io_os
  - action: replace
    regex: (.+)(?::\d+)
    replacement: $1:9637
    sourceLabels:
    - __address__
    targetLabel: __address__
  - action: replace
    replacement: crio
    sourceLabels:
    - endpoint
    targetLabel: endpoint
  - action: replace
    replacement: crio
    targetLabel: job
  scheme: https
  tlsConfig:
    caFile: /etc/prometheus/configmaps/kubelet-serving-ca-bundle/ca-bundle.crt
    insecureSkipVerify: false
@coderabbitai coderabbitai bot Mar 10, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Expected result: both checks below print "False".
python - <<'PY'
from pathlib import Path
import re

text = Path("assets/control-plane/telemetry-service-monitor-kubelet.yaml").read_text()
allow = set()

for pattern in re.findall(r'regex: \(([^)]*)\)', text):
    allow.update(pattern.split('|'))

print("Contains prober/probe metrics:", any(m.startswith(("prober_", "probe_")) for m in allow))
print("Contains crio metrics:", any(m.startswith("crio_") for m in allow))
PY

echo
echo "Kubernetes metrics reference:"
curl -fsSL https://kubernetes.io/docs/reference/instrumentation/metrics/ \
  | grep -nE 'prober_probe_total|prober_probe_duration_seconds' | head

echo
echo "CRI-O metrics guide:"
curl -fsSL https://raw.githubusercontent.com/cri-o/cri-o/main/tutorials/metrics.md \
  | grep -nE '^`crio_' | head

Repository: openshift/cluster-monitoring-operator

Length of output: 50394


Remove the probe and CRI-O telemetry scrapes or add their metric families to the allowlist.

Lines 60-110 define /metrics/probes and a crio endpoint that reuse the kubelet/cAdvisor keep regex. However, that regex contains no prober_* or crio_* entries. Kubernetes documents kubelet probe metrics under prober_probe_total, and CRI-O documents runtime metrics under the crio_* prefix. These two endpoints will retain zero metrics as written. Either remove them from the telemetry profile or add the intended probe and CRI-O metric families to the regex.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@assets/control-plane/telemetry-service-monitor-kubelet.yaml` around lines 60
- 110, The /metrics/probes scrape (path: /metrics/probes) and the CRI-O scrape
(where replacement: crio and targetLabel: job) are using the kubelet/cAdvisor
keep regex in metricRelabelings which does not include prober_* or crio_*
families, so they will yield zero metrics; either remove those two scrape blocks
entirely or update the metricRelabelings regex (the regex under
metricRelabelings / sourceLabels: __name__) to include prober_probe_total and
the crio_* metric family names (e.g., add prober_probe_total|crio_.*) so the
intended probe and CRI-O metrics are allowed through.

Member Author (@rexagod)

I think this should be addressed in a separate PR.

@coderabbitai coderabbitai bot

@rexagod, understood! Would you like me to open a follow-up GitHub issue to track this — ensuring the /metrics/probes and CRI-O scrape endpoints in assets/control-plane/telemetry-service-monitor-kubelet.yaml have their metricRelabelings regex updated to include the relevant prober_* and crio_* metric families?
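
For reference, the regex-extension option the bot describes would amount to adding the probe and CRI-O metric families to the keep allowlist on both affected endpoints, along these lines (a sketch only; whether to admit all crio_* families or a narrower set is a maintainer decision):

```yaml
metricRelabelings:
- action: keep
  # prober_probe_total and crio_.* added; the existing kubelet entries are elided here
  regex: (prober_probe_total|crio_.*|apiserver_current_inflight_requests|...)
  sourceLabels:
  - __name__
```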

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

  • New Features

  • Added telemetry collection profile and telemetry-specific monitors across monitoring components (alerting, kube-state-metrics, node-exporter, prometheus, telemeter-client, control-plane, etc.).

  • Introduced minimal collection profile variants for many components to reduce metric volume and scope.

  • Tests

  • Updated collection-profile tests to include the new telemetry profile in selection logic.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/tasks/openshiftstatemetrics.go`:
- Around line 108-116: Update the two error messages to use the plural
"ServiceMonitors" to match the updated methods and other tasks: change the error
returned after calling t.factory.OpenShiftStateMetricsServiceMonitors() and the
error returned after t.client.CreateOrUpdateServiceMonitors(ctx, sms) so both
messages read "... openshift-state-metrics ServiceMonitors failed: %w"
(referencing the functions OpenShiftStateMetricsServiceMonitors and
CreateOrUpdateServiceMonitors to locate the strings).

In `@pkg/tasks/prometheus.go`:
- Around line 406-414: The error messages in the Prometheus ServiceMonitor block
use the singular "ServiceMonitor" but the code deals with multiple resources;
update the fmt.Errorf messages around
t.factory.PrometheusK8sPrometheusServiceMonitors() and
t.client.CreateOrUpdateServiceMonitors(...) to say "ServiceMonitors" (plural) so
they match the resource collection and other files in the PR; locate these in
the function calling t.factory.PrometheusK8sPrometheusServiceMonitors and change
the two returned error strings accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 63e4bc26-42fd-4f09-a856-094ef7d96ebd

📥 Commits

Reviewing files that changed from the base of the PR and between ffa9b79 and a66c22b.

📒 Files selected for processing (11)
  • assets/prometheus-k8s/telemetry-service-monitor.yaml
  • jsonnet/components/prometheus.libsonnet
  • pkg/client/client.go
  • pkg/tasks/alertmanager.go
  • pkg/tasks/clustermonitoringoperator.go
  • pkg/tasks/controlplane.go
  • pkg/tasks/kubestatemetrics.go
  • pkg/tasks/nodeexporter.go
  • pkg/tasks/openshiftstatemetrics.go
  • pkg/tasks/prometheus.go
  • pkg/tasks/telemeter.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/tasks/clustermonitoringoperator.go
  • pkg/client/client.go
  • pkg/tasks/alertmanager.go
  • pkg/tasks/telemeter.go
  • jsonnet/components/prometheus.libsonnet
  • assets/prometheus-k8s/telemetry-service-monitor.yaml

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

  • New Features

  • Added a new "telemetry" collection profile and telemetry-specific monitors across monitoring components (Alertmanager, Prometheus, kube-state-metrics, node-exporter, telemeter-client, control-plane, openshift-state-metrics).

  • Introduced minimal collection-profile variants for many components to reduce metric volume.

  • Tests

  • Updated collection-profile tests to include the new telemetry profile in selection logic.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/tasks/alertmanager.go (1)

226-234: Inconsistent error messages: use plural "ServiceMonitors" to match the factory/client calls.

Lines 228 and 233 still reference singular "ServiceMonitor" while the factory and client methods now operate on a slice. The destroy path (lines 388, 393) correctly uses the plural form.

✏️ Suggested fix for consistency
 	smams, err := t.factory.AlertmanagerServiceMonitors()
 	if err != nil {
-		return fmt.Errorf("initializing Alertmanager ServiceMonitor failed: %w", err)
+		return fmt.Errorf("initializing Alertmanager ServiceMonitors failed: %w", err)
 	}

 	err = t.client.CreateOrUpdateServiceMonitors(ctx, smams)
 	if err != nil {
-		return fmt.Errorf("reconciling Alertmanager ServiceMonitor failed: %w", err)
+		return fmt.Errorf("reconciling Alertmanager ServiceMonitors failed: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tasks/alertmanager.go` around lines 226 - 234, The error messages use
singular "ServiceMonitor" but the factory and client methods operate on slices;
update the fmt.Errorf messages in the Alertmanager reconciliation path so they
use the plural "ServiceMonitors". Specifically, change the error text around
calls to t.factory.AlertmanagerServiceMonitors() and
t.client.CreateOrUpdateServiceMonitors(ctx, smams) to read "initializing
Alertmanager ServiceMonitors failed" and "reconciling Alertmanager
ServiceMonitors failed" respectively so they match the method names and the
destroy-path wording.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/tasks/alertmanager.go`:
- Around line 226-234: The error messages use singular "ServiceMonitor" but the
factory and client methods operate on slices; update the fmt.Errorf messages in
the Alertmanager reconciliation path so they use the plural "ServiceMonitors".
Specifically, change the error text around calls to
t.factory.AlertmanagerServiceMonitors() and
t.client.CreateOrUpdateServiceMonitors(ctx, smams) to read "initializing
Alertmanager ServiceMonitors failed" and "reconciling Alertmanager
ServiceMonitors failed" respectively so they match the method names and the
destroy-path wording.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3aa12788-4f55-4f4c-86b0-904bf906b829

📥 Commits

Reviewing files that changed from the base of the PR and between a66c22b and b826e66.

📒 Files selected for processing (3)
  • pkg/client/client.go
  • pkg/tasks/alertmanager.go
  • pkg/tasks/telemeter.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/client/client.go

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

  • New Features

  • Added a new "telemetry" collection profile and telemetry-specific monitors across core monitoring components (Alertmanager, Prometheus, kube-state-metrics, node-exporter, telemeter-client, control-plane, openshift-state-metrics).

  • Introduced minimal collection-profile variants for many components to reduce metric volume and provide TLS-authenticated telemetry endpoints.

  • Tests

  • Updated collection-profile tests to include the new telemetry profile in selection logic.


@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@rexagod: This pull request references MON-4517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target either version "4.22." or "openshift-4.22.", but it targets "openshift-5.0" instead.

Details

In response to this:

Bases minimal monitors on the same specs as the "full" ones.


This is rebased over #2821

Summary by CodeRabbit

  • New Features

  • Added a new "telemetry" collection profile and telemetry-specific ServiceMonitors across core monitoring components (Alertmanager, Prometheus, kube-state-metrics, node-exporter, telemeter-client, control-plane, openshift-state-metrics).

  • Introduced "minimal" collection-profile ServiceMonitors for many components to reduce metric volume and provide TLS-authenticated scrape endpoints.

  • Tests

  • Updated collection-profile tests to include the new telemetry profile in selection logic.


@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2026

@rexagod: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws-ovn-techpreview
Commit: 49786bf
Required: true
Rerun command: /test e2e-aws-ovn-techpreview

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


// Returns a copy of the input service monitor with the provided list of metrics.
// The metrics parameter is an array of metric names (e.g., ["metric1", "metric2", "metric3"]).
// This function removes existing "drop" metricRelabelings before adding the keep filter.
Contributor

(nit) can we make the motivation for removing the drop action(s) explicit? I believe it's for optimization purposes (no need to drop if we know that we have a keep-only strategy afterwards).
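
The behavior that comment describes can be sketched in isolation. RelabelConfig below is a pared-down stand-in for the Prometheus Operator's monv1.RelabelConfig, and withKeptMetrics is a hypothetical name, not the actual factory method:

```go
package main

import (
	"fmt"
	"strings"
)

// RelabelConfig mirrors, in simplified form, the fields of
// monv1.RelabelConfig that this sketch needs.
type RelabelConfig struct {
	Action       string
	Regex        string
	SourceLabels []string
}

// withKeptMetrics returns a copy of the relabelings with any existing
// "drop" rules removed and a single "keep" rule on __name__ appended.
// Removing the drops first is the optimization discussed above: once a
// keep-only allowlist runs, earlier drop rules can never change the result.
func withKeptMetrics(relabelings []RelabelConfig, metrics []string) []RelabelConfig {
	out := make([]RelabelConfig, 0, len(relabelings)+1)
	for _, r := range relabelings {
		if r.Action == "drop" {
			continue // redundant once a keep-only filter is appended
		}
		out = append(out, r)
	}
	out = append(out, RelabelConfig{
		Action:       "keep",
		Regex:        "(" + strings.Join(metrics, "|") + ")",
		SourceLabels: []string{"__name__"},
	})
	return out
}

func main() {
	in := []RelabelConfig{
		{Action: "drop", Regex: "go_.*", SourceLabels: []string{"__name__"}},
		{Action: "replace", Regex: "(.*)", SourceLabels: []string{"instance"}},
	}
	for _, r := range withKeptMetrics(in, []string{"metric1", "metric2", "metric3"}) {
		fmt.Println(r.Action, r.Regex)
	}
}
```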


func (f *Factory) AlertmanagerServiceMonitors() ([]*monv1.ServiceMonitor, error) {
return serviceMonitors(
f.AlertmanagerServiceMonitor,
Contributor

we should be able to get rid of the f.Alertmanager*ServiceMonitor() methods and pass file paths (e.g. AlertmanagerServiceMonitor, ...) directly to serviceMonitors().

func serviceMonitors(fullServiceMonitor, minimalServiceMonitor, telemetryServiceMonitor func() (*monv1.ServiceMonitor, error)) ([]*monv1.ServiceMonitor, error) {
var sms []*monv1.ServiceMonitor

if fullServiceMonitor != nil {
Contributor

I wouldn't check for nil: our contract is that when supporting collection profiles, a component must implement one ServiceMonitor for each.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


4 participants