MON-2692: Reword and update KEP #1791

rexagod · 2025-05-07T07:32:38Z

Re-opening the KEP PR to backfill on the required proposal context.

Signed-off-by: Pranshu Srivastava [email protected]

Continues: #1298

rexagod · 2025-05-08T08:52:56Z

/cc @JoaoBraveCoding

Requesting a review here. If things look good to you, I'll request the API folks to take a look. 🙂

JoaoBraveCoding

LGTM 👍 Thank you for resuming this work 🙌

openshift-bot · 2025-06-11T09:16:00Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

JoelSpeed · 2025-06-11T12:37:08Z

enhancements/monitoring/metrics-collection-profiles.md

+data:
+  config.yaml: |
+    prometheusK8s:
+      collectionProfile: full


Enum values should be PascalCase

I see. I'll send out a patch to support both camelCase and PascalCase values (instead of just the former), and deprecate the camelCase form in a later release. Does that sound good?

Also, for future references, is there a guide that outlines such practices that we follow across OpenShift?

This is a general k8s API convention.

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#constants

If we can get it updated to support both, but document only the PascalCase versions going forward that's the best approach, SGTM

And yep, an upstream K8s convention
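To make the agreed approach concrete, here is a minimal sketch of accepting both spellings while treating the PascalCase values as canonical. The helper name `ParseCollectionProfile` and the constant names are hypothetical and are not taken from the CMO code base; the sketch only illustrates the "support both, document PascalCase" idea from this thread.

```go
package main

import "fmt"

// CollectionProfile names a metrics collection profile.
type CollectionProfile string

const (
	// Canonical PascalCase values, following the Kubernetes API
	// conventions for constants.
	CollectionProfileFull    CollectionProfile = "Full"
	CollectionProfileMinimal CollectionProfile = "Minimal"
)

// ParseCollectionProfile is a hypothetical helper. It accepts the
// documented PascalCase values and, for backward compatibility, the
// older lowercase spellings. Any other spelling is rejected.
func ParseCollectionProfile(s string) (CollectionProfile, error) {
	switch s {
	case "Full", "full":
		return CollectionProfileFull, nil
	case "Minimal", "minimal":
		return CollectionProfileMinimal, nil
	}
	return "", fmt.Errorf("unsupported collection profile %q", s)
}

func main() {
	for _, in := range []string{"full", "Minimal", "FULL"} {
		p, err := ParseCollectionProfile(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s is treated as %s\n", in, p)
	}
}
```

Matching exact spellings rather than lowercasing the input keeps the accepted set small, which should make a later deprecation of the lowercase forms easier to reason about.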

JoelSpeed · 2025-06-11T12:38:48Z

enhancements/monitoring/metrics-collection-profiles.md

+	// metrics that are exposed by the platform components. In the `minimal`
+	// profile, Prometheus only collects metrics necessary for the default
+	// platform alerts, recording rules, telemetry and console dashboards.
+	CollectionProfile CollectionProfile `json:"collectionProfile,omitempty"`


Is this field required or optional?

What happens when it is not set?

How does the upgrade work for existing clusters, is there any action needed?

> Is this field required or optional?
> What happens when it is not set?

The field is optional. When unset, the operator behaves the same way as it did before this change (which is also exactly the same as the full collection profile in all respects).

> How does the upgrade work for existing clusters, is there any action needed?

Upon upgrading, setting this field to full does not change the operator's behavior compared to before.

However, setting it to minimal prompts the operator to look, beyond kubelet, etcd, kube-state-metrics, and node-exporter, for any other service or pod monitors marked "minimal" and to apply only those targets.

Besides setting the field itself, there's no action needed. OOTB the field will be unset, which has the same implications as the full profile, i.e., the same behavior as earlier (all targets are discovered).

Explaining in the godoc what the behaviours are when it's unset would be helpful for both our generated docs on the main docs site, and also for those using oc explain to understand their APIs

JoelSpeed · 2025-06-11T12:46:12Z

enhancements/monitoring/metrics-collection-profiles.md

+
+OpenShift teams can decide if they want to adopt this feature. Without any
+change to a monitor, if a user picks a profile in the CMO config, things
+will work as they did before. When an OpenShift team wants to implement


If you decide later to add an additional profile, how would that impact existing teams? What would you have to do before you could introduce the new profile?

Each profile needs accompanying service or pod monitors in order to narrow the cluster's scrape targets to the scope that profile intends.

If a third profile is planned, the monitoring team will need to make sure, before introducing it, that an adequate set of monitors will be deployed to provide as complete a base experience as that profile is expected to offer. These monitors will need to be created by the component owners.

Once it goes live, all teams that initially created monitors for that profile (and others that do so later on) will be able to support it for their workloads, should the cluster admin choose to use that profile.

As such, teams that do not have monitors for the newer profile will have their metric targets excluded until they deploy a corresponding monitor.

This is useful context and something you may want to capture in a "What is required when we expand the profiles in the future" heading or something similar

It may also be worth you creating something in origin that checks that every monitor that defines a profile, has a mirror in the payload for every profile that you support. That way, when you do add another, you can update the test, add exceptions, and then work from that list to remove all of the exceptions that are missing the new profile

I see, a historical profile-monitor mapping will certainly prove useful. I'm not sure how this data will be preserved/persisted as a source of truth between runs, though (is there a similar case I can look at?).

Also, could you please elaborate a bit on what you meant by "a mirror" in this context?

By mirror, I meant one of the metrics collection resources for each of the profiles.

And I don't think you need to have data stored across runs. You just need a test that is aware of all profiles and checks that if a component defines one profile, it defines all of them. It will only fail when you deliberately update it to include a new profile, and at that point it gives you a list of all the components that defined the previous profiles, because they (most likely) won't be defining the new one yet.

Thank you for the pointer, this makes a lot of sense.

I think at this point we were the only ones in OpenShift shipping these profile-specific monitors out, so the idea of making sure they exist between payloads was missed by me, but this will definitely be helpful, will do!

This might be a use-case for a utility I made back in the day, for various operations linked to these profiles (read, a MetricsCollectionProfilesCTL-esque CLI).
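The check discussed in this thread could be sketched roughly as follows. This assumes a simplified `Monitor` type with a component name and a profile marker; the type, the `MissingProfiles` helper, the exceptions map, and the third profile name `Extended` are all hypothetical and only illustrate the idea of "if a component defines one profile, it defines all profiles".

```go
package main

import (
	"fmt"
	"sort"
)

// Monitor is a simplified stand-in for a ServiceMonitor or PodMonitor.
type Monitor struct {
	Component string // owning component (hypothetical grouping key)
	Profile   string // profile the monitor is marked with; "" if none
}

// MissingProfiles returns, per component, the supported profiles for
// which the component ships no monitor. Components that mark none of
// their monitors with a profile are ignored, and components listed in
// exceptions are skipped.
func MissingProfiles(supported []string, monitors []Monitor, exceptions map[string]bool) map[string][]string {
	defined := map[string]map[string]bool{}
	for _, m := range monitors {
		if m.Profile == "" {
			continue
		}
		if defined[m.Component] == nil {
			defined[m.Component] = map[string]bool{}
		}
		defined[m.Component][m.Profile] = true
	}
	missing := map[string][]string{}
	for component, profiles := range defined {
		if exceptions[component] {
			continue
		}
		for _, p := range supported {
			if !profiles[p] {
				missing[component] = append(missing[component], p)
			}
		}
	}
	return missing
}

func main() {
	// "Extended" stands in for a hypothetical future profile.
	supported := []string{"Full", "Minimal", "Extended"}
	monitors := []Monitor{
		{Component: "etcd", Profile: "Full"},
		{Component: "etcd", Profile: "Minimal"},
		{Component: "kubelet", Profile: "Full"},
		{Component: "kubelet", Profile: "Minimal"},
		{Component: "kubelet", Profile: "Extended"},
	}
	missing := MissingProfiles(supported, monitors, nil)
	components := make([]string, 0, len(missing))
	for c := range missing {
		components = append(components, c)
	}
	sort.Strings(components)
	for _, c := range components {
		fmt.Printf("%s is missing monitors for %v\n", c, missing[c])
	}
}
```

Adding a profile to `supported` is the deliberate change that makes the check fail, and the returned map then serves as the work list of components to update or to record as exceptions.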

simonpasquier · 2025-06-17T07:43:11Z

enhancements/monitoring/metrics-collection-profiles.md

+data:
+  config.yaml: |
+    prometheusK8s:
+      collectionProfile: full


This is a general k8s API convention.

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#constants

openshift-bot · 2025-06-25T00:45:10Z

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2025-07-02T08:15:31Z

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2025-07-02T08:15:43Z

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rexagod · 2025-07-02T13:12:45Z

/reopen
/remove-lifecycle rotten

openshift-ci · 2025-07-02T13:13:12Z

@rexagod: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rexagod · 2025-07-02T13:16:04Z

Also opened openshift/cluster-monitoring-operator#2613 for CMO-side changes.

rexagod · 2025-07-14T07:31:13Z

Pinging @JoelSpeed for another look here.

JoelSpeed · 2025-07-23T11:23:52Z

enhancements/monitoring/metrics-collection-profiles.md

+- `full` (same as today)
+- `minimal` (only collect metrics necessary for recording rules, alerts,
+  dashboards, HPA, VPA and telemetry)


Given we've discussed changing these to Full and Minimal to match K8s conventions, can we fix the EP to represent the PascalCase versions?

JoelSpeed · 2025-07-23T11:29:11Z

enhancements/monitoring/metrics-collection-profiles.md

+
+### Open Questions
+
+## Test Plan


So perhaps we add here

An E2E test that ensures that for every monitor that is labelled with the `Full` collection profile, there also exists one for `Minimal`, and vice versa

Though I'm not sure how you'd actually work out that pairing?

I believe rexagod/cpv can help us with that. :)

Could this be integrated into the testing plan?

JoelSpeed

I have no further feedback. I think we will probably want some adjustments to the API when we come to making it first party rather than configmap based, but that can be discussed at that point.

Re-opening the KEP PR to backfill on the required proposal context. This commit squashes over 25 previous ones from its predecessor. Signed-off-by: Pranshu Srivastava <[email protected]>

rexagod · 2025-08-12T08:40:12Z

Squashed.

rexagod · 2025-08-13T11:30:10Z

/cc @simonpasquier

rexagod · 2025-08-20T08:51:04Z

Pinging @simonpasquier for an LGTM here (if all looks good) 🙂

rexagod · 2025-08-31T15:10:02Z

Re-ping @simonpasquier for a look here 🙇🏼

simonpasquier

/lgtm

simonpasquier · 2025-09-01T15:47:42Z

/approve
/hold

Giving @jan--f the opportunity to review it once more.

openshift-ci · 2025-09-01T15:47:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoaoBraveCoding, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~enhancements/monitoring/OWNERS~~ [simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2025-09-02T13:12:24Z

@rexagod: This pull request references MON-2692 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Re-opening the KEP PR to backfill on the required proposal context.

Signed-off-by: Pranshu Srivastava [email protected]

Continues: #1298

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

rexagod · 2025-09-02T13:13:58Z

/jira refresh

openshift-ci-robot · 2025-09-02T13:14:01Z

@rexagod: This pull request references MON-2692 which is a valid jira issue.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

rexagod · 2025-09-10T13:21:30Z

(bump)

simonpasquier · 2025-09-10T15:21:47Z

/hold cancel

openshift-ci · 2025-09-10T15:38:41Z

@rexagod: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot requested review from jan--f and simonpasquier May 7, 2025 07:33

rexagod force-pushed the metrics-collection-profiles branch from 559e546 to 34d451e Compare May 8, 2025 08:06

openshift-ci bot requested a review from JoaoBraveCoding May 8, 2025 08:52

rexagod force-pushed the metrics-collection-profiles branch 3 times, most recently from 4670400 to 577a948 Compare May 12, 2025 08:34

JoaoBraveCoding approved these changes May 14, 2025

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2025

JoelSpeed reviewed Jun 11, 2025

simonpasquier reviewed Jun 17, 2025

rexagod force-pushed the metrics-collection-profiles branch from 9d6a0a2 to 0bf8482 Compare June 17, 2025 10:55

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2025

openshift-ci bot closed this Jul 2, 2025

openshift-ci bot reopened this Jul 2, 2025

openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 2, 2025

rexagod force-pushed the metrics-collection-profiles branch from 7e4f324 to eb5b85b Compare July 2, 2025 13:15

rexagod requested a review from JoelSpeed July 9, 2025 07:45

JoelSpeed reviewed Jul 23, 2025

rexagod force-pushed the metrics-collection-profiles branch from 57608ea to 5901987 Compare July 29, 2025 23:11

rexagod requested a review from JoelSpeed July 29, 2025 23:11

JoelSpeed reviewed Aug 6, 2025

rexagod force-pushed the metrics-collection-profiles branch from 5901987 to c13077f Compare August 12, 2025 08:16

MetricsCollectionProfiles: Reword and update KEP

8257f54

Re-opening the KEP PR to backfill on the required proposal context. This commit squashes over 25 previous ones from its predecessor. Signed-off-by: Pranshu Srivastava <[email protected]>

rexagod force-pushed the metrics-collection-profiles branch from c13077f to 8257f54 Compare August 12, 2025 08:39

rexagod requested a review from JoelSpeed August 12, 2025 08:40

openshift-ci bot requested a review from simonpasquier August 13, 2025 11:30

simonpasquier reviewed Sep 1, 2025

openshift-ci bot assigned simonpasquier Sep 1, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 1, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 1, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2025

rexagod changed the title ~~MetricsCollectionProfiles: Reword and update KEP~~ MON-2692: Reword and update KEP Sep 2, 2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 2, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2025

openshift-merge-bot bot merged commit 59eed35 into openshift:master Sep 10, 2025
2 checks passed

rexagod mentioned this pull request Sep 10, 2025

MON-4367: Address Collection Profiles EP reviews openshift/cluster-monitoring-operator#2613

Draft

7 tasks
