-
Notifications
You must be signed in to change notification settings - Fork 521
MON-2692: Reword and update KEP #1791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MON-2692: Reword and update KEP #1791
Conversation
559e546
to
34d451e
Compare
/cc @JoaoBraveCoding Requesting a review here. If things look good to you, I'll request the API folks to take a look. 🙂 |
4670400
to
577a948
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Thank you for resuming this work 🙌
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
data: | ||
config.yaml: | | ||
prometheusK8s: | ||
collectionProfile: full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum values should be PascalCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'll send out a patch to support camel and pascal cases (instead of just the former), and deprecate it in a later release. Does that sound good?
Also, for future references, is there a guide that outlines such practices that we follow across OpenShift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general k8s API convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can get it updated to support both, but document only the PascalCase versions going forward that's the best approach, SGTM
And yep, an upstream K8s convention
// metrics that are exposed by the platform components. In the `minimal` | ||
// profile, Prometheus only collects metrics necessary for the default | ||
// platform alerts, recording rules, telemetry and console dashboards. | ||
CollectionProfile CollectionProfile `json:"collectionProfile,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field required or optional?
What happens when it is not set?
How does the upgrade work for existing clusters, is there any action needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field required or optional?
What happens when it is not set?
The field's optional, when unset, the operator behaves the same way as it did before this change (which is also exactly same as the full
collection profile in all respects).
How does the upgrade work for existing clusters, is there any action needed?
Upon upgrading, setting this field to full
has no change in the behavior of the operator compared to as it was before.
However, setting this to minimal
will, in addition to kubelet
, etcd
, kube-state-metrics
, and node-exporter
, prompt the operator to look for any similar "minimal" marked service or pod monitors and apply those targets only.
Besides setting the field itself, there's no action needed. OOTB the field will be unset which has the same implications as the full
profile, i.e., the same behavior as earlier (all targets are discovered).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explaining in the godoc what the behaviours are when it's unset would be helpful for both our generated docs on the main docs site, and also for those using oc explain
to understand their APIs
|
||
OpenShift teams can decide if they want to adopt this feature. Without any | ||
change to a monitor, if a user picks a profile in the CMO config, things | ||
will work as they did before. When an OpenShift team wants to implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide later to add an additional profile, how would that impact existing teams? What would you have to do before you could introduce the new profile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Profiles need service or pod monitors to accompany them into transitioning the cluster's metrics targets' scope to the desired one.
If a third profile is planned, the monitoring team will need to make sure we have the adequate set of monitors that will be deployed to create as much of a complete base experience as possible, as expected from that profile, before introducing it. These monitors will need to be created by the component owners.
Once it goes live, all teams that initially created (and other that will do so later on) monitors for that profile will be able to support it for their workloads should the cluster admin choose to use that profile.
As such, teams that do not have monitors for the newer profile will have their metric targets excluded, until they deploy a corresponding monitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is useful context and something you may want to capture in a "What is required when we expand the profiles in the future" heading or something similar
It may also be worth you creating something in origin that checks that every monitor that defines a profile, has a mirror in the payload for every profile that you support. That way, when you do add another, you can update the test, add exceptions, and then work from that list to remove all of the exceptions that are missing the new profile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, a historical profile-monitor mapping will certainly prove useful. I'm not sure how this data will be preserved/persisted as a source of truth between runs, though (is there a similar case I can look at?).
Also, could you please elaborate a bit on what you meant by "a mirror" in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By mirror, I meant one of the metrics collection resources for each of the profiles.
And I don't think you need to have data stored across runs, I think you just need to have a test that is aware of all profiles, and, when you update the test, it would check that if the component defines one profile, it defines all profiles. It will only fail because you deliberately updated it to include the new profile, and when you do that, you'll get a list of all those that have defined the previous profiles, because they won't be defining the new profile (most likely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pointer, this makes a lot of sense.
I think at this point we were the only ones in OpenShift shipping these profile-specific monitors out, so the idea of making sure they exist between payloads was missed by me, but this will definitely be helpful, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a use-case for a utility I made back in the day, for various operations linked to these profiles (read, a MetricsCollectionProfilesCTL-esque CLI).
data: | ||
config.yaml: | | ||
prometheusK8s: | ||
collectionProfile: full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general k8s API convention.
9d6a0a2
to
0bf8482
Compare
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@rexagod: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
7e4f324
to
eb5b85b
Compare
Also opened openshift/cluster-monitoring-operator#2613 for CMO-side changes. |
Pinging @JoelSpeed for another look here. |
- `full` (same as today) | ||
- `minimal` (only collect metrics necessary for recording rules, alerts, | ||
dashboards, HPA, VPA and telemetry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we've discussed changing these to Full
and Minimal
to match K8s conventions, can we fix the EP to represent the PascalCase versions?
|
||
### Open Questions | ||
|
||
## Test Plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So perhaps we add here
E2E test that ensures that for every monitor that is labelled as `Full` collection profile, there also exists one for `Minimal, and vice versa
Though I'm not sure how you'd actually work out that pairing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe rexagod/cpv
can help us with that. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be integrated into the testing plan?
57608ea
to
5901987
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no further feedback, I think we will probably want some adjustments to the APi when we come to making it first party rather than configmap based, but that can be discussed at that point
5901987
to
c13077f
Compare
Re-opening the KEP PR to backfill on the required proposal context. This commit squashes over 25 previous ones from its predecessor. Signed-off-by: Pranshu Srivastava <[email protected]>
c13077f
to
8257f54
Compare
Squashed. |
/cc @simonpasquier |
Pinging @simonpasquier for an LGTM here (if all looks good) 🙂 |
Re-ping @simonpasquier for a look here 🙇🏼 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve Letting @jan--f the opportunity to review it once more. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoaoBraveCoding, simonpasquier The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
MetricsCollectionProfiles
: Reword and update KEP
@rexagod: This pull request references MON-2692 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@rexagod: This pull request references MON-2692 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
(bump) |
/hold cancel |
@rexagod: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Re-opening the KEP PR to backfill on the required proposal context.
Signed-off-by: Pranshu Srivastava [email protected]
Continues: #1298