
Conversation


@sradco sradco commented Jul 24, 2025

This PR includes the enhancement proposal for a new Alerts Management UI.

@openshift-ci openshift-ci bot requested review from jan--f and simonpasquier July 24, 2025 15:55

openshift-ci bot commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jan--f for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 2 times, most recently from c7c9d56 to 019bb16 on July 24, 2025 17:55

sradco commented Jul 24, 2025

@machadovilaca @avlitman @nunnatsa please review this enhancement proposal.

@sradco sradco changed the title from "Enhancment proposal for alerts managment" to "Enhancement proposal for alerts management" on Jul 27, 2025
@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from 019bb16 to 1f7975e on July 27, 2025 07:39
@simonpasquier simonpasquier left a comment

Reading the proposal, it's not clear to me:

  • whether we need a new supported API or only UI improvements are required.
  • what are the gaps in the existing implementation that we want to fix (e.g. the AlertingRule & AlertRelabelConfig CRDs already exist, but what makes them hard to use?).


## Goals
1. CRUD operations on user‑defined alerts via UI and API.
2. Clone platform or user alerts to customize thresholds or scopes.
Contributor

Do we need to support customization for user-defined alerting rules? Right now, alert customization via AlertingRule is only meant for platform alerting rules.

Author

We need to support user-defined alerting rules and create the PrometheusRules, the same as is already possible today in the CLI.
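For reference, a minimal sketch of the kind of PrometheusRule manifest that is applied with the CLI today to create a user-defined alerting rule; the names, namespace and expression are illustrative, not part of the proposal:

```yaml
# Hypothetical user-defined alerting rule; all names and the expression
# are placeholders for illustration only.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-alerts
  namespace: my-app
spec:
  groups:
  - name: example-app.rules
    rules:
    - alert: ExampleAppHighErrorRate
      expr: |
        sum(rate(http_requests_total{job="example-app", code=~"5.."}[5m]))
          / sum(rate(http_requests_total{job="example-app"}[5m])) > 0.05
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Example app error rate is above 5% for more than 10 minutes.
```

Applying such a manifest with `oc apply -f` in a project where user workload monitoring is enabled is the existing flow that the proposed API/UI would wrap.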

Author

But by support I mean the creation and lifecycle management, not catching faulty expressions/summaries/descriptions or any other user-defined detail.

Contributor

We need to support user-defined alerting rules and create the PrometheusRules, the same as is already possible today in the CLI.

you mean applying manifests with oc?

Author

We would wrap the existing API (oc) that allows creating the alert rules, yes.
We want to keep the APIs under the same prefix as the new planned APIs, for a clear definition for the UI.

Contributor

I'm not following your last comment, sorry. Which APIs are we talking about? For me CRDs are also APIs.

Author

I mean we would wrap only the API for creating the new user-defined alerts.
Other APIs require more complicated logic, since they will handle both user alerts and alert rules and platform alerts.

1. CRUD operations on user‑defined alerts via UI and API.
2. Clone platform or user alerts to customize thresholds or scopes.
3. Disable/enable alerts by creating/updating entries in the `AlertRelabelConfig` CR.
4. Create/manage silences in Alertmanager and reflect this in the UI.
Contributor

What are the gaps in the current product? It already supports silence management in the console. The same remark applies to the following bullet points.

@sradco sradco Jul 28, 2025

This effort will integrate silences as part of the overall feature and will use the existing silences APIs; the UI may require some updates for a unified view and management.

Contributor

See my comment above, I still don't understand what's missing in the console wrt silences.

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from 1f7975e to e67b7c0 on July 28, 2025 15:11

sradco commented Jul 28, 2025

Reading the proposal, it's not clear to me:

whether we need a new supported API or only UI improvements are required.
what are the gaps in the existing implementation that we want to fix (e.g. the AlertingRule & AlertRelabelConfig CRDs already exist, but what makes them hard to use?).

@simonpasquier We need REST APIs to support adding a UI for managing alerts.
There are no gaps for doing this with the CLI; the gap is the missing REST APIs to do this through the UI.

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 2 times, most recently from 6ef5ca0 to 31dbb17 on July 29, 2025 08:23
@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from 31dbb17 to e2704cf on July 29, 2025 12:16
@simonpasquier

@simonpasquier We need REST APIs to support adding a UI for managing alerts.
There are no gaps for doing this with the CLI; the gap is the missing REST APIs to do this through the UI.

I feel that we're taking the problem backwards and starting from the implementation rather than the user experience. As an OpenShift user/cluster admin, do I need a REST API? Or is the main problem that the UI currently displays information which is inconsistent?

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 4 times, most recently from 08d669d to bb3093a on July 30, 2025 11:55
@sradco sradco changed the title from "Enhancement proposal for alerts management" to "Enhancement proposal for add API to support alerts management UI" on Jul 30, 2025
@sradco sradco changed the title from "Enhancement proposal for add API to support alerts management UI" to "Enhancement proposal for new APIs to support alerts management UI" on Jul 30, 2025
@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 4 times, most recently from ae4f850 to 9c285f2 on July 31, 2025 09:17
so that I can quickly mute a noisy alert or restore it once the underlying issue is resolved.

3. **Clone and customize platform alerts**
As a cluster admin, I want to clone an existing built‑in alert, adjust its threshold and severity, and save it as a user‑defined rule,
Contributor

and save it as a user‑defined rule

IMHO this part isn't needed. The use case here is that you want to "replace" the definition of a built-in alerting rule because it doesn't fit your environment, and the new rule behaves like any other platform alerting rule.

I would also recommend concrete examples (e.g. "by default the NodeNetworkReceiveErrs alert fires when there's more than 1% of received packets in error but I want the threshold to be 2%" and "I want the SamplesDegraded alerting rule's severity to be info rather than warning").
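To make the first example concrete, a sketch of how such an override could be expressed with the existing `AlertingRule` CRD; the expression is only an approximation of the node-exporter mixin rule and the manifest is illustrative, not part of the proposal:

```yaml
# Hypothetical AlertingRule recreating NodeNetworkReceiveErrs with a 2% threshold.
# The original platform rule would typically be excluded via an AlertRelabelConfig
# so that only this definition fires (assumption, not prescribed here).
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: node-network-receive-errs-2pct
  namespace: openshift-monitoring
spec:
  groups:
  - name: custom-node-network
    rules:
    - alert: NodeNetworkReceiveErrs
      expr: |
        rate(node_network_receive_errs_total[2m])
          / rate(node_network_receive_packets_total[2m]) > 0.02
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: Network interface is reporting many receive errors (above 2%).
```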

@sradco sradco Aug 7, 2025

Not necessarily. Sometimes users will clone and add another similar alert and keep the existing alert.
For example, if you want to fire one alert as warning at 70% and another as critical at 90%.
Or just use the source alert's expression and details as a reference for your new alert when you want to inspect the expression.
We will not disable the source alert as part of the clone process; users are expected to do that.

Contributor

Agreed but I would separate the use cases for clarity.

  1. Replace an existing platform alerting rule by a custom definition.
  2. Create a new platform alerting rule from an existing platform alerting rule.
  3. Create a new user-defined alerting rule from a platform or user-defined alerting rule.

Author

@simonpasquier How can a user create a new "Platform alerting rule"? (line 2)

Contributor

(The comment here was a documentation link; see the reply below.)

@sradco sradco Aug 20, 2025

  1. Create a new platform alerting rule from an existing platform alerting rule.

The link you provided is about creating alerting rules out of platform metrics, not alerts.

And below it, regarding updating platform alerts, it says:

  • "As a cluster administrator, you can modify core platform alerts before Alertmanager routes them to a receiver. For example, you can change the severity label of an alert, add a custom label, or exclude an alert from being sent to Alertmanager."
  • "You must create the AlertRelabelConfig object in the openshift-monitoring namespace. Otherwise, the alert label will not change."
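Following that documented pattern, a minimal sketch of an `AlertRelabelConfig` that changes the SamplesDegraded severity from warning to info (the object name is illustrative):

```yaml
# Hypothetical relabel config; assumes SamplesDegraded carries
# severity=warning by default.
apiVersion: monitoring.openshift.io/v1
kind: AlertRelabelConfig
metadata:
  name: samples-degraded-severity
  namespace: openshift-monitoring  # must be created in this namespace
spec:
  configs:
  - sourceLabels: [alertname, severity]
    regex: "SamplesDegraded;warning"
    targetLabel: severity
    replacement: info
    action: Replace
```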

so that I can tailor default monitoring to suit my team’s SLAs without modifying upstream operators.

4. **Create alert from a metric query**
As a developer, I want to write or paste a PromQL expression in a “Create Alert” form, specify duration and severity, and save it,
Contributor

What if you could create an alerting rule from the Metrics > Observe page?

Author

Not in scope of this effort, but possible later on I assume.

Contributor

I think that it would simplify the workflow a lot: you use existing data to tune the expression and then you turn it into a rule.

Author

@simonpasquier We can simplify the process a bit by not allowing any updates to platform alerts and always assuming that an update means creating a new alert and disabling the platform one.
This could be a valid alternative that we can consider.

So we would only use the AlertRelabelConfig to disable/enable alerts and to set Group and Component labels. WDYT?
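For the disable/enable part, a sketch of how disabling could look with the same CRD, assuming the `Drop` action excludes matching alerts before they reach Alertmanager (the alert name is a placeholder):

```yaml
# Hypothetical relabel config that "disables" a platform alert by dropping it.
apiVersion: monitoring.openshift.io/v1
kind: AlertRelabelConfig
metadata:
  name: disable-example-alert
  namespace: openshift-monitoring
spec:
  configs:
  - sourceLabels: [alertname]
    regex: "ExampleNoisyAlert"
    action: Drop
```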



@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from 9c285f2 to 5ecd10a on August 18, 2025 14:11
- "@jan--f"
- "@jgbernalp"
api-approvers:
- TBD
Author

@simonpasquier who can be the api approvers?

Contributor

The api-approvers field is meant for custom resource definitions, which we're not going to add/modify IIUC.
If we implement a new REST API, I assume that the approvers depend on where it's going to live.

Author

We plan to add the new REST APIs to CMO

@simonpasquier simonpasquier Aug 27, 2025

CMO is a single replica deployment so not a good fit for an API server since it won't be highly available.

Contributor

Additionally that intention should perhaps be mentioned in this proposal.

@sradco sradco Aug 28, 2025

CMO is a single replica deployment so not a good fit for an API server since it won't be highly available.

@simonpasquier, @jan--f What do you propose?
When we discussed this with @jgbernalp I believe that we agreed this would be part of CMO.

Additionally that intention should perhaps be mentioned in this proposal.
@jan--f I updated the proposal. Please see if this is better.

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 2 times, most recently from f086b26 to e141f54 on August 20, 2025 11:34

sradco commented Aug 20, 2025

Hi @jan--f, @jgbernalp, I would appreciate your review of this proposal

@jan--f jan--f left a comment

I think improving how we deal with alerts in the UI is a great effort. We could do with a more unified and bulk friendly approach there. Some mockups for this would be helpful to illustrate what is being proposed (and where the current implementation falls short).

The proposed REST API doesn't make sense to me. IIUC we already have the k8s API that covers all the proposed functionality. The motivation section mentions "While it's possible to customize built-in alerting rules with the AlertingRule+AlertRelabelConfig CRDs, the process is cumbersome and error-prone. It requires creating YAML manifests manually and there's no practical way to verify the correctness of the configuration." However, this proposal doesn't demonstrate how to pass data to the new proposed API, nor does it outline how correctness should be verified a priori.

# New APIs to support Alerts UI Management in OpenShift

## Summary
Provide a user‑friendly UI and REST API for defining, viewing, editing, disabling/enabling and silencing Prometheus alerts without manual YAML edits, reducing alert fatigue and improving operational efficiency.
Contributor

IMHO the summary needs to be revised, it mixes up the what and the how.

Suggested change
Provide a user‑friendly UI and REST API for defining, viewing, editing, disabling/enabling and silencing Prometheus alerts without manual YAML edits, reducing alert fatigue and improving operational efficiency.
Improve alert management in the OCP console to expose a unified view of the alerting rules (and associated alerts), provide flexible filtering capabilities and allow rules management without manual YAML edits.


## Summary
Provide a user‑friendly UI and REST API for defining, viewing, editing, disabling/enabling and silencing Prometheus alerts without manual YAML edits, reducing alert fatigue and improving operational efficiency.
Platform alerts will be overriden leveraging the `AlertRelabelConfig` CR (in `cluster-monitoring-config`) rather than editing their original AlertingRules in the `PrometheusRule`.
Contributor

IMHO it doesn't fit in the summary; it belongs later. Also, there is no such resource as "AlertingRules".

Platform alerts will be overriden leveraging the `AlertRelabelConfig` CR (in `cluster-monitoring-config`) rather than editing their original AlertingRules in the `PrometheusRule`.

## Motivation
- While it's possible to customize built-in alerting rules with the `AlertingRule` + `AlertRelabelConfig` CRDs, the process is cumbersome and error-prone. It requires creating YAML manifests manually and there's no practical way to verify the correctness of the configuration.
Contributor

Suggested change
- While it's possible to customize built-in alerting rules with the `AlertingRule` + `AlertRelabelConfig` CRDs, the process is cumbersome and error-prone. It requires creating YAML manifests manually and there's no practical way to verify the correctness of the configuration.
- While it's possible to customize built-in alerting rules with the `AlertingRule` + `AlertRelabelConfig` CRDs, the process is cumbersome and error-prone. It requires creating YAML manifests manually and there's no practical way to verify the correctness of the configuration. Built-in alerting rules and alerts are still visible in the OCP console after they've been overridden .

Author

Updated


## Motivation
- While it's possible to customize built-in alerting rules with the `AlertingRule` + `AlertRelabelConfig` CRDs, the process is cumbersome and error-prone. It requires creating YAML manifests manually and there's no practical way to verify the correctness of the configuration.
- Some operational teams prefer an interactive console and API to manage alerts safely, guided by best practices.
Contributor

Suggested change
- Some operational teams prefer an interactive console and API to manage alerts safely, guided by best practices.
- Some operational teams prefer an interactive console and API to manage alerts.

Author

Updated. Thanks.


### User Stories

1. **Bulk disable during maintenance**
Contributor

I think that I commented before but "disable during maintenance" is a wrong use case. Otherwise it's similar to the next user story.

Author

True. I thought I updated it. Sorry.

- **Alerts View**: show current firing/pending instances, silence status, relabel context
- **Silencing Panel**: define matchers, duration, comment - Keep

### 3. Data Model
Contributor

this part doesn't make sense to me.

Author

Sorry, this was already updated in the design but not reflected here, only in the UX doc.


Rule identity

- A rule is uniquely identified by the tuple: `(namespace, prometheusrule, ruleName, severity)`.
Contributor

this assumption is wrong IMHO: I can find examples where 2 different rules would have the same tuple.

Author

Can you please expand on this? I don't think there should be the same alert rule with the same severity.

Note: The namespace and PrometheusRule were added and must be specified, so they will identify where it is saved to.

@simonpasquier simonpasquier Aug 28, 2025

One example is KubeAPIErrorBudgetBurn which has 2 alert definitions for each severity level (the distinction is based on the long and short labels).

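For illustration, roughly what two of those KubeAPIErrorBudgetBurn definitions look like in the kubernetes-mixin (paraphrased; the shipped thresholds and windows may differ):

```yaml
# Two alert definitions with the same name and severity, distinguished only
# by their long/short labels (expressions and values paraphrased).
- alert: KubeAPIErrorBudgetBurn
  expr: |
    sum(apiserver_request:burnrate1h) > (14.40 * 0.01)
    and sum(apiserver_request:burnrate5m) > (14.40 * 0.01)
  for: 2m
  labels:
    severity: critical
    long: 1h
    short: 5m
- alert: KubeAPIErrorBudgetBurn
  expr: |
    sum(apiserver_request:burnrate6h) > (6.00 * 0.01)
    and sum(apiserver_request:burnrate30m) > (6.00 * 0.01)
  for: 15m
  labels:
    severity: critical
    long: 6h
    short: 30m
```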

Author

@simonpasquier oh, this is bad. Thank you for pointing this out!
I was not aware this was possible.
I'll go back and review this.

Contributor

the primary key for a rule is composed of

  • the namespace & name of the PrometheusRule resource
  • the rule group
  • the alert or record name
  • the label name/value pairs

Author

Hi @simonpasquier, I updated based on your suggestion.
I think the group is not needed, but if it's a must I think we can add it to the path.

- `namespace`
- `component`
- `severity`
- `state` (one of: `enabled`, `disabled`, `silenced`)
Contributor

(nit) only alerts can be silenced, rules are not.

Author

Today we show in the alerting rules page whether the alert rule is silenced.
For example, when you create a silence that silences all info alerts,
you will see that the info alert rules are silenced.

Author

@simonpasquier do you know where this is calculated today? On the UI side?

Contributor

I suppose that it's computed on the frontend side. @jgbernalp would know :)

you will see that the info alert rules are silenced.

Sorry for being nit-picky but the UI tells how many alerts associated with the alerting rule are silenced.



## Proposal

### 1. API Endpoints
Contributor

I still fail to see the reason for a new API on top of what exists today:

  • Prometheus/Thanos APIs (/api/v1/rules and /api/v1/alerts)
  • Alertmanager API
  • Kubernetes API (for CRDs)

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch 6 times, most recently from 2a192fc to 7fcde81 on September 1, 2025 14:55
## Summary
Provide a user‑friendly UI in the OpenShift Console to manage Prometheus alerts. Including defining, viewing, editing, disabling/enabling and silencing alerts without manual YAML edits, and provide alert grouping in order to reduce alerts fatigue and improve operational efficiency.

This includes adding a new REST APIs, Platform alerts will be overriden leveraging the `AlertRelabelConfig` CR (in `cluster-monitoring-config`) rather than editing their original AlertingRules in the `PrometheusRule`.
Contributor


Suggested change
This includes adding a new REST APIs, Platform alerts will be overriden leveraging the `AlertRelabelConfig` CR (in `cluster-monitoring-config`) rather than editing their original AlertingRules in the `PrometheusRule`.
As described in the existing [alert overrides](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alert-overrides.md) proposal, platform alerts will be overriden leveraging the `AlertRelabelConfig` CR (in `cluster-monitoring-config`) rather than editing their original AlertingRules in the `PrometheusRule`.


1. **Bulk disable that are not required**
As a cluster administrator, I want to select and disable multiple alerts in one action,
so that I can permanently suppress non‑critical notifications and reduce noise.
Contributor

I'm not sure if "disable alerts" means "silence active alerts" or "disable an existing alerting rule". If the latter, it overlaps with the next story.

Author

One is for a bulk update and the second is for a single alert. It's "disable".

Contributor

We've talked about it before and my preference would be to limit this capability to platform alerting rules. Otherwise we need to expand on how it will function for user-defined rules.

4. **Create a custom alerting rule**
As a cluster admin/developer, I want to create an alerting rule using a form which allows me to specify mandatory fields (PromQL expression, name) and recommended fields (for duration, well-known labels/annotations such as severity, summary).

5. **Clone an alert base on platform or user-defined alerting rule**
Contributor

Suggested change
5. **Clone an alert base on platform or user-defined alerting rule**
5. **Clone an alert based on an existing alerting rule (platform or user-defined)**

after I used it to tune the expression that I need.

## Goals
1. Add a Console UI for managing alerts.
Contributor

not really adding since there's already an Alerting page, right?

Author

There is no UI for managing alerts today.

Contributor

Sorry for being nit-picky but we need to be precise in the document and avoid confusing alerts and alerting rules. I feel that sometimes one name is used for the other.
I could argue that we can (at least partially) manage alerts from the UI since we can silence them. IIUC what we're mostly after is being able to edit an alerting rule without editing the YAML manifest of the PrometheusRule resource holding that rule.


## Goals
1. Add a Console UI for managing alerts.
2. Standardize `group` and `component` labels on alert rules to clearly surface priority and impact, to help administrators to understand what to address first.
Contributor

while related to the UI improvements, it could be a separate enhancement proposal since it will impact all component owners (OCP core as well as layered products).

Author

This is part of the effort. The UI uses these and it's part of the MVP.

Author

The UI will allow showing aggregated alerts based on labels at the top, in the "summary" of the alerts page.

Contributor

Then there's no detail in this enhancement about what it entails for existing operators.

Contributor

To be more specific, we have https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md which describes the requirements and guidelines for OCP alerting rules (including layered products). It's fine to add new recommended labels but it should be documented there since it's the primary source of information for RH dev engineers.


Key flows:
- Web clients authenticate and access the OpenShift Console UI.
- Console calls the Alerting API to list/manage alerts and rules.
Contributor

@jan--f @jgbernalp this is where I'm not clear if the Alerting API component is absolutely needed. What prevents the aggregation from happening in the console (client-side)?

If we deploy this intermediary API service then it needs to have access to the user's credentials to ensure that it gets only the resources allowed for the user.

## Open Questions
1. **Per‑Alert vs. Single‑File**: Should each user‑defined alert reside in its own `PrometheusRule` file, or group all into one? A customer noted per‑alert files may simplify GitOps/Argo CD maintenance—does that hold true at scale?
2. **Read‑Only Detection**: Which annotations, labels or ownerReferences reliably indicate GitOps‑managed resources to render them read‑only in our UI?
3. **Concurrent Operator Updates**: How should we handle cases where upstream operators update their own `PrometheusRule` CRs—should we reconcile `AlertRelabelConfig` entries periodically?
Contributor

It's handled by CMO. If the question is "what happens when an operator changes the definition and the related AlertRelabelConfig no longer matches?" then (as of now) the onus is on the cluster admin, who needs to detect the drift and update their customization.

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

## Upgrade / Downgrade Strategy
Contributor

An important upgrade consideration is that existing URLs should remain operational because we know that users depend on them.

- **GA**: best‑practice guidance in UI; multi‑namespace filtering; full test coverage; complete documentation

## Open Questions
1. **Per‑Alert vs. Single‑File**: Should each user‑defined alert reside in its own `PrometheusRule` file, or group all into one? A customer noted per‑alert files may simplify GitOps/Argo CD maintenance—does that hold true at scale?
Contributor

IMHO the flexibility should be left to the users: when they go through the creation wizard, the final step should ask whether they want to create a new PrometheusRule (and possibly in which namespace) or if they want to append to an existing PrometheusRule (and to which group).

@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from 7fcde81 to ec558d0 on September 16, 2025 16:43
@sradco sradco force-pushed the alert_managment_enhancment_proposal branch from ec558d0 to 68975c4 on September 16, 2025 16:57

openshift-ci bot commented Sep 16, 2025

@sradco: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 68975c4 | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

5. Disable/enable alerts by creating/updating entries in the `AlertRelabelConfig` CR.
6. Create/manage silences in Alertmanager and reflect this in the UI. - Already exists today.
7. Aggregate view of all alerting rules, showing definitions plus relabel context.
8. Aggregate view of all alerts, showing status (Pending, Firing, Silenced) and relabel context.
Contributor

Do these two points assume some kind of mapping between the alert/alerting rule and its corresponding relabel context? There's no explicit connection or contract right now if I am not mistaken.

- This is not in the current scope.

**Sub-tab: Manage Groups**
- Default groups will include **Impact Group: Cluster** and **Impact Group: Namespace**.
Contributor

Would it make sense for the group values to reflect the Source values (platform and user)?

@tremes tremes Sep 19, 2025

Nah, it probably doesn't make sense. The source and impact are two different things. Feel free to close this comment.

- `specHash` is server‑generated from the rule spec to ensure uniqueness and stability. It is computed as SHA‑256 (hex, lowercase) of the normalized fields:
- `expr`: trimmed with consecutive whitespace collapsed
- `for`: normalized to seconds
- `labels`: static labels only; drop empty values; sort by key ascending; join as `key=value` lines separated by `\n`
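As an illustration of the normalization described above, a hypothetical rule and the canonical fields that would be fed to SHA-256; the per-field rules come from the proposal, while the exact concatenation order is an assumption:

```yaml
# A hypothetical rule before normalization (illustrative values):
expr: '  rate(http_requests_total{code=~"5.."}[5m])   >  0.05 '
for: 5m
labels:
  severity: warning
  team: ""          # empty value, dropped during normalization
  app: shop         # "static" presumably means a literal, non-templated label

# Normalized fields, assumed here to be concatenated in this order and then
# hashed with SHA-256 (hex, lowercase):
#   expr   -> rate(http_requests_total{code=~"5.."}[5m]) > 0.05
#   for    -> 300
#   labels -> app=shop\nseverity=warning
```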
Contributor

What is meant by static labels please?
