As a cluster administrator, you can create new alerting rules based on platform metrics.
These alerting rules trigger alerts based on the values of chosen metrics.

[NOTE]
====
If you create a customized `AlertingRule` resource based on an existing platform alerting rule, silence the original alert to avoid receiving conflicting alerts.
====

.Prerequisites
* You are logged in as a user that has the `cluster-admin` role.
* You have installed the OpenShift CLI (`oc`).
* You have enabled Technology Preview features, and all nodes in the cluster are ready.

.Procedure
. Create a new YAML configuration file named `example-alerting-rule.yaml` in the `openshift-monitoring` namespace.
. Add an `AlertingRule` resource to the YAML file.
The following example creates a new alerting rule named `example`, similar to the default `Watchdog` alert:
+
[source,yaml]
----
apiVersion: monitoring.openshift.io/v1alpha1
kind: AlertingRule
metadata:
  name: example
  namespace: openshift-monitoring
spec:
  groups:
  - name: example-rules
    rules:
    - alert: ExampleAlert <1>
      expr: vector(1) <2>
----
<1> The name of the alerting rule you want to create.
<2> The PromQL query expression that defines the new rule.
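
The new rule does not take effect until the resource is applied to the cluster. Assuming the `example-alerting-rule.yaml` file from the previous steps, one way to do this is with the standard `oc apply` command:

[source,terminal]
----
$ oc apply -f example-alerting-rule.yaml
----
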

{product-title}{product-version} monitoring ships with a large set of default alerting rules for platform metrics.
As a cluster administrator, you can customize this set of rules in two ways:

* Modify the settings for existing platform alerting rules by adjusting thresholds or by adding and modifying labels.
For example, you can change the `severity` label for an alert from `warning` to `critical` to help you route and triage issues flagged by an alert.

* Define and add new custom alerting rules by constructing a query expression based on core platform metrics in the `openshift-monitoring` namespace.
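
For the first approach, a `severity` change can be sketched with an `AlertRelabelConfig` resource. The example below is a sketch, not a definitive configuration: it assumes the Technology Preview `monitoring.openshift.io/v1` API for `AlertRelabelConfig`, and the `Watchdog` alert name and label values are illustrative only.

[source,yaml]
----
apiVersion: monitoring.openshift.io/v1
kind: AlertRelabelConfig
metadata:
  name: watchdog
  namespace: openshift-monitoring
spec:
  configs:
  - sourceLabels: [alertname, severity] # match on the alert name and its current severity
    regex: "Watchdog;none"              # values of the source labels, joined by ";"
    targetLabel: severity               # the label to rewrite
    replacement: critical               # new value for the target label
    action: Replace
----
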

.Core platform alerting rule considerations
* New alerting rules must be based on the default {product-title} monitoring metrics.
* You can only add and modify alerting rules. You cannot create new recording rules or modify existing recording rules.
* If you modify existing platform alerting rules by using an `AlertRelabelConfig` object, your modifications are not reflected in the Prometheus alerts API.
Therefore, any dropped alerts still appear in the {product-title} web console even though they are no longer forwarded to Alertmanager.
Additionally, any modifications to alerts, such as a changed `severity` label, do not appear in the web console.

[role="_additional-resources"]
.Additional resources
* See xref:../monitoring/monitoring-overview.adoc#monitoring-overview[Monitoring overview] for details about {product-title}{product-version} monitoring architecture.
* See the link:https://prometheus.io/docs/alerting/alertmanager/[Alertmanager documentation] for information about alerting rules.
* See the link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config[Prometheus relabeling documentation] for information about how relabeling works.
* See the link:https://prometheus.io/docs/practices/alerting/[Prometheus alerting documentation] for further guidelines on optimizing alerts.