OBSDOCS-345-best-practices-alert-relabeling-for-core-monitoring

bburt-rh · bburt-rh · commit 9f21bf5775ba · 2023-07-20T09:44:16.000-04:00
diff --git a/modules/monitoring-tips-for-optimizing-alerting-rules-for-core-platform-monitoring.adoc b/modules/monitoring-tips-for-optimizing-alerting-rules-for-core-platform-monitoring.adoc
@@ -0,0 +1,33 @@
+// Module included in the following assemblies:
+//
+// * monitoring/managing-alerts.adoc
+
+:_content-type: CONCEPT
+[id="tips-for-optimizing-alerting-rules-for-core-platform-monitoring_{context}"]
+= Tips for optimizing alerting rules for core platform monitoring
+
+If you customize core platform alerting rules to meet your organization's specific needs, follow these guidelines to help ensure that the customized rules are efficient and effective.
+
+* *Minimize the number of new rules*.
+Create only rules that are essential to your specific requirements.
+By minimizing the number of rules, you create a more manageable and focused alerting system in your monitoring environment.
+
+* *Focus on symptoms rather than causes*.
+Create rules that notify users of symptoms instead of underlying causes.
+This approach ensures that users are promptly notified of a relevant symptom so that they can investigate the root cause after an alert is triggered.
+This tactic can also reduce the overall number of rules that you need to create.
+
+* *Plan and assess your needs before implementing changes*.
+First, decide which symptoms are important and which actions you want users to take if these symptoms occur.
+Then, assess existing rules and decide if you can modify any of them to meet your needs instead of creating entirely new rules for each symptom.
+By modifying existing rules and creating new ones judiciously, you help to streamline your alerting system.
+
+* *Provide clear alert messaging*.
+When you create alert messages, describe the symptom, possible causes, and recommended actions.
+Include unambiguous, concise explanations along with troubleshooting steps or links to more information.
+Doing so helps users quickly assess the situation and respond appropriately.
+
+* *Include severity levels*.
+Assign severity levels to your rules to indicate how a user needs to react when a symptom occurs and triggers an alert.
+For example, classifying an alert as *Critical* signals that an individual or a critical response team needs to respond immediately.
+By defining severity levels, you help users know how to respond to an alert and help ensure that the most urgent issues receive prompt attention.
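
For illustration, the guidelines in the new module could translate into a rule like the following. This is a minimal sketch in the generic Prometheus rule-group format, not part of the change itself. The group name, alert name, metric `http_requests_total`, the 5% threshold, and the runbook URL are all hypothetical placeholders. On a cluster, the same `groups` content would sit inside whatever alerting rule resource the platform expects.

```yaml
# Illustrative sketch only: names, metric, threshold, and URL are hypothetical.
groups:
- name: example-symptom-rules            # hypothetical group name
  rules:
  - alert: ExampleHighErrorRate          # hypothetical alert name
    # Symptom-based: fires when more than 5% of requests fail,
    # regardless of the underlying cause.
    expr: |
      sum(rate(http_requests_total{code=~"5.."}[5m]))
        / sum(rate(http_requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: critical                 # tells users how urgently to react
    annotations:
      summary: More than 5% of requests are failing.
      description: >-
        The error rate has exceeded 5% for 10 minutes. Possible causes
        include a failed rollout or an unavailable backend. Check recent
        deployments and pod logs.
      runbook_url: https://example.com/runbooks/high-error-rate
```

A single rule of this kind covers one user-visible symptom, carries a severity label, and states the symptom, possible causes, and next steps in its annotations, which is the combination that the five tips describe.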
diff --git a/monitoring/managing-alerts.adoc b/monitoring/managing-alerts.adoc
@@ -42,6 +42,22 @@ include::modules/monitoring-silencing-alerts.adoc[leveloffset=+2]
 include::modules/monitoring-editing-silences.adoc[leveloffset=+2]
 include::modules/monitoring-expiring-silences.adoc[leveloffset=+2]
 
+// Managing core platform alerting rules
+ifndef::openshift-dedicated,openshift-rosa[]
+// Tech Preview features are not documented in the ROSA/OSD docs. However, even when GA, ROSA/OSD generally doesn't include information about core platform monitoring.
+include::modules/monitoring-managing-core-platform-alerting-rules.adoc[leveloffset=+1]
+include::modules/monitoring-tips-for-optimizing-alerting-rules-for-core-platform-monitoring.adoc[leveloffset=+2]
+include::modules/monitoring-creating-new-alerting-rules.adoc[leveloffset=+2]
+include::modules/monitoring-modifying-core-platform-alerting-rules.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* See xref:../monitoring/monitoring-overview.adoc#monitoring-overview[Monitoring overview] for details about {product-title} {product-version} monitoring architecture.
+* See the link:https://prometheus.io/docs/alerting/alertmanager/[Alertmanager documentation] for information about how Alertmanager handles alerts.
+* See the link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config[Prometheus relabeling documentation] for information about how relabeling works.
+* See the link:https://prometheus.io/docs/practices/alerting/[Prometheus alerting documentation] for further guidelines on optimizing alerts.
+endif::openshift-dedicated,openshift-rosa[]
+
 // Managing alerting rules for user-defined projects
 include::modules/monitoring-managing-alerting-rules-for-user-defined-projects.adoc[leveloffset=+1]
 include::modules/monitoring-optimizing-alerting-for-user-defined-projects.adoc[leveloffset=+2]
@@ -69,21 +85,6 @@ include::modules/monitoring-removing-alerting-rules-for-user-defined-projects.ad
 
 * See the link:https://prometheus.io/docs/alerting/alertmanager/[Alertmanager documentation]
 
-// Managing core platform alerting rules
-ifndef::openshift-dedicated,openshift-rosa[]
-// Tech Preview features are not documented in the ROSA/OSD docs. However, even when GA, ROSA/OSD generally doesn't include information about core platform monitoring.
-include::modules/monitoring-managing-core-platform-alerting-rules.adoc[leveloffset=+1]
-include::modules/monitoring-modifying-core-platform-alerting-rules.adoc[leveloffset=+2]
-include::modules/monitoring-creating-new-alerting-rules.adoc[leveloffset=+2]
-
-[role="_additional-resources"]
-.Additional resources
-* See xref:../monitoring/monitoring-overview.adoc#monitoring-overview[Monitoring overview] for details about {product-title} {product-version} monitoring architecture.
-* See the link:https://prometheus.io/docs/alerting/alertmanager/[Alertmanager documentation] for information about alerting rules.
-* See the link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config[Prometheus relabeling documentation] for information about how relabeling works.
-* See the link:https://prometheus.io/docs/practices/alerting/[Prometheus alerting documentation] for further guidelines on optimizing alerts.
-endif::openshift-dedicated,openshift-rosa[]
-
 // Sending notifications to external systems
 include::modules/monitoring-sending-notifications-to-external-systems.adoc[leveloffset=+1]
 // Configuring alert receivers