Commit cdd10a8

vuvkar and drichards-87 authored
[MOPU-301] AI driven improvement on "What's happening" section for monitor templates (#22628)
* [MOPU-301] What's happening section for monitor templates
* Update nginx/assets/monitors/5xx.json
  Co-authored-by: DeForest Richards <56796055+drichards-87@users.noreply.github.com>
* Update nginx/assets/monitors/upstream_peer_fails.json
  Co-authored-by: DeForest Richards <56796055+drichards-87@users.noreply.github.com>
* [MOPU-301] Reverted monitor files formatting.
* [MOPU-301] Scope generated message within {{is_alert}} conditional variable.
* [MOPU-301] Scope generated message within {{is_alert}} conditional variable.
* [MOPU-301] Manual enhancement over AI-generated messages.
* [MOPU-301] Fix typo.

---------

Co-authored-by: DeForest Richards <56796055+drichards-87@users.noreply.github.com>
1 parent 499ac59 commit cdd10a8

14 files changed: +14 -14 lines changed
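Every file in this commit makes the same one-line change: the previous plain message is replaced by a generated "What's happening?" section scoped inside the {{#is_alert}} / {{/is_alert}} conditional variables, so the extra context is rendered only while the monitor is in an alert state, while any static guidance placed after the closing tag still appears in every notification. The JSON below is a minimal sketch of that shape, not one of the committed files; the description, monitor name, and the {{example_tag.name}} template variable are placeholders.

{
  "description": "Placeholder description of what the monitor tracks.",
  "definition": {
    "message": "{{#is_alert}}\n\n## What's happening?\nAlert-only detail, for example the affected {{example_tag.name}}.\n\n{{/is_alert}}\n\n Static guidance that is shown in every notification.",
    "name": "[Example] Placeholder monitor",
    "options": {
      "escalation_message": ""
    }
  }
}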

kubernetes/assets/monitors/monitor_deployments_replicas.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "Kubernetes replicas are clones that facilitate self-healing for pods. Each pod has a desired number of replica Pods that should be running at any given time. This monitor tracks the number of replicas that are failing per deployment.",
   "definition": {
-    "message": "More than one Deployments Replica's pods are down in Deployment {{kube_namespace.name}}/{{kube_deployment.name}}.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThere are at least 2 missing replicas for Deployment {{kube_namespace.name}}/{{kube_deployment.name}} over the last 15 minutes.\n\n{{/is_alert}}",
     "name": "[Kubernetes] Monitor Kubernetes Deployments Replica Pods",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_node_unavailable.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "Kubernetes nodes can either be schedulable or unschedulable. When unschedulable, the node prevents the scheduler from placing new pods onto that node. This monitor tracks the percentage of schedulable nodes.",
   "definition": {
-    "message": "More than 20% of nodes are unschedulable on ({{kube_cluster_name.name}} cluster). \n Keep in mind that this might be expected based on your infrastructure.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThe percentage of schedulable nodes is below 80% for status:schedulable on ({{kube_cluster_name.name}} cluster) over the last 15 minutes.\n\n{{/is_alert}}\n\n Keep in mind that this might be expected based on your infrastructure.",
     "name": "[Kubernetes] Monitor Unschedulable Kubernetes Nodes",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_pod_crashloopbackoff.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "The status CrashloopBackOff means that a container in the Pod is started, crashes, and is restarted, over and over again. This monitor tracks when a pod is in a CrashloopBackOff state for your Kubernetes integration.",
   "definition": {
-    "message": "pod {{pod_name.name}} is in CrashloopBackOff on {{kube_namespace.name}} \n This alert could generate several alerts for a bad deployment. Adjust the thresholds of the query to suit your infrastructure.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nAt least one container in pod {{pod_name.name}} on {{kube_namespace.name}} is in a waiting state due to reason crashloopbackoff in the last 10 minutes.\n\n{{/is_alert}}\n\n This alert could generate several alerts for a bad deployment. Adjust the thresholds of the query to suit your infrastructure.",
     "name": "[Kubernetes] Pod {{pod_name.name}} is CrashloopBackOff on namespace {{kube_namespace.name}}",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_pod_imagepullbackoff.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "The status ImagePullBackOff means that a container could not start because Kubernetes could not pull a container image. This monitor tracks when a pod is in an ImagePullBackOff state for your Kubernetes integration.",
   "definition": {
-    "message": "pod {{pod_name.name}} is in ImagePullBackOff on {{kube_namespace.name}} \n This could happen for several reasons, for example a bad image path or tag or if the credentials for pulling images are not configured properly.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nAt least one container in pod {{pod_name.name}} on namespace {{kube_namespace.name}} is in a waiting state due to an ImagePullBackOff error in the last 10 minutes.\n\n{{/is_alert}}\n\n This could happen for several reasons, for example a bad image path or tag or if the credentials for pulling images are not configured properly.",
     "name": "[Kubernetes] Pod {{pod_name.name}} is ImagePullBackOff on namespace {{kube_namespace.name}}",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_pod_oomkilled.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "The status OOMKilled means that a container was killed because it exceeded memory limits or the node ran out of available memory. This monitor tracks when a pod is in an OOMKilled state for your Kubernetes integration.",
   "definition": {
-    "message": "pod {{pod_name.name}} is in OOMKilled on {{kube_namespace.name}} \n This could happen for several reasons, for example insufficient memory limits, memory leaks in the application, or the node running out of available memory.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThere has been at least one container terminated in pod {{pod_name.name}} on namespace {{kube_namespace.name}} with reason oomkilled in the last 10 minutes.\n\n{{/is_alert}}\n\n This could happen for several reasons, for example insufficient memory limits, memory leaks in the application, or the node running out of available memory.",
     "name": "[Kubernetes] Pod {{pod_name.name}} is OOMKilled on namespace {{kube_namespace.name}}",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_pods_failed_state.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "When a pod is failing it means the container either exited with non-zero status or was terminated by the system. This monitor tracks when more than 10 pods are failing for a given Kubernetes cluster.",
   "definition": {
-    "message": "More than ten pods are failing in ({{kube_cluster_name.name}} cluster). \n The threshold of ten pods varies depending on your infrastructure. Change the threshold to suit your needs.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThe number of failed pods has increased by more than 10 in ({{kube_cluster_name.name}} cluster) in the last 5 minutes.\n\n{{/is_alert}}\n\n The threshold of ten pods varies depending on your infrastructure. Change the threshold to suit your needs.",
     "name": "[Kubernetes] Monitor Kubernetes Failed Pods in Namespaces",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_pods_restarting.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "Kubernetes pods restart according to the restart policy. A restarting container can indicate problems with memory, CPU usage, or an application exiting prematurely. This monitor tracks when pods are restarting multiple times.",
   "definition": {
-    "message": "Pod {{pod_name.name}} restarted multiple times in the last five minutes.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThere has been an increase of more than 5 container restarts in the pod {{pod_name.name}} in the last 5 minutes.\n\n{{/is_alert}}",
     "name": "[Kubernetes] Monitor Kubernetes Pods Restarting",
     "options": {
       "escalation_message": "",

kubernetes/assets/monitors/monitor_statefulset_replicas.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "Kubernetes replicas are clones that facilitate self-healing for pods. Each pod has a desired number of replica Pods that should be running at any given time. This monitor tracks when the number of replicas per statefulset is falling.",
   "definition": {
-    "message": "More than one Statefulset Replica's pods are down in Statefulset {{kube_namespace.name}}/{{kube_stateful_set.name}}. This might present an unsafe situation for any further manual operations, such as killing other pods.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nThere are at least 2 desired replicas that are not ready for {{kube_namespace.name}}/{{kube_stateful_set.name}} StatefulSet over the last 15 minutes.\n\n{{/is_alert}}\n\n This might present an unsafe situation for any further manual operations, such as killing other pods.",
     "name": "[Kubernetes] Monitor Kubernetes Statefulset Replicas",
     "options": {
       "escalation_message": "",

nginx/assets/monitors/4xx.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "NGINX sends requests to upstream peers that can fail eventually. This monitor tracks the count of 4xx HTTP responses to identify issues in the communication between NGINX and the backend servers.",
   "definition": {
-    "message": "Number of 4xx errors on NGINX upstreams is at {{value}} which is higher than usual.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nAn anomaly has been detected in the number of 4xx responses from NGINX upstream peers over the last hour, with a value of {{value}}.\n\n{{/is_alert}}",
     "name": "[NGINX] 4xx Errors higher than usual",
     "options": {
       "escalation_message": "",

nginx/assets/monitors/5xx.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   ],
   "description": "“5xx upstream request errors” are indicating server issues from backend servers. This monitor tracks the count of 5xx responses from NGINX's upstream peers to identify server-related issues in your web or application infrastructure.",
   "definition": {
-    "message": "Number of 5xx errors on NGINX upstreams is at {{value}} which is higher than usual.",
+    "message": "{{#is_alert}}\n\n## What's happening?\nAn anomaly has been detected in the number of 5xx responses from NGINX upstream peers over the last hour, with a value of {{value}}.\n\n{{/is_alert}}\n\n",
     "name": "[NGINX] 5xx Errors higher than usual",
     "options": {
       "escalation_message": "",
