@@ -75,21 +75,18 @@ The following elements may need to be adjusted to match your specific environmen

### 4. Alert Thresholds
Adjust thresholds based on your environment size and requirements:
- - `cas/cas-thread-count-high.yaml`: > 400 threads
- `cas/cas-memory-usage-high.yaml`: > 300 GB
- - `database/postgresql-connection-utilization-high.yaml`: > 85%
- `platform/rabbitmq-ready-queue-backlog.yaml`: > 10,000 messages
- `platform/rabbitmq-unacked-queue-backlog.yaml`: > 5,000 messages
- `platform/viya-pod-restart-count-high.yaml`: > 20 restarts
- `other/nfs-share-high-usage.yaml`: > 85% full
- - `platform/high-viya-api-latency.yaml`: > 1 second (95th percentile)
- `database/crunchy-pgdata-usage-high.yaml` and `database/crunchy-backrest-repo.yaml`: > 50% full

### 5. Verify Metric Availability
Ensure the following metrics are available in your Prometheus instance:
- CAS metrics: `cas_thread_count`, `cas_grid_uptime_seconds_total`
- Database metrics: `sas_db_pool_connections`, `pg_stat_activity_count`, `pg_settings_max_connections`
- - RabbitMQ metrics: `rabbitmq_queue_messages_ready`, `rabbitmq_queue_messages_unacknowledged`
+ - RabbitMQ metrics: `rabbitmq_queue_messages_ready`, `rabbitmq_queue_messages_unacked`
- Kubernetes metrics: `kube_pod_container_status_restarts_total`, `kube_pod_container_status_ready`
- HTTP metrics: `http_server_requests_duration_seconds_bucket`
- SAS Job Launcher: `:sas_launcher_pod_status:` (recording rule)
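The metric check in step 5 can be done by hand in the Prometheus UI. The following is a minimal sketch that automates it with only the Python standard library and the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). It assumes Prometheus is reachable without authentication from wherever the script runs, and it uses the post-change RabbitMQ name `rabbitmq_queue_messages_unacked`. Exact metric names depend on the exporters you deploy, so adjust `EXPECTED_METRICS` if yours differ. The function names and the command-line interface are illustrative only and are not part of any SAS or Prometheus tooling.

```python
"""Sketch: report which expected metrics have no series in Prometheus.

Assumption: the Prometheus HTTP API is reachable at the URL given on the
command line and requires no authentication.
"""
import json
import sys
import urllib.parse
import urllib.request

# Metric names from the "Verify Metric Availability" list (post-change names).
EXPECTED_METRICS = [
    "cas_thread_count",
    "cas_grid_uptime_seconds_total",
    "sas_db_pool_connections",
    "pg_stat_activity_count",
    "pg_settings_max_connections",
    "rabbitmq_queue_messages_ready",
    "rabbitmq_queue_messages_unacked",
    "kube_pod_container_status_restarts_total",
    "kube_pod_container_status_ready",
    "http_server_requests_duration_seconds_bucket",
    ":sas_launcher_pod_status:",  # recording rule output
]


def build_query_url(base_url: str, metric: str) -> str:
    """Build an instant-query URL that counts the series for one metric."""
    params = urllib.parse.urlencode({"query": f"count({metric})"})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def series_count(response: dict) -> int:
    """Extract the series count from a Prometheus instant-query response.

    count() over a metric with no series yields an empty vector, so an
    empty result means the metric is currently unavailable.
    """
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        return 0
    # Each vector sample looks like {"metric": {...}, "value": [ts, "<number>"]}.
    return int(float(result[0]["value"][1]))


def missing_metrics(base_url: str, metrics: list[str]) -> list[str]:
    """Query Prometheus once per metric and return those with no series."""
    missing = []
    for metric in metrics:
        url = build_query_url(base_url, metric)
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
        if series_count(payload) == 0:
            missing.append(metric)
    return missing


def main(base_url: str) -> int:
    """Print each missing metric; return 1 if any are missing, else 0."""
    absent = missing_metrics(base_url, EXPECTED_METRICS)
    for name in absent:
        print(f"MISSING: {name}")
    return 1 if absent else 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} PROMETHEUS_URL", file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1]))
```

For example, `python check_metrics.py http://prometheus.example:9090` (hypothetical file name and host) prints one `MISSING:` line per absent metric and exits non-zero, so it can gate a CI or post-deployment step. Wrapping each name in `count()` keeps responses small even for high-cardinality histograms such as the `_bucket` series.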