@@ -75,21 +75,18 @@ The following elements may need to be adjusted to match your specific environmen

### 4. Alert Thresholds
Adjust thresholds based on your environment size and requirements:
- - `cas/cas-thread-count-high.yaml`: > 400 threads
- `cas/cas-memory-usage-high.yaml`: > 300 GB
- - `database/postgresql-connection-utilization-high.yaml`: > 85%
- `platform/rabbitmq-ready-queue-backlog.yaml`: > 10,000 messages
- `platform/rabbitmq-unacked-queue-backlog.yaml`: > 5,000 messages
- `platform/viya-pod-restart-count-high.yaml`: > 20 restarts
- `other/nfs-share-high-usage.yaml`: > 85% full
- - `platform/high-viya-api-latency.yaml`: > 1 second (95th percentile)
- `database/crunchy-pgdata-usage-high.yaml` and `database/crunchy-backrest-repo.yaml`: > 50% full

### 5. Verify Metric Availability
Ensure the following metrics are available in your Prometheus instance:
- CAS metrics: `cas_thread_count`, `cas_grid_uptime_seconds_total`
- Database metrics: `sas_db_pool_connections`, `pg_stat_activity_count`, `pg_settings_max_connections`
- - RabbitMQ metrics: `rabbitmq_queue_messages_ready`, `rabbitmq_queue_messages_unacknowledged`
+ - RabbitMQ metrics: `rabbitmq_queue_messages_ready`, `rabbitmq_queue_messages_unacked`
- Kubernetes metrics: `kube_pod_container_status_restarts_total`, `kube_pod_container_status_ready`
- HTTP metrics: `http_server_requests_duration_seconds_bucket`
- SAS Job Launcher: `:sas_launcher_pod_status:` (recording rule)
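
One way to check the list above is to ask Prometheus for every metric name it knows and compare. The sketch below is illustrative only and is not part of this change: it assumes the standard Prometheus HTTP API endpoint `/api/v1/label/__name__/values` is reachable without authentication, and the names `REQUIRED_METRICS`, `fetch_metric_names`, and `missing_metrics` are hypothetical helpers. The RabbitMQ entry uses the renamed `rabbitmq_queue_messages_unacked` from the added line.

```python
import json
import urllib.request

# Metric names copied from the list above (hypothetical constant name).
REQUIRED_METRICS = [
    "cas_thread_count",
    "cas_grid_uptime_seconds_total",
    "sas_db_pool_connections",
    "pg_stat_activity_count",
    "pg_settings_max_connections",
    "rabbitmq_queue_messages_ready",
    "rabbitmq_queue_messages_unacked",
    "kube_pod_container_status_restarts_total",
    "kube_pod_container_status_ready",
    "http_server_requests_duration_seconds_bucket",
    ":sas_launcher_pod_status:",  # recording rule
]


def fetch_metric_names(base_url, timeout=10):
    """Return the set of metric names Prometheus currently knows about.

    Assumes an unauthenticated Prometheus at base_url (an assumption;
    adjust for your ingress and auth setup).
    """
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return set(payload["data"])


def missing_metrics(available, required=REQUIRED_METRICS):
    """Return the required metric names absent from `available`, in list order."""
    return [name for name in required if name not in available]
```

For example, `missing_metrics(fetch_metric_names("http://localhost:9090"))` would return the names that still need attention, where the URL is a placeholder for your own Prometheus address. A name appearing in this listing only shows that Prometheus has series for it; it does not confirm that the labels an alert rule filters on are present.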