File tree Expand file tree Collapse file tree 5 files changed +333
-26
lines changed Expand file tree Collapse file tree 5 files changed +333
-26
lines changed Original file line number Diff line number Diff line change @@ -11,17 +11,77 @@ groups:
11
11
CAS thread count is higher than 400. May indicate overloaded CAS
12
12
server.
13
13
summary : CAS is using more than 400 threads.
14
- condition : A
14
+ condition : C
15
15
data :
16
16
- datasourceUid : prometheus
17
17
model :
18
- expr : cas_thread_count > 400
18
+ expr : cas_thread_count
19
19
instant : true
20
+ refId : A
20
21
refId : A
21
22
relativeTimeRange :
22
23
from : 300
23
24
to : 0
25
+ - datasourceUid : __expr__
26
+ model :
27
+ conditions :
28
+ - evaluator :
29
+ params : []
30
+ type : gt
31
+ operator :
32
+ type : and
33
+ query :
34
+ params :
35
+ - B
36
+ reducer :
37
+ params : []
38
+ type : last
39
+ type : query
40
+ datasource :
41
+ type : __expr__
42
+ uid : __expr__
43
+ expression : A
44
+ intervalMs : 1000
45
+ maxDataPoints : 43200
46
+ reducer : last
47
+ refId : B
48
+ type : reduce
49
+ refId : B
50
+ relativeTimeRange :
51
+ from : 300
52
+ to : 0
53
+ - datasourceUid : __expr__
54
+ model :
55
+ conditions :
56
+ - evaluator :
57
+ params :
58
+ - 400
59
+ type : gt
60
+ operator :
61
+ type : and
62
+ query :
63
+ params :
64
+ - C
65
+ reducer :
66
+ params : []
67
+ type : last
68
+ type : query
69
+ datasource :
70
+ type : __expr__
71
+ uid : __expr__
72
+ expression : B
73
+ intervalMs : 1000
74
+ maxDataPoints : 43200
75
+ refId : C
76
+ type : threshold
77
+ refId : C
78
+ relativeTimeRange :
79
+ from : 300
80
+ to : 0
81
+ execErrState : Error
24
82
for : 5m
83
+ isPaused : false
25
84
labels :
26
85
severity : warning
86
+ noDataState : NoData
27
87
uid : cas_thread_count
Original file line number Diff line number Diff line change 1
1
apiVersion : 1
2
2
groups :
3
3
- interval : 5m
4
- name : SAS Viya Alerts
5
4
folder : Database Alerts
5
+ name : SAS Viya Alerts
6
6
orgId : 1
7
7
rules :
8
8
- title : PostgreSQL Connection Utilization High
9
9
annotations :
10
- description : PostgreSQL database connection usage is above 85% of max connections.
11
- summary : Database is nearing connection limit .
12
- condition : A
10
+ description : PostgreSQL connection utilization is high (> 85%). May exhaust available connections.
11
+ summary : PostgreSQL connection utilization is high .
12
+ condition : C
13
13
data :
14
14
- datasourceUid : prometheus
15
15
model :
16
- expr : ( pg_stat_activity_count / pg_settings_max_connections) * 100 > 85
16
+ expr : pg_stat_activity_count / pg_settings_max_connections * 100
17
17
instant : true
18
+ intervalMs : 1000
19
+ maxDataPoints : 43200
20
+ refId : A
18
21
refId : A
19
22
relativeTimeRange :
20
23
from : 300
21
24
to : 0
25
+ - datasourceUid : __expr__
26
+ model :
27
+ conditions :
28
+ - evaluator :
29
+ params : []
30
+ type : gt
31
+ operator :
32
+ type : and
33
+ query :
34
+ params :
35
+ - B
36
+ reducer :
37
+ params : []
38
+ type : last
39
+ type : query
40
+ datasource :
41
+ type : __expr__
42
+ uid : __expr__
43
+ expression : A
44
+ intervalMs : 1000
45
+ maxDataPoints : 43200
46
+ reducer : last
47
+ refId : B
48
+ type : reduce
49
+ refId : B
50
+ relativeTimeRange :
51
+ from : 300
52
+ to : 0
53
+ - datasourceUid : __expr__
54
+ model :
55
+ conditions :
56
+ - evaluator :
57
+ params :
58
+ - 85
59
+ type : gt
60
+ operator :
61
+ type : and
62
+ query :
63
+ params :
64
+ - C
65
+ reducer :
66
+ params : []
67
+ type : last
68
+ type : query
69
+ datasource :
70
+ type : __expr__
71
+ uid : __expr__
72
+ expression : B
73
+ intervalMs : 1000
74
+ maxDataPoints : 43200
75
+ refId : C
76
+ type : threshold
77
+ refId : C
78
+ relativeTimeRange :
79
+ from : 300
80
+ to : 0
81
+ execErrState : Error
22
82
for : 5m
83
+ isPaused : false
23
84
labels :
24
85
severity : warning
25
- uid : postgres_connection_utilization
86
+ noDataState : NoData
87
+ uid : postgresql_connection_utilization
Original file line number Diff line number Diff line change @@ -11,21 +11,82 @@ groups:
11
11
summary :
12
12
NFS share > 85% full. Typically, it is due to users filling their home
13
13
directory or backups.
14
- condition : A
14
+ condition : C
15
15
data :
16
16
- datasourceUid : prometheus
17
17
model :
18
- expr :
19
- " ((kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=\" cas-default-data\" \
20
- }\n - kubelet_volume_stats_available_bytes{persistentvolumeclaim=\" cas-default-data\" \
21
- })\n / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=\" cas-default-data\" \
22
- }) * 100 > 85"
18
+ expr : |
19
+ (kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="cas-default-data"}
20
+ - kubelet_volume_stats_available_bytes{persistentvolumeclaim="cas-default-data"})
21
+ / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="cas-default-data"} * 100
23
22
instant : true
23
+ intervalMs : 1000
24
+ maxDataPoints : 43200
25
+ refId : A
24
26
refId : A
25
27
relativeTimeRange :
26
28
from : 21600
27
29
to : 0
30
+ - datasourceUid : __expr__
31
+ model :
32
+ conditions :
33
+ - evaluator :
34
+ params : []
35
+ type : gt
36
+ operator :
37
+ type : and
38
+ query :
39
+ params :
40
+ - B
41
+ reducer :
42
+ params : []
43
+ type : last
44
+ type : query
45
+ datasource :
46
+ type : __expr__
47
+ uid : __expr__
48
+ expression : A
49
+ intervalMs : 1000
50
+ maxDataPoints : 43200
51
+ reducer : last
52
+ refId : B
53
+ type : reduce
54
+ refId : B
55
+ relativeTimeRange :
56
+ from : 21600
57
+ to : 0
58
+ - datasourceUid : __expr__
59
+ model :
60
+ conditions :
61
+ - evaluator :
62
+ params :
63
+ - 85
64
+ type : gt
65
+ operator :
66
+ type : and
67
+ query :
68
+ params :
69
+ - C
70
+ reducer :
71
+ params : []
72
+ type : last
73
+ type : query
74
+ datasource :
75
+ type : __expr__
76
+ uid : __expr__
77
+ expression : B
78
+ intervalMs : 1000
79
+ maxDataPoints : 43200
80
+ refId : C
81
+ type : threshold
82
+ refId : C
83
+ relativeTimeRange :
84
+ from : 21600
85
+ to : 0
86
+ execErrState : Error
28
87
for : 5m
88
+ isPaused : false
29
89
labels :
30
90
severity : warning
91
+ noDataState : NoData
31
92
uid : nfs_share_usage
Original file line number Diff line number Diff line change @@ -11,19 +11,79 @@ groups:
11
11
Viya service API response latency is high (95th percentile over
12
12
1s).
13
13
summary : Degraded performance on Viya APIs.
14
- condition : A
14
+ condition : C
15
15
data :
16
16
- datasourceUid : prometheus
17
17
model :
18
- expr :
19
- histogram_quantile(0.95, sum(rate(http_server_requests_duration_seconds_bucket{job=~"sas-.*"}[5m]))
20
- by (le)) > 1
18
+ expr : histogram_quantile(0.95, sum(rate(http_server_requests_duration_seconds_bucket{job=~"sas-.*"}[5m])) by (le))
21
19
instant : true
20
+ intervalMs : 1000
21
+ maxDataPoints : 43200
22
+ refId : A
22
23
refId : A
23
24
relativeTimeRange :
24
25
from : 300
25
26
to : 0
27
+ - datasourceUid : __expr__
28
+ model :
29
+ conditions :
30
+ - evaluator :
31
+ params : []
32
+ type : gt
33
+ operator :
34
+ type : and
35
+ query :
36
+ params :
37
+ - B
38
+ reducer :
39
+ params : []
40
+ type : last
41
+ type : query
42
+ datasource :
43
+ type : __expr__
44
+ uid : __expr__
45
+ expression : A
46
+ intervalMs : 1000
47
+ maxDataPoints : 43200
48
+ reducer : last
49
+ refId : B
50
+ type : reduce
51
+ refId : B
52
+ relativeTimeRange :
53
+ from : 300
54
+ to : 0
55
+ - datasourceUid : __expr__
56
+ model :
57
+ conditions :
58
+ - evaluator :
59
+ params :
60
+ - 1
61
+ type : gt
62
+ operator :
63
+ type : and
64
+ query :
65
+ params :
66
+ - C
67
+ reducer :
68
+ params : []
69
+ type : last
70
+ type : query
71
+ datasource :
72
+ type : __expr__
73
+ uid : __expr__
74
+ expression : B
75
+ intervalMs : 1000
76
+ maxDataPoints : 43200
77
+ refId : C
78
+ type : threshold
79
+ refId : C
80
+ relativeTimeRange :
81
+ from : 300
82
+ to : 0
83
+ execErrState : Error
26
84
for : 5m
85
+ isPaused : false
27
86
labels :
28
87
severity : warning
88
+ noDataState : NoData
29
89
uid : viya_api_latency
You can’t perform that action at this time.
0 commit comments