csp-mixin/alerts/azure-alerts.yml
Lines changed: 127 additions & 1 deletion
@@ -22,9 +22,135 @@ groups:
         keep_firing_for: 10m
         labels:
           severity: critical
-          service: 'Azure Virtual Machines.'
+          service: 'Azure Virtual Machines'
           namespace: cloud-provider-azure
         annotations:
           summary: 'VM unavailable.'
           description: 'The VM {{ $labels.resourceName }} is not functioning or crashed, which may require immediate action.'
           dashboard_uid: '58f33c50e66c911b0ad8a25aa438a96e'
+
+      - alert: AzureHighDtuConsumption
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_dtu_consumption_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 90
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High DTU consumption.'
+          description: 'Check active queries and optimize indexes or consider scaling up DTUs to handle load in {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighStorageUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_storage_percent_maximum_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 85
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High Storage usage.'
+          description: 'Archive or delete old data, or scale up storage capacity in {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighDeadlockCount
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_deadlock_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 5
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: info
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High Deadlock count.'
+          description: 'Check {{ $labels.resourceName }} database logs for deadlock details and optimize affected queries.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighUserCpuUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_cpu_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 80
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High User CPU usage.'
+          description: 'Identify high CPU queries on {{ $labels.resourceName }} database and optimize them.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighSystemFailedConnections
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_connection_failed_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 10
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High number of System Failed connections.'
+          description: 'Check network problems, firewall restrictions or high resource consumption affecting application access to the database {{ $labels.resourceName }}.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighUserFailedConnections
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_connection_failed_user_error_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 10
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High number of User Failed connections.'
+          description: 'Check for authentication problems, network configuration errors, firewall issues, or resource constraints, affecting database accessibility for users on database {{ $labels.resourceName }}.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighWorkerUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_workers_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 60
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High worker usage.'
+          description: 'Look for long execution queries, review the number of concurrent queries and requests being sent to the database or check if there are any blocking sessions or deadlocks into the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureHighDataIoUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_physical_data_read_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 90
+        for: 15m
+        keep_firing_for: 10m
+        labels:
+          severity: info
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High data IO usage.'
+          description: 'Review queries with high read or write activity, check if there are missing indexes or inefficient indexes that result in full table scans and assess the volume of transactions into the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureLowTempdbLogSpace
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_tempdb_log_used_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 60
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'Low tempdb log space.'
+          description: 'Look for active sessions that might be using TempDB intensively, identify stored procedures or queries that create temporary tables or objects, and also look for long-running or memory-intensive queries that rely heavily on TempDB into the {{ $labels.resourceName }} database.'
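
The rules added in this diff could be exercised with a `promtool test rules` unit test. The following is a minimal sketch for `AzureHighDtuConsumption` only, not part of the change itself. It assumes the rules file can be loaded directly by promtool under the relative path shown and that the installed promtool version supports `keep_firing_for`; the label values (`azure`, `rg`, `sub`, `db1`) are hypothetical.

```yaml
# Hypothetical test file, e.g. azure-alerts_test.yml, placed next to the rules file.
rule_files:
  - azure-alerts.yml   # assumed path relative to this test file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # A constant 95% DTU consumption for 21 samples (0m..20m), above the > 90 threshold.
      - series: 'azure_microsoft_sql_servers_databases_dtu_consumption_percent_average_percent{job="azure",resourceGroup="rg",subscriptionName="sub",resourceName="db1"}'
        values: '95+0x20'
    alert_rule_test:
      # The rule has `for: 10m`, so the alert should be firing by 15m.
      - eval_time: 15m
        alertname: AzureHighDtuConsumption
        exp_alerts:
          - exp_labels:
              severity: critical
              service: 'Azure SQL database'
              namespace: cloud-provider-azure
              job: azure
              resourceGroup: rg
              subscriptionName: sub
              resourceName: db1
            exp_annotations:
              summary: 'High DTU consumption.'
              description: 'Check active queries and optimize indexes or consider scaling up DTUs to handle load in db1 database.'
              dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
```

Under those assumptions it would be run as `promtool test rules azure-alerts_test.yml`. The other alerts could be covered the same way by swapping the metric name, threshold, `for` duration, and expected labels and annotations.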