
Commit da81b34

yduartepanaivanov
andauthored
add new azure sql database alerts (#1357)
* add new azure sql database alerts
* add database to summaries and alert name
* sync threshold and lookback period with azure
* Update csp-mixin/alerts/azure-alerts.yml

Co-authored-by: Ana Ivanov <[email protected]>

* update deadlock alert to include rate
* fix lint
* add rate to counter metrics

---------

Co-authored-by: Ana Ivanov <[email protected]>
1 parent 5a6b86b commit da81b34


1 file changed, +128 -2 lines changed


csp-mixin/alerts/azure-alerts.yml

@@ -11,7 +11,7 @@ groups:
           service: 'Azure Virtual Machines'
           namespace: cloud-provider-azure
         annotations:
-          summary: 'CPU utilization is too high.'
+          summary: 'VM CPU utilization is too high.'
           description: 'The VM {{ $labels.resourceName }} is under heavy load and may become unresponsive.'
           dashboard_uid: '58f33c50e66c911b0ad8a25aa438a96e'

@@ -22,9 +22,135 @@ groups:
         keep_firing_for: 10m
         labels:
           severity: critical
-          service: 'Azure Virtual Machines.'
+          service: 'Azure Virtual Machines'
           namespace: cloud-provider-azure
         annotations:
           summary: 'VM unavailable.'
           description: 'The VM {{ $labels.resourceName }} is not functioning or crashed, which may require immediate action.'
           dashboard_uid: '58f33c50e66c911b0ad8a25aa438a96e'
+
+      - alert: AzureDatabaseHighDtuConsumption
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_dtu_consumption_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 90
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database DTU consumption.'
+          description: 'Check active queries and optimize indexes or consider scaling up DTUs to handle load in {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighStorageUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_storage_percent_maximum_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 95
+        for: 15m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database Storage usage.'
+          description: 'Archive or delete old data, or scale up storage capacity in {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighDeadlockCount
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (rate(azure_microsoft_sql_servers_databases_deadlock_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}[5m])) > 5
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: info
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database Deadlock count.'
+          description: 'Check {{ $labels.resourceName }} database logs for deadlock details and optimize affected queries.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighUserCpuUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_cpu_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 90
+        for: 10m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database User CPU usage.'
+          description: 'Identify high CPU queries on {{ $labels.resourceName }} database and optimize them.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighSystemFailedConnections
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (rate(azure_microsoft_sql_servers_databases_connection_failed_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}[5m])) > 10
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High number of database System Failed connections.'
+          description: 'Check network problems, firewall restrictions or high resource consumption affecting application access to the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighUserFailedConnections
+        expr: |
+          sum by (job,resourceGroup,subscriptionName,resourceName) (rate(azure_microsoft_sql_servers_databases_connection_failed_user_error_total_count{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}[5m])) > 10
+        for: 15m
+        keep_firing_for: 10m
+        labels:
+          severity: warning
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High number of database User Failed connections.'
+          description: 'Check for authentication problems, network configuration errors, firewall issues, or resource constraints, affecting database accessibility for users on database {{ $labels.resourceName }}.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighWorkerUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_workers_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 60
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database worker usage.'
+          description: 'Look for long execution queries, review the number of concurrent queries and requests being sent to the database or check if there are any blocking sessions or deadlocks into the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseHighDataIoUsage
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_physical_data_read_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 90
+        for: 15m
+        keep_firing_for: 10m
+        labels:
+          severity: info
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'High database data IO usage.'
+          description: 'Review queries with high read or write activity, check if there are missing indexes or inefficient indexes that result in full table scans and assess the volume of transactions into the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
+
+      - alert: AzureDatabaseLowTempdbLogSpace
+        expr: |
+          avg by (job,resourceGroup,subscriptionName,resourceName) (azure_microsoft_sql_servers_databases_tempdb_log_used_percent_average_percent{job=~".+",resourceGroup=~".+",subscriptionName=~".+",resourceName=~".+"}) > 60
+        for: 5m
+        keep_firing_for: 10m
+        labels:
+          severity: critical
+          service: 'Azure SQL database'
+          namespace: cloud-provider-azure
+        annotations:
+          summary: 'Low database tempdb log space.'
+          description: 'Look for active sessions that might be using TempDB intensively, identify stored procedures or queries that create temporary tables or objects, and also look for long-running or memory-intensive queries that rely heavily on TempDB into the {{ $labels.resourceName }} database.'
+          dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
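The new rules can be checked offline with `promtool test rules`. Below is a minimal sketch for the gauge-based `AzureDatabaseHighDtuConsumption` rule. It assumes Prometheus 2.42 or later (the rule file uses `keep_firing_for`), a hypothetical test file saved next to `azure-alerts.yml`, and made-up label values (`azure`, `rg-example`, `sub-example`, `db-example`). A series that sits at 95% from the start should be pending, not firing, at 5m and firing at 15m, because of `for: 10m`.

```yaml
# Hypothetical test file (e.g. azure-alerts.test.yml), assumed to live next to azure-alerts.yml.
rule_files:
  - azure-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical database reporting a steady 95% DTU consumption from 0m to 20m.
      - series: 'azure_microsoft_sql_servers_databases_dtu_consumption_percent_average_percent{job="azure",resourceGroup="rg-example",subscriptionName="sub-example",resourceName="db-example"}'
        values: '95+0x20'
    alert_rule_test:
      # Condition true since 0m, but `for: 10m` has not elapsed yet: pending, so no firing alert.
      - eval_time: 5m
        alertname: AzureDatabaseHighDtuConsumption
        exp_alerts: []
      # Condition has held for more than 10m: firing.
      - eval_time: 15m
        alertname: AzureDatabaseHighDtuConsumption
        exp_alerts:
          - exp_labels:
              job: azure
              resourceGroup: rg-example
              subscriptionName: sub-example
              resourceName: db-example
              severity: critical
              service: 'Azure SQL database'
              namespace: cloud-provider-azure
            exp_annotations:
              summary: 'High database DTU consumption.'
              description: 'Check active queries and optimize indexes or consider scaling up DTUs to handle load in db-example database.'
              dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
```

Run it with `promtool test rules azure-alerts.test.yml` (the file name is hypothetical). Because `exp_annotations` is compared against the rendered templates, such a test would also catch unintended changes to the `{{ $labels.resourceName }}` wording in `description`.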

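The commit wraps the `..._total_count` series in `rate(...[5m])`, treating them as counters ("add rate to counter metrics"). Below is a sketch under the same assumptions as a `promtool test rules` file, modelling the deadlock series as a monotonically increasing counter (an assumption, not something the diff shows): the hypothetical `db-busy` grows by 600 per minute (10 per second), while the hypothetical `db-idle` holds a flat value of 1000. Only `db-busy` is expected to fire at 20m; a comparison without `rate()` would also flag the flat counter, since 1000 > 5.

```yaml
# Hypothetical test file, assumed to live next to azure-alerts.yml.
rule_files:
  - azure-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical busy database: counter grows by 600 per minute, i.e. rate() is 10 per second.
      - series: 'azure_microsoft_sql_servers_databases_deadlock_total_count{job="azure",resourceGroup="rg-example",subscriptionName="sub-example",resourceName="db-busy"}'
        values: '0+600x30'
      # Hypothetical idle database: large but flat counter, so rate() is 0.
      - series: 'azure_microsoft_sql_servers_databases_deadlock_total_count{job="azure",resourceGroup="rg-example",subscriptionName="sub-example",resourceName="db-idle"}'
        values: '1000+0x30'
    alert_rule_test:
      # By 20m the busy database has exceeded 5/s for well over 10m; the idle one never does.
      - eval_time: 20m
        alertname: AzureDatabaseHighDeadlockCount
        exp_alerts:
          - exp_labels:
              job: azure
              resourceGroup: rg-example
              subscriptionName: sub-example
              resourceName: db-busy
              severity: info
              service: 'Azure SQL database'
              namespace: cloud-provider-azure
            exp_annotations:
              summary: 'High database Deadlock count.'
              description: 'Check db-busy database logs for deadlock details and optimize affected queries.'
              dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
```

Note that `rate()` returns a per-second average, so the committed `> 5` means more than 5 deadlocks per second over the 5m window. If the intended condition is more than 5 deadlocks within 5 minutes, `increase(...[5m]) > 5` would express that. It may also be worth confirming that the exporter exposes these series as counters, since the PromQL documentation says `rate()` should only be used with counters.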

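Every new rule sets `keep_firing_for: 10m`, so an alert keeps firing for 10 minutes after its expression stops returning a result. Below is a sketch for `AzureDatabaseHighWorkerUsage` as a `promtool test rules` file, again assuming Prometheus 2.42 or later, a hypothetical test file next to `azure-alerts.yml`, and made-up labels. Worker usage is 80% from 0m to 15m and 20% from 16m onward, so the alert fires once `for: 5m` has elapsed, is still firing at 20m even though the value dropped below 60 at 16m, and is resolved by 30m.

```yaml
# Hypothetical test file, assumed to live next to azure-alerts.yml.
rule_files:
  - azure-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical database: 80% workers from 0m to 15m, then 20% from 16m to 31m.
      - series: 'azure_microsoft_sql_servers_databases_workers_percent_average_percent{job="azure",resourceGroup="rg-example",subscriptionName="sub-example",resourceName="db-example"}'
        values: '80+0x15 20+0x15'
    alert_rule_test:
      # Expression stopped matching at 16m, but keep_firing_for: 10m keeps the alert firing.
      - eval_time: 20m
        alertname: AzureDatabaseHighWorkerUsage
        exp_alerts:
          - exp_labels:
              job: azure
              resourceGroup: rg-example
              subscriptionName: sub-example
              resourceName: db-example
              severity: critical
              service: 'Azure SQL database'
              namespace: cloud-provider-azure
            exp_annotations:
              summary: 'High database worker usage.'
              description: 'Look for long execution queries, review the number of concurrent queries and requests being sent to the database or check if there are any blocking sessions or deadlocks into the db-example database.'
              dashboard_uid: '82c5b6cf30db5b601c5cc3f5d8d4284d'
      # More than 10m after the expression stopped matching: resolved, no firing alert.
      - eval_time: 30m
        alertname: AzureDatabaseHighWorkerUsage
        exp_alerts: []
```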