Commit 2c2a86e

[ops] Introduce GitpodWsManagerMk2BackupFailureError and GitpodWsManagerMk2BackupFailureCritical (#20259)
* [ops] Introduce GitpodWsManagerMk2BackupFailureError and GitpodWsManagerMk2BackupFailureCritical
* Fix
1 parent e63652e commit 2c2a86e

1 file changed: +20 -0 lines changed


operations/observability/mixins/workspace/rules/satellite/workspaces.yaml

Lines changed: 20 additions & 0 deletions
@@ -45,3 +45,23 @@ spec:
         sum by(cluster) (avg_over_time(gitpod_workspace_regular_not_active_percentage_mk2[1m]) > 0)
         AND
         sum by(cluster) (rate(gitpod_ws_manager_mk2_workspace_startup_seconds_sum{type="Regular"}[1m])) == 0
+    - alert: GitpodWsManagerMk2BackupFailureError
+      labels:
+        severity: error
+        team: engine
+      annotations:
+        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WorkspaceBackupFailures.md
+        summary: Workspace backups failed recently in cluster {{ $labels.cluster }}
+        description: This can happen when a single node has failed in the cloud provider
+      expr: |
+        (sum by (cluster) (increase(gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster!~"ephemeral.*"}[1h])) > 0) <= 16
+    - alert: GitpodWsManagerMk2BackupFailureCritical
+      labels:
+        severity: critical
+        team: engine
+      annotations:
+        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WorkspaceBackupFailures.md
+        summary: Workspace backups failed recently in cluster {{ $labels.cluster }}
+        description: This can be an indicator of two or more nodes failing in a cloud provider
+      expr: |
+        sum by (cluster) (increase(gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster!~"ephemeral.*"}[1h])) > 16
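
The critical threshold can be checked offline with a `promtool` rule unit test. The following is a minimal sketch, not part of this commit. It assumes the `groups:` list under `spec:` has been copied into a plain rules file, here given the hypothetical name `workspaces.rules.yaml`. The cluster names `us01` and `ephemeral-test` are also made up for illustration.

```yaml
# Hypothetical promtool unit test; file and cluster names are assumptions.
rule_files:
  - workspaces.rules.yaml   # assumed: the `groups:` list from under `spec:`, saved as a plain rules file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # One failed backup per minute: about 60 failures in the hour, above the threshold of 16.
      - series: 'gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster="us01"}'
        values: '0+1x60'
      # Same failure rate on an ephemeral cluster, which the cluster!~"ephemeral.*" matcher excludes.
      - series: 'gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster="ephemeral-test"}'
        values: '0+1x60'
    alert_rule_test:
      - eval_time: 1h
        alertname: GitpodWsManagerMk2BackupFailureCritical
        exp_alerts:
          # Only the non-ephemeral cluster is expected to fire.
          - exp_labels:
              severity: critical
              team: engine
              cluster: us01
            exp_annotations:
              runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WorkspaceBackupFailures.md
              summary: Workspace backups failed recently in cluster us01
              description: This can be an indicator of two or more nodes failing in a cloud provider
```

Saved under a name of your choice, the test would be run with `promtool test rules <test-file>`. Neither rule sets a `for:` duration, so the alert is expected to fire at the first evaluation where the expression returns a result.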
