Commit bc51939

Merge branch 'feature/alert-healing-with-observation-window' into develop
2 parents 7ae82d1 + 132f692 commit bc51939

12 files changed: +1492 -70 lines changed

docs/alerting/database-design.md

Lines changed: 44 additions & 3 deletions
@@ -2,14 +2,15 @@

 ## Overview

-This document describes the latest database design, which comprises 6 tables:
+This document describes the latest database design, which comprises 7 tables:

 - alert_issues
 - alert_issue_comments
 - alert_meta_change_logs
 - alert_rules
 - alert_rule_metas
 - service_states
+- heal_actions

 ## Table design

@@ -111,7 +112,7 @@

 ---

-### 7) service_states (service state table)
+### 6) service_states (service state table)

 Tracks the health state and remediation progress of a service on a given version.

@@ -127,6 +128,34 @@
 **Suggested indexes:**
 - PRIMARY KEY: `(service, version)`

+---
+
+### 7) heal_actions (alert healing solutions table)
+
+Stores the healing solutions and rules for each fault domain.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| id | varchar(255) PK | Healing solution ID |
+| desc | text | Short description, e.g. which alert scenario the action handles |
+| type | varchar(255) | Fault domain type the solution applies to |
+| rules | jsonb | Condition rules: {condition1: action1, condition2: action2} |
+
+**Suggested indexes:**
+- PRIMARY KEY: `id`
+- INDEX: `(type)`
+
+**Sample data:**
+```sql
+INSERT INTO heal_actions (id, "desc", type, rules) VALUES
+('service_version_rollback', 'Service version rollback solution', 'service_version_issue',
+ '{"deployment_status": "deploying", "action": "rollback", "target": "previous_version"}'),
+('service_version_alert', 'Service version alert solution', 'service_version_issue',
+ '{"deployment_status": "deployed", "action": "alert", "message": "Version already released; automatic rollback is not supported yet"}');
+```
+
+TODO: health_state mapping logic
+
 ## Data relationships (ER)

 ```mermaid
@@ -175,13 +204,25 @@ erDiagram
 text content
 }

+heal_actions {
+varchar id PK
+text desc
+varchar type
+jsonb rules
+}
+
 %% Logically linked at the application layer via labels such as service
 alert_rule_metas ||..|| alert_rules : "by alert_name"
 service_states ||..|| alert_rule_metas : "by service/version labels"
+heal_actions ||..|| alert_issues : "by fault domain analysis"
 ```

 ## Data flow

 1. `alert_rules` acts as the template; combined with `alert_rule_metas`, it is rendered into rules for a specific service/version (labels may be empty `{}` to denote the global default, or carry labels such as service/version).
 2. When metrics or rule parameters are adjusted, the change is recorded in `alert_meta_change_logs`
-3. A triggered rule creates `alert_issues`; actions taken during handling are written to `alert_issue_comments`
+3. A triggered rule creates `alert_issues`; actions taken during handling are written to `alert_issue_comments`
+4. **Alert healing flow**
+   - P0 alerts: identify the fault domain from `alert_issues.labels`, then query `heal_actions` for the healing solution
+   - Run the healing operation (e.g. rollback); on success, update the status in `alert_issues` and `service_states`
+   - P1/P2 alerts: go straight to drill-down analysis and record the results in `alert_issue_comments`

go.mod

Lines changed: 3 additions & 0 deletions
@@ -9,13 +9,15 @@ require (
 	github.com/lib/pq v1.10.9
 	github.com/redis/go-redis/v9 v9.5.1
 	github.com/rs/zerolog v1.34.0
+	github.com/stretchr/testify v1.11.1
 )

 require (
 	github.com/bytedance/sonic v1.13.3 // indirect
 	github.com/bytedance/sonic/loader v0.2.4 // indirect
 	github.com/cespare/xxhash/v2 v2.2.0 // indirect
 	github.com/cloudwego/base64x v0.1.5 // indirect
+	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
 	github.com/gabriel-vasile/mimetype v1.4.9 // indirect
 	github.com/gin-contrib/cors v1.7.6 // indirect
@@ -39,6 +41,7 @@ require (
 	github.com/modern-go/reflect2 v1.0.2 // indirect
 	github.com/natefinch/lumberjack v2.0.0+incompatible // indirect
 	github.com/pelletier/go-toml/v2 v2.2.4 // indirect
+	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
 	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
 	github.com/ugorji/go/codec v1.3.0 // indirect
 	golang.org/x/arch v0.18.0 // indirect

internal/alerting/database/database.go

Lines changed: 5 additions & 0 deletions
@@ -38,3 +38,8 @@ func (d *Database) ExecContext(ctx context.Context, q string, args ...any) (sql.
 func (d *Database) QueryContext(ctx context.Context, q string, args ...any) (*sql.Rows, error) {
 	return d.db.QueryContext(ctx, q, args...)
 }
+
+// QueryRowContext exposes database/sql QueryRowContext for single row SELECT queries.
+func (d *Database) QueryRowContext(ctx context.Context, q string, args ...any) *sql.Row {
+	return d.db.QueryRowContext(ctx, q, args...)
+}
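
As an illustration of how the new `QueryRowContext` wrapper might be used, here is a hedged sketch of a single-row lookup against `heal_actions`. The names `rowQuerier`, `HealAction`, `LoadHealAction`, `decodeRules`, and `ErrHealActionNotFound` are hypothetical and do not appear in the commit. The sketch assumes PostgreSQL (so `desc`, a reserved word, is quoted), non-NULL columns, and `rules` objects whose values are all strings, as in the sample rows.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
)

// rowQuerier is the narrow slice of the Database wrapper this sketch needs.
// The wrapper's QueryRowContext in the diff has this same signature.
type rowQuerier interface {
	QueryRowContext(ctx context.Context, q string, args ...any) *sql.Row
}

// Compile-time check that *sql.DB also satisfies the interface.
var _ rowQuerier = (*sql.DB)(nil)

// HealAction is a hypothetical in-memory form of one heal_actions row.
type HealAction struct {
	ID, Desc, Type string
	Rules          map[string]string
}

// ErrHealActionNotFound is returned when no row matches the given ID.
var ErrHealActionNotFound = errors.New("heal action not found")

// "desc" is quoted because DESC is a reserved word in PostgreSQL.
const selectHealAction = `SELECT id, "desc", type, rules FROM heal_actions WHERE id = $1`

// decodeRules parses the jsonb payload; it assumes string-valued keys only.
func decodeRules(raw []byte) (map[string]string, error) {
	rules := map[string]string{}
	if err := json.Unmarshal(raw, &rules); err != nil {
		return nil, fmt.Errorf("decode rules: %w", err)
	}
	return rules, nil
}

// LoadHealAction fetches one heal action by ID and maps sql.ErrNoRows
// to a domain-level error so callers need not import database/sql.
func LoadHealAction(ctx context.Context, db rowQuerier, id string) (*HealAction, error) {
	var a HealAction
	var raw []byte
	err := db.QueryRowContext(ctx, selectHealAction, id).Scan(&a.ID, &a.Desc, &a.Type, &raw)
	if errors.Is(err, sql.ErrNoRows) {
		return nil, ErrHealActionNotFound
	}
	if err != nil {
		return nil, fmt.Errorf("query heal action %q: %w", id, err)
	}
	if a.Rules, err = decodeRules(raw); err != nil {
		return nil, err
	}
	return &a, nil
}

func main() {
	// No database is opened here; only the JSON decoding step is exercised.
	rules, err := decodeRules([]byte(`{"deployment_status": "deployed", "action": "alert"}`))
	fmt.Println(rules["action"], err)
}
```

`QueryRowContext` defers any error until `Scan`, which is why the not-found case is checked there rather than on the call itself.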
