
Commit b9c532c

feat(Metrics&Alert): add the /v1/integrations/alertmanager/webhook endpoint for receiving and processing alerts; received alerts are stored in a Postgres database
1 parent 9fd57bb commit b9c532c


26 files changed (+1322 / -62 lines)

ENV_SETUP.md

Lines changed: 40 additions & 0 deletions
@@ -59,6 +59,46 @@ export SEARCH_BACKEND=DuckDuckGo
 export NO_FORCE_TERMINAL=false
 ```

+## Alerting service environment variables (database + webhook authentication)
+
+Used to receive Alertmanager webhooks and persist the events to the database.
+
+### macOS/Linux
+```bash
+# Database connection (example: a local Postgres running in Docker)
+export DB_HOST=localhost
+export DB_PORT=5432
+export DB_USER=postgres
+export DB_PASSWORD=postgres
+export DB_NAME=zeroops
+export DB_SSLMODE=disable
+
+# Webhook authentication (must match the Alertmanager http_config; choose one)
+# 1) Basic Auth
+export ALERT_WEBHOOK_BASIC_USER=alert
+export ALERT_WEBHOOK_BASIC_PASS=REDACTED
+# 2) Bearer Token (if you use this, comment out the Basic Auth lines above)
+# export ALERT_WEBHOOK_BEARER=your_token_here
+```
+
+### Windows (PowerShell)
+```powershell
+$env:DB_HOST="localhost"
+$env:DB_PORT="5432"
+$env:DB_USER="postgres"
+$env:DB_PASSWORD="postgres"
+$env:DB_NAME="zeroops"
+$env:DB_SSLMODE="disable"
+
+# Basic Auth
+$env:ALERT_WEBHOOK_BASIC_USER="alert"
+$env:ALERT_WEBHOOK_BASIC_PASS="REDACTED"
+# Or Bearer
+# $env:ALERT_WEBHOOK_BEARER="your_token_here"
+```
+
+> After starting the service, you can use the curl example in the README to send events to `/v1/integrations/alertmanager/webhook` and then verify them in the database.
+
 ## Detailed environment variable reference

 ### Required configuration

cmd/zeroops/main.go

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,7 @@ package main

 import (
 	"github.com/fox-gonic/fox"
+	alertapi "github.com/qiniu/zeroops/internal/alerting/api"
 	"github.com/qiniu/zeroops/internal/config"
 	"github.com/qiniu/zeroops/internal/middleware"
 	servicemanager "github.com/qiniu/zeroops/internal/service_manager"
@@ -27,6 +28,7 @@ func main() {

 	router := fox.New()
 	router.Use(middleware.Authentication)
+	alertapi.NewApiWithConfig(router, cfg)
 	if err := serviceManagerSrv.UseApi(router); err != nil {
 		log.Fatal().Err(err).Msg("bind serviceManagerApi failed.")
 	}

docs/alerting/api.md

Lines changed: 90 additions & 10 deletions
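@@ -4,11 +4,19 @@

 This document describes the RESTful API of the monitoring and alerting service, including core features such as querying the alert list and fetching alert details.

+Manually simulate Prometheus calling our alert-receiving endpoint, which then receives the alert events.
+
+
+Implementation status:
+- Implemented: receiving Alertmanager webhooks (/v1/integrations/alertmanager/webhook)
+- Planned: alert list and detail query endpoints (this document describes their external contract; they will be implemented later)
+
+
 ## Basic information

 - **Base URL**: `/v1`
 - **Content-Type**: `application/json`
-- **Authentication**: Bearer Token (implementation details TBD)
+- **Authentication**: The webhook endpoint can enable Basic or Bearer authentication via environment variables (see below). The other query endpoints will use a Bearer Token once implemented.

 ## Endpoints
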
@@ -34,23 +42,19 @@ GET /v1/issues?start={start}&limit={limit}[&state={state}]
 {
   "items": [
     {
-      "id": "issue_20250505_001",
+      "id": "xxx",
       "state": "Closed",
       "level": "P0",
       "alertState": "Restored",
       "title": "yzh S3APIV2s3apiv2.putobject 0_64K upload response time p95: 50012ms > 450ms",
       "labels": [
         {"key": "api", "value": "s3apiv2.putobject"},
-        {"key": "idc", "value": "yzh"},
-        {"key": "service", "value": "s3api"}
+        {"key": "idc", "value": "yzh"}
       ],
       "alertSince": "2025-05-05T11:00:00.000Z"
     }
   ],
-  "pagination": {
-    "nextStart": "issue_20250505_002",
-    "hasMore": true
-  }
+  "next": "xxxx"
 }
 ```

@@ -89,7 +93,6 @@ GET /v1/issues/{issueID}
     {"key": "service", "value": "s3api"}
   ],
   "alertSince": "2025-05-05T11:00:00.000Z",
-  "resolvedAt": "2025-05-05T11:15:00.000Z",
   "comments": [
     {
       "createAt": "2025-05-05T11:00:30.000Z",
@@ -126,7 +129,6 @@ GET /v1/issues/{issueID}
 | title | string | Alert title/description |
 | labels | Label[] | Array of labels |
 | alertSince | string | Time the alert started (ISO 8601) |
-| resolvedAt | string | Time the issue was resolved (present only once resolved) |
 | comments | Comment[] | List of handling comments (returned only by the detail endpoint) |

 ### Label object
@@ -208,6 +210,84 @@ const detailResponse = await fetch(`/v1/issues/${issueId}`, {
 const detail = await detailResponse.json();
 ```

+### 3. Receive Alertmanager webhooks (alert ingestion)
+
+Receives alert events pushed by Alertmanager.
+
+**Request:**
+```http
+POST /v1/integrations/alertmanager/webhook
+Content-Type: application/json
+```
+
+**Authentication:**
+- Optional authentication (enabled via environment variables):
+  - Basic: set `ALERT_WEBHOOK_BASIC_USER` and `ALERT_WEBHOOK_BASIC_PASS`
+  - Bearer: set `ALERT_WEBHOOK_BEARER`
+- If none of the variables above is set, the endpoint does not enforce authentication (convenient for development and testing)
+
+**Request body (example, firing):**
+```json
+{
+  "receiver": "our-webhook",
+  "status": "firing",
+  "alerts": [
+    {
+      "status": "firing",
+      "labels": {
+        "alertname": "HighRequestLatency",
+        "service": "serviceA",
+        "severity": "P1",
+        "idc": "yzh"
+      },
+      "annotations": {
+        "summary": "p95 latency over threshold",
+        "description": "apitime p95 > 450ms"
+      },
+      "startsAt": "2025-05-05T11:00:00Z",
+      "endsAt": "0001-01-01T00:00:00Z",
+      "generatorURL": "http://prometheus/graph?g0.expr=...",
+      "fingerprint": "3b1b7f4e8f0e"
+    }
+  ],
+  "groupLabels": {"alertname": "HighRequestLatency"},
+  "commonLabels": {"service": "serviceA", "severity": "P1"},
+  "version": "4"
+}
+```
+
+**Key fields:**
+- `status`: `firing` | `resolved`
+- `alerts[]`: one or more alerts; the key fields are `labels`, `annotations`, `startsAt`, and `fingerprint`
+- `fingerprint` + `startsAt`: used for application-level idempotency
+
+**Response:**
+- `200 OK {"ok": true, "created": <n>}`: when `status=firing`, returns the number of records created by this request
+- `200 OK {"ok": true, "msg": "ignored (not firing)"}`: returned immediately when the status is not `firing`
+
+**curl example:**
+```bash
+# firing
+curl -X POST http://localhost:8080/v1/integrations/alertmanager/webhook \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "receiver":"our-webhook",
+    "status":"firing",
+    "alerts":[{
+      "status":"firing",
+      "labels":{"alertname":"HighRequestLatency","service":"serviceA","severity":"P1","idc":"yzh"},
+      "annotations":{"summary":"p95 latency over threshold","description":"apitime p95 > 450ms"},
+      "startsAt":"2025-05-05T11:00:00Z",
+      "endsAt":"0001-01-01T00:00:00Z",
+      "generatorURL":"http://prometheus/graph?g0.expr=...",
+      "fingerprint":"3b1b7f4e8f0e"
+    }],
+    "groupLabels":{"alertname":"HighRequestLatency"},
+    "commonLabels":{"service":"serviceA","severity":"P1"},
+    "version":"4"
+  }'
+```
+
 ## Version history

 - **v1.0** (2025-09-11): Initial version, supporting basic alert list and detail queries
