Commit 056b686

[doc] update threshold alarm doc (apache#1983)
Co-authored-by: zhangshenghang <[email protected]> Co-authored-by: tomsun28 <[email protected]>
1 parent 4a77aaf commit 056b686

12 files changed

+193
-125
lines changed

home/docs/help/alert_threshold.md

Lines changed: 40 additions & 22 deletions
@@ -1,36 +1,54 @@
11
---
2-
id: alert_threshold
3-
title: Threshold alarm configuration
4-
sidebar_label: Threshold alarm configuration
2+
id: alert_threshold
3+
title: Threshold Alert Configuration
4+
sidebar_label: Threshold Alert Configuration
55
---
6+
> Configure alert thresholds for monitoring metrics (warning alert, critical alert, emergency alert). The system triggers alerts based on threshold configuration and collected metric data.
67
7-
> Configure the alarm threshold (warning alarm, critical alarm, emergency alarm) for the monitoring Metrics, and the system calculates and triggers the alarm according to the threshold configuration and the collected Metric data.
8+
## Operational Steps
89

9-
### Operation steps
10+
### 1. Setting Labels for Monitoring Services (Optional)
1011

11-
1. **【Alarm configuration】->【Add new threshold】-> 【Confirm after configuration】**
12+
If you need to categorize alerts, you can set labels on the monitored targets. For example, suppose you monitor multiple Linux systems whose metrics differ: Server A has more than 1G of available memory and Server B has more than 2G. You can set a separate label on Server A and Server B, then configure alerts based on these labels.
1213

13-
![threshold](/img/docs/help/alert-threshold-1.png)
14+
#### Creating Labels
1415

15-
As shown above:
16+
Navigate to **Label Management -> Add Label**
1617

17-
**Metric object**:Select the monitoring Metric object for which we need to configure the threshold. Eg:website monitoring type -> summary Metric set -> responseTime-response time Metric
18-
**Threshold trigger expression**:Calculate and judge whether to trigger the threshold according to this expression. See the page prompts for expression environment variables and operators. Eg:set the response time greater than 50 to trigger an alarm, and the expression is `responseTime > 50`. For detailed help on threshold expression, see [Threshold expression help](alert_threshold_expr)
19-
**Alarm level**:The alarm level that triggers the threshold, from low to high: warning, critical, emergency.
20-
**Trigger times**:How many times will the threshold be triggered before the alarm is really triggered.
21-
**Notification template**:Notification information Template sent after alarm triggering, See page prompts for template environment variables, eg:`${app}.${metrics}.${metric} Metric's value is ${responseTime}, greater than 50 triggers an alarm`
22-
**Global default**: Set whether this threshold is valid for such global Metrics, and the default is No. After adding a new threshold, you need to associate the threshold with the monitoring object, so that the threshold will take effect for this monitoring.
23-
**Enable alarm**:This alarm threshold configuration is enabled or disabled.
18+
![threshold](/img/docs/help/alert-threshold-2-en.png)
2419

25-
2. **Threshold association monitoring⚠️ 【Alarm configuration】-> 【Threshold just set】-> 【Configure associated monitoring】-> 【Confirm after configuration】**
20+
As shown in the image above, add a new label. In this example the label is `linux:dev` (Linux used in the development environment).
2621

27-
> **Note⚠️ After adding a new threshold, you need to associate the threshold with the monitoring object(That is, to set this threshold for which monitoring is effective), so that the threshold will take effect for this monitoring.**
22+
#### Configuring Labels
2823

29-
![threshold](/img/docs/help/alert-threshold-2.png)
24+
TODO Update image name
25+
![threshold](/img/docs/help/alert-threshold-3-en.png)
3026

31-
![threshold](/img/docs/help/alert-threshold-3.png)
27+
As shown in the image above, click on `Add Label`.
3228

33-
**After the threshold alarm is configured, the alarm information that has been successfully triggered can be seen in 【alarm center】.**
34-
**If you need to notify the relevant personnel of the alarm information by email, Wechat, DingDing and Feishu, it can be configured in 【alarm notification】.**
29+
![threshold](/img/docs/help/alert-threshold-4-en.png)
3530

36-
Other issues can be fed back through the communication group ISSUE!
31+
Select the label to apply; this example selects the `linux:dev` label.
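To make the role of labels concrete, here is a minimal Python sketch of how a rule bound to a label such as `linux:dev` could pick the monitors it applies to. The `Monitor` and `ThresholdRule` classes, the `rule_applies` function, and the `linux:prod` label are hypothetical names used only for illustration, not HertzBeat's actual data model. The sketch assumes that a rule with no label applies to every monitor of the same type.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Monitor:
    name: str
    app: str  # monitoring type, e.g. "linux"
    labels: dict = field(default_factory=dict)


@dataclass
class ThresholdRule:
    app: str
    expression: str
    bind_label: Optional[Tuple[str, str]] = None  # None means no label selected


def rule_applies(rule: ThresholdRule, monitor: Monitor) -> bool:
    """Return True if the rule should be evaluated for this monitor."""
    if rule.app != monitor.app:
        return False
    if rule.bind_label is None:
        return True  # assumption: an unlabeled rule covers every monitor of this type
    key, value = rule.bind_label
    return monitor.labels.get(key) == value


server_a = Monitor("server-a", "linux", {"linux": "dev"})
server_b = Monitor("server-b", "linux", {"linux": "prod"})  # hypothetical second label
dev_rule = ThresholdRule("linux", "usage > 80", bind_label=("linux", "dev"))

# Only the monitor carrying linux:dev is selected by the labeled rule.
matched = [m.name for m in (server_a, server_b) if rule_applies(dev_rule, m)]
```

Here `matched` contains only `server-a`, while a rule created without `bind_label` would apply to both servers.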
32+
33+
### Creating Threshold Rules
34+
35+
Navigate to **[Threshold Rules] -> [Add Threshold Rule] -> [Confirm Configuration]**
36+
37+
![threshold](/img/docs/help/alert-threshold-1-en.png)
38+
39+
The configuration fields shown in the image above are:
40+
41+
- **Metric Object**: Select the monitoring metric object for which to configure the threshold. For example: website monitoring type -> summary metric set -> responseTime metric.
42+
- **Threshold Rule**: The expression used to decide whether the threshold is triggered. Expression variables and operators are listed on the page for reference. For example, to trigger an alert when the response time is greater than 50, the expression is `responseTime > 50`. For detailed help on threshold expressions, see [Threshold Expression Help](alert_threshold_expr).
43+
- **Alert Level**: The alert level triggered by the threshold, from low to high: warning, critical, emergency.
44+
- **Trigger Count**: Set how many times the threshold must be triggered before the alert is actually triggered.
45+
- **Notification Template**: The template for the notification message sent after the alert is triggered. Template variables are provided on the page. For example: `${app}.${metrics}.${metric} metric value is ${responseTime}, which is greater than 50 triggering the alert`.
46+
- **Bind Label**: Select the label we need to apply. If no label is selected, it will apply to all services corresponding to the set metric object.
47+
- **Apply Globally**: Whether this threshold applies globally to all metrics of this kind. The default is no. After adding a threshold, you must associate it with a monitoring object for the threshold to take effect.
48+
- **Recovery Notification**: Whether to send a recovery notification after the alert has been triggered. By default, no recovery notification is sent.
49+
- **Enable Alert**: Enable or disable this alert threshold configuration.
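As a rough illustration of the **Trigger Count** and **Notification Template** fields, the Python sketch below counts threshold matches and renders a message. It is a sketch under stated assumptions, not HertzBeat's implementation: the `TriggerCounter` class and `render` function are hypothetical, and the sketch assumes matches must be consecutive (a non-matching sample resets the count), which the description above does not specify. Python's `string.Template` is used only because it happens to share the `${name}` placeholder syntax.

```python
from string import Template


class TriggerCounter:
    """Report an alert only after `trigger_count` matches in a row."""

    def __init__(self, trigger_count: int):
        self.trigger_count = trigger_count
        self.hits = 0

    def observe(self, matched: bool) -> bool:
        if matched:
            self.hits += 1
        else:
            self.hits = 0  # assumption: a non-matching sample resets the count
        return self.hits >= self.trigger_count


def render(template: str, variables: dict) -> str:
    """Fill ${name} placeholders; unknown placeholders are left untouched."""
    return Template(template).safe_substitute(variables)


# Evaluate `responseTime > 50` on a series of samples with a trigger count of 3.
counter = TriggerCounter(trigger_count=3)
samples = [80, 90, 40, 70, 75, 120]
fired = [counter.observe(value > 50) for value in samples]

message = render(
    "${app}.${metrics}.${metric} metric value is ${responseTime}, which is greater than 50",
    {"app": "website", "metrics": "summary", "metric": "responseTime", "responseTime": 120},
)
```

With these samples the alert fires only on the last one, after three matches in a row.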
50+
51+
**Once the threshold alert is configured, alerts that have been successfully triggered can be viewed in the [Alert Center].**
52+
**If you need to send alert notifications via email, WeChat, DingTalk, or Feishu, you can configure it in [Alert Notifications].**
53+
54+
For other issues, you can provide feedback through the community chat group or issue tracker!
Lines changed: 62 additions & 45 deletions
@@ -1,49 +1,66 @@
11
---
22
id: alert_threshold_expr
3-
title: Threshold trigger expression
4-
sidebar_label: Threshold trigger expression
3+
title: Threshold Trigger Expression
4+
sidebar_label: Threshold Trigger Expression
55
---
66

7-
> When we configure the threshold alarm, we need to configure the threshold trigger expression. The system calculates whether to trigger the alarm according to the expression and the monitoring index value. Here is a detailed introduction to the use of the expression.
8-
9-
#### Operators supported by expressions
10-
11-
```
12-
equals(str1,str2)
13-
==
14-
<
15-
<=
16-
>
17-
>=
18-
!=
19-
( )
20-
+
21-
-
22-
&&
23-
||
24-
```
25-
26-
Rich operators allow us to define expressions freely.
27-
Note⚠️ For the equality of string, please use `equals(str1,str2)`, while for the equality judgment of number, please use == or !=
28-
29-
#### Supported environment variables
30-
> Environment variables, i.e. supported variables such as Metric values, are used in the expression. When the threshold value is calculated and judged, the variables will be replaced with actual values for calculation.
31-
32-
Non fixed environment variables:These variables will change dynamically according to the monitoring Metric object we choose. For example, if we choose **response time Metric of website monitoring**, the environment variables will have `responseTime - This is the response time variable`
33-
If we want to set **when the response time of website monitoring is greater than 400** to trigger an alarm,the expression is `responseTime>400`
34-
35-
Fixed environment variables(Rarely used):`instance : Row instance value`
36-
This variable is mainly used to calculate multiple instances. For example, we collected `usage`(`usage is non fixed environment variables`) of disk C and disk D, but we only want to set the alarm when **the usage of C disk is greater than 80**. Then the expression is `equals(instance,"c")&&usage>80`
37-
38-
#### Expression setting case
39-
40-
1. Website monitoring -> Trigger alarm when the response time is greater than or equal to 400ms
41-
`responseTime>=400`
42-
2. API monitoring -> Trigger alarm when the response time is greater than 3000ms
43-
`responseTime>3000`
44-
3. Entire site monitoring -> Trigger alarm when URL(instance) path is `https://baidu.com/book/3` and the response time is greater than 200ms
45-
`equals(instance,"https://baidu.com/book/3")&&responseTime>200`
46-
4. MYSQL monitoring -> status Metric group -> Trigger alarm when hreads_running(number of running threads) Metric is greater than 7
47-
`threads_running>7`
48-
49-
Other issues can be fed back through the communication group ISSUE!
7+
> When configuring threshold alerts, it is necessary to set up threshold trigger expressions. The system calculates whether to trigger an alert based on the expression and the monitored metric values. Here, we provide a detailed explanation of expression usage.
8+
9+
#### Supported Operators in Expressions
10+
11+
| Operator (Visual Configuration) | Operator (Expression Configuration) | Supported Types | Description |
12+
| ------------------------------- | ----------------------------------- | ------------------------- | -------------------------- |
13+
| Equals | equals(str1,str2) | String | Check if strings are equal |
14+
| Not Equals | !equals(str1,str2) | String | Check if strings are not equal |
15+
| Contains | contains(str1,str2) | String | Check if string contains |
16+
| Not Contains | !contains(str1,str2) | String | Check if string does not contain |
17+
| Matches | matches(str1,str2) | String | Check if string matches regex |
18+
| Not Matches | !matches(str1,str2) | String | Check if string does not match regex |
19+
| Exists | exists(obj) | String, Numeric, Time | Check if value exists |
20+
| Not Exists | !exists(obj) | String, Numeric, Time | Check if value does not exist |
21+
| Greater than | obj1 > obj2 | Numeric, Time | Check if value is greater than |
22+
| Less than | obj1 < obj2 | Numeric, Time | Check if value is less than |
23+
| Greater than or Equal to | obj1 >= obj2 | Numeric, Time | Check if value is greater than or equal to |
24+
| Less than or Equal to | obj1 <= obj2 | Numeric, Time | Check if value is less than or equal to |
25+
| Not Equal to | obj1 != obj2 | Numeric, Time | Check if values are not equal |
26+
| Equal to | obj1 == obj2 | Numeric, Time | Check if values are equal |
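The string and existence operators in the table can be approximated in Python as follows. This is only an illustrative sketch: the functions mirror the operator names, but the details are assumptions, in particular that `matches` requires the whole string to match the regex and that `exists` means "is not null".

```python
import re


def equals(str1, str2) -> bool:
    """String equality, as in equals(str1,str2)."""
    return str(str1) == str(str2)


def contains(str1: str, str2: str) -> bool:
    """True if str1 contains str2 as a substring."""
    return str2 in str1


def matches(str1: str, pattern: str) -> bool:
    """True if str1 matches the regex; full-string matching is an assumption."""
    return re.fullmatch(pattern, str1) is not None


def exists(obj) -> bool:
    """True if a value was collected; treating None as 'missing' is an assumption."""
    return obj is not None


url = "https://baidu.com/book/3"
checks = {
    "equals": equals(url, "https://baidu.com/book/3"),
    "not_contains": not contains(url, "/music/"),
    "matches": matches(url, r"https://baidu\.com/book/\d+"),
    "exists": exists(url),
}
```

The negated forms in the table (`!equals`, `!contains`, and so on) correspond to Python's `not`, as in the `not_contains` entry.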
27+
28+
#### Expression Function Library List
29+
30+
| Supported Function Library | Description |
31+
| -------------------------------- | -------------------------------------------------------------- |
32+
| condition ? trueExpression : falseExpression | Ternary operator |
33+
| toDouble(str) | Convert string to Double type |
34+
| toBoolean(str) | Convert string to Boolean type |
35+
| toInteger(str) | Convert string to Integer type |
36+
| array[n] | Retrieve the nth element of an array |
37+
| * | Multiplication |
38+
| / | Division |
39+
| % | Modulo |
40+
| ( and ) | Parentheses for controlling the order of operations in logical or mathematical expressions |
41+
| + | Addition |
42+
| - | Subtraction |
43+
| && | Logical AND operator |
44+
| \|\| | Logical OR operator |
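For readers more familiar with Python, the conversion functions and the ternary operator above behave roughly like the sketch below. The helper names and sample values are hypothetical, and the exact parsing rules (for example, which strings `toBoolean` accepts) are assumptions rather than documented behavior.

```python
def to_double(text: str) -> float:
    """Rough analogue of toDouble(str)."""
    return float(text)


def to_integer(text: str) -> int:
    """Rough analogue of toInteger(str)."""
    return int(text)


def to_boolean(text: str) -> bool:
    """Rough analogue of toBoolean(str); only "true" (any case) is truthy here."""
    return text.strip().lower() == "true"


# Collected values often arrive as strings, so convert before doing arithmetic.
used, total = "6", "8"
usage_percent = to_double(used) / to_double(total) * 100

# Analogue of the ternary form: condition ? trueExpression : falseExpression
level = "high" if usage_percent > 70 else "normal"
```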
45+
46+
#### Supported Environment Variables
47+
48+
> Environment variables are the variables, such as metric values, that can be used in expressions. When a threshold is evaluated, these variables are replaced with their actual values.
49+
50+
Non-fixed Environment Variables: These variables change dynamically based on the selected monitoring metric. For example, if we choose the **response time metric for website monitoring**, the available environment variable is `responseTime`, which holds the response time. To trigger an alert when **the response time of website monitoring is greater than 400**, the expression is `responseTime>400`.
51+
52+
Fixed Environment Variables (Less commonly used): `instance: instance value`
53+
This variable is mainly used for calculations involving multiple instances. For instance, if we collect usage metrics for C drive and D drive (`usage` being a non-fixed environment variable), and we only want to set an alert for **usage greater than 80 for the C drive**, the expression would be `equals(instance,"c")&&usage>80`.
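The effect of the `instance` variable can be sketched in Python by evaluating the expression once per collected row. The sample rows and the `c_drive_over_80` function are hypothetical; the function is a hand-written equivalent of `equals(instance,"c")&&usage>80`, not the real expression engine.

```python
# One row per instance, as collected for the C drive and the D drive.
rows = [
    {"instance": "c", "usage": 91},
    {"instance": "d", "usage": 95},
]


def c_drive_over_80(env: dict) -> bool:
    """Python equivalent of: equals(instance,"c")&&usage>80"""
    return env["instance"] == "c" and env["usage"] > 80


# Each row supplies its own values for `instance` and `usage`.
alerting_instances = [row["instance"] for row in rows if c_drive_over_80(row)]
```

Although the D drive also exceeds 80, only the C drive row satisfies the expression, so `alerting_instances` holds just `"c"`.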
54+
55+
#### Expression Configuration Examples
56+
57+
1. Website Monitoring -> Alert when response time is greater than or equal to 400ms
58+
`responseTime>=400`
59+
2. API Monitoring -> Alert when response time is greater than 3000ms
60+
`responseTime>3000`
61+
3. Overall Monitoring -> Alert when response time for URL (instance) path 'https://baidu.com/book/3' is greater than 200ms
62+
`equals(instance,"https://baidu.com/book/3")&&responseTime>200`
63+
4. MySQL Monitoring -> Alert when the 'threads_running' metric under 'status' exceeds 7
64+
`threads_running>7`
65+
66+
If you encounter any issues, feel free to discuss and provide feedback through our community group or ISSUE tracker!
Lines changed: 39 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,36 +1,54 @@
11
---
22
id: alert_threshold
3-
title: Threshold Alert Configuration
4-
sidebar_label: Threshold Alert Configuration
3+
title: Threshold Alert Configuration
4+
sidebar_label: Threshold Alert Configuration
55
---
6+
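> Configure alert thresholds for monitoring metrics (warning alert, critical alert, emergency alert). The system triggers alerts based on the threshold configuration and the collected metric data.
67
7-
> Configure alert thresholds for monitoring metrics (warning alert, critical alert, emergency alert). The system triggers alerts based on the threshold configuration and the collected metric data.
8+
## Operational Steps
89

9-
### Operation Steps
10+
### 1. Setting Labels for Monitoring Services (Optional)
1011

11-
1. **【Alert configuration】->【Add new threshold】-> 【Confirm after configuration】**
12+
If you need to categorize alerts, you can set labels on the monitored targets. For example, suppose you monitor multiple Linux systems whose metrics differ: Server A has more than 1G of available memory and Server B has more than 2G. You can set a separate label on Server A and Server B, then configure alerts based on these labels.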
1213

13-
![threshold](/img/docs/help/alert-threshold-1.png)
14+
#### Creating Labels
1415

15-
As shown above:
16+
Navigate to **Label Management -> Add Label**
1617

17-
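**Metric object**: Select the monitoring metric object for which to configure the threshold. For example: website monitoring type -> summary metric set -> responseTime (response time) metric
18-
**Threshold trigger expression**: The expression used to decide whether the threshold is triggered. See the page prompts for expression environment variables and operators. For example, to trigger an alert when the response time is greater than 50, the expression is `responseTime > 50`. For detailed help on threshold expressions, see [Threshold expression help](alert_threshold_expr)
19-
**Alert level**: The alert level triggered by the threshold, from low to high: warning, critical, emergency
20-
**Trigger times**: How many times the threshold must be triggered before the alert is actually triggered
21-
**Notification template**: The template of the notification message sent after the alert is triggered. See the page prompts for template environment variables. For example: `${app}.${metrics}.${metric} metric's value is ${responseTime}, greater than 50 triggers an alert`
22-
**Global default**: Whether this threshold applies globally to all metrics of this kind; the default is no. After adding a threshold, you also need to associate it with a monitoring object so that the threshold takes effect for that monitoring.
23-
**Enable alert**: Enable or disable this alert threshold configuration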
18+
![threshold](/img/docs/help/alert-threshold-2.png)
2419

25-
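2. **Associate the threshold with monitoring ⚠️ 【Alert configuration】-> 【The threshold just set】-> 【Configure associated monitoring】-> 【Confirm after configuration】**
20+
As shown in the image above, add a new label. Here we set the label to linux:dev (Linux used in the development environment)
2621

27-
> **Note ⚠️ After adding a threshold, you also need to associate it with monitoring objects (that is, choose which monitors this threshold applies to) so that the threshold takes effect for them.**
22+
#### Configuring Labels
23+
TODO Update image name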
24+
![threshold](/img/docs/help/alert-threshold-3.png)
2825

29-
![threshold](/img/docs/help/alert-threshold-2.png)
26+
As shown in the image above, click `Add Label`
3027

31-
![threshold](/img/docs/help/alert-threshold-3.png)
28+
![threshold](/img/docs/help/alert-threshold-4.png)
3229

33-
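**Once the threshold alert is configured, alerts that have been successfully triggered can be viewed in the 【Alert Center】.**
34-
**If you need to notify the relevant people of alerts by email, WeChat, DingTalk, or Feishu, you can configure this in 【Alert Notifications】.**
30+
Select the label to apply; this example selects the `linux:dev` label
3531

36-
For other issues, you can provide feedback through the community chat group or an ISSUE!
32+
### Creating Threshold Rules
33+
34+
Navigate to **【Threshold Rules】->【Add Threshold Rule】-> 【Confirm Configuration】**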
35+
36+
![threshold](/img/docs/help/alert-threshold-1.png)
37+
38+
The configuration fields shown in the image above are:
39+
40+
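- **Metric Object**: Select the monitoring metric object for which to configure the threshold. For example: website monitoring type -> summary metric set -> responseTime (response time) metric
41+
- **Threshold Rule**: The expression used to decide whether the threshold is triggered. See the page prompts for expression environment variables and operators. For example, to trigger an alert when the response time is greater than 50, the expression is `responseTime > 50`. For detailed help on threshold expressions, see [Threshold Expression Help](alert_threshold_expr)
42+
- **Alert Level**: The alert level triggered by the threshold, from low to high: warning, critical, emergency
43+
- **Trigger Count**: How many times the threshold must be triggered before the alert is actually triggered
44+
- **Notification Template**: The template of the notification message sent after the alert is triggered. See the page prompts for template environment variables. For example: `${app}.${metrics}.${metric} metric value is ${responseTime}, greater than 50 triggers an alert`
45+
- **Bind Label**: Select the label to apply. If no label is selected, the rule applies to all services corresponding to the configured metric object.
46+
- **Apply Globally**: Whether this threshold applies globally to all metrics of this kind; the default is no. After adding a threshold, you also need to associate it with a monitoring object so that the threshold takes effect for that monitoring.
47+
- **Recovery Notification**: Whether to send a recovery notification after the alert has been triggered. By default, no recovery notification is sent.
48+
- **Enable Alert**: Enable or disable this alert threshold configuration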
49+
50+
51+
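**Once the threshold alert is configured, alerts that have been successfully triggered can be viewed in the 【Alert Center】.**
52+
**If you need to notify the relevant people of alerts by email, WeChat, DingTalk, or Feishu, you can configure this in 【Alert Notifications】.**
53+
54+
For other issues, you can provide feedback through the community chat group or an ISSUE!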
