Alerting tools in Elasticsearch and Kibana provide functionality to monitor data and notify you about significant changes or events in real time. This page provides an overview of how the key components work.
For example, when monitoring a set of servers, a rule might:

-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+* Check for average CPU usage > 0.9 on each server for the last two minutes (condition).
+* Check every minute (schedule).
+* Send a warning email message via SMTP with subject `CPU on {{server}} is high` (action).

-$$$alerting-getting-started$$$
+### Conditions [rules-conditions]

-$$$rules-alerts$$$
+Each project type supports a specific set of rule types. Each *rule type* provides its own way of defining the conditions to detect, but an expression formed by a series of clauses is a common pattern. For example, in an {{es}} query rule, you specify an index, a query, and a threshold, which uses a metric aggregation operation (`count`, `average`, `max`, `min`, or `sum`):
+:alt: UI for defining rule conditions in an {{es}} query rule
+:class: screenshot
+:::

-$$$alerting-concepts-scheduling$$$
+### Schedule [rules-schedule]

-$$$alerting-concepts-actions$$$
+All rules must have a check interval, which defines how often to evaluate the rule conditions. Checks are queued; they run as close to the defined value as capacity allows.
+
+::::{important}
+The intervals of rule checks in {{kib}} are approximate. Their timing is affected by factors such as the frequency at which tasks are claimed and the task load on the system. Refer to [Alerting production considerations](../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md).
+
+::::
+
+### Actions [rules-actions]
+
+You can add one or more actions to your rule to generate notifications when its conditions are met. Recovery actions likewise run when rule conditions are no longer met.
+
+When defining actions in a rule, you specify:
+
+* A connector
+* An action frequency
+* A mapping of rule values to properties exposed for that type of action
+
+Each action uses a connector, which provides connection information for a {{kib}} service or third-party integration, depending on where you want to send the notifications. The specific list of connectors that you can use in your rule varies by project type. Refer to [{{connectors-app}}](../../deploy-manage/manage-connectors.md).
+
+After you select a connector, set the *action frequency*. If you want to reduce the number of notifications you receive without affecting their timeliness, some rule types support alert summaries. For example, if you create an {{es}} query rule, you can set the action frequency such that you receive summaries of the new, ongoing, and recovered alerts on a custom interval:
+:alt: UI for defining rule conditions in an {{es}} query rule
+:class: screenshot
+:::
+
+Alternatively, you can set the action frequency such that the action runs for each alert. If the rule type does not support alert summaries, this is your only available option. You must choose when the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which affects whether the action runs. Each rule type has a specific set of valid action groups. For example, you can set *Run when* to `Query matched` or `Recovered` for the {{es}} query rule:
+Each connector supports a specific set of actions for each action group and enables different action properties. For example, you can have actions that create an {{opsgenie}} alert when rule conditions are met and recovery actions that close the {{opsgenie}} alert.
+
+Some types of rules enable you to further refine the conditions under which actions run. For example, you can specify that actions run only when an alert occurs within a specific time frame or when it matches a KQL query.
+
+::::{tip}
+If you are not using alert summaries, actions are triggered per alert and a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9, and the action frequency is `On check intervals`:
+
+* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
+* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
+* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.
+
+In this example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often, it’s desirable to suppress these re-notifications. If you set the action frequency to `On custom action intervals` with an interval of 5 minutes, you reduce noise by getting emails only every 5 minutes for servers that continue to exceed the threshold:
+
+* Minute 1: server X123 > 0.9. *One email* will be sent for server X123.
+* Minute 2: X123 and Y456 > 0.9. *One email* will be sent for Y456.
+* Minute 3: X123, Y456, Z789 > 0.9. *One email* will be sent for Z789.
+
+To get notified only once when a server exceeds the threshold, you can set the action frequency to `On status changes`. Alternatively, if the rule type supports alert summaries, consider using them to reduce the volume of notifications.
+
+::::
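Reviewer note: the three-server walkthrough in the tip can be checked with a small simulation. The sketch below is not {{kib}} code. The frequency labels (`check_interval`, `custom_interval`, `status_change`) and the `emails_sent` helper are hypothetical stand-ins for the UI options, and recovery handling is deliberately left out.

```python
# Hypothetical per-minute check results: servers whose CPU exceeds 0.9.
CHECKS = {
    1: ["X123"],
    2: ["X123", "Y456"],
    3: ["X123", "Y456", "Z789"],
}


def emails_sent(checks, frequency, interval_minutes=5):
    """Return {minute: [servers emailed]} for an illustrative action frequency.

    frequency is "check_interval", "custom_interval", or "status_change".
    These labels are assumptions made for this sketch, not API values.
    """
    last_notified = {}  # server -> minute of the most recent email
    active = set()      # servers over the threshold at the previous check
    sent = {}
    for minute in sorted(checks):
        over = checks[minute]
        notified = []
        for server in over:
            if frequency == "check_interval":
                notify = True  # every check that matches sends an email
            elif frequency == "custom_interval":
                last = last_notified.get(server)
                notify = last is None or minute - last >= interval_minutes
            elif frequency == "status_change":
                notify = server not in active  # only when the alert is new
            else:
                raise ValueError(f"unknown frequency: {frequency}")
            if notify:
                notified.append(server)
                last_notified[server] = minute
        active = set(over)
        sent[minute] = notified
    return sent


for label in ("check_interval", "custom_interval", "status_change"):
    print(label, emails_sent(CHECKS, label))
```

Over this three-minute window the custom-interval and status-change runs send the same emails. They would diverge only after the five-minute interval elapses for a server that is still over the threshold.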
+
+#### Action variables [rules-action-variables]
+
+You can pass rule values to an action at the time a condition is detected. To view the list of variables available for your rule, click the "add rule variable" button:
+For more information about common action variables, refer to [Rule action variables](../../explore-analyze/alerts-cases/alerts/rule-action-variables.md).
+
+### Alerts [rules-alerts]
+
+When checking for a condition, a rule might identify multiple occurrences of the condition. {{kib}} tracks each of these alerts separately. Depending on the action frequency, an action occurs per alert or at the specified alert summary interval.
+
+Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert. This means a separate email is sent for each server that exceeds the threshold whenever the alert status changes.
+
+### Putting it all together [rules-putting-it-all-together]
+
+A rule consists of conditions, actions, and a schedule. When conditions are met, alerts are created that render actions and invoke them. To make action setup and update easier, actions use connectors that centralize the information used to connect with {{kib}} services and third-party integrations. The following example ties these concepts together:
+1. Any time a rule’s conditions are met, an alert is created. This example checks for servers with average CPU > 0.9. Three servers meet the condition, so three alerts are created.
+2. Alerts create actions according to the action frequency, as long as they are not muted or throttled. When actions are created, their properties are filled with actual values. In this example, three actions are created when the threshold is met, and the template string `{{server}}` is replaced with the appropriate server name for each alert.
+3. {{kib}} runs the actions, sending notifications by using a third-party integration like an email service.
+4. If the third-party integration has connection parameters or credentials, {{kib}} fetches these from the appropriate connector.
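Reviewer note: the four steps above can be sketched in a few lines of Python. This is an illustration of the flow only, not how {{kib}} implements it. The `render` and `run_rule` helpers, the connector dictionary, and the sample CPU values are all assumptions made for the example.

```python
import re


def render(template, variables):
    """Replace {{name}} placeholders; unknown names are left as written."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )


def run_rule(cpu_by_server, threshold, subject_template, connector):
    """Evaluate one check: create an alert per matching server, then one action per alert."""
    # Step 1: every server that meets the condition becomes its own alert.
    alerts = [s for s, cpu in sorted(cpu_by_server.items()) if cpu > threshold]
    # Step 2: each alert produces an action with its template filled in.
    # Steps 3 and 4: the action refers to a connector, which holds connection details.
    actions = [
        {
            "connector": connector["name"],
            "subject": render(subject_template, {"server": server}),
        }
        for server in alerts
    ]
    return alerts, actions


# Hypothetical sample data for the server monitoring example.
cpu = {"X123": 0.95, "Y456": 0.97, "Z789": 0.92, "W000": 0.40}
smtp = {"name": "smtp-email", "host": "mail.example.com"}
alerts, actions = run_rule(cpu, 0.9, "CPU on {{server}} is high", smtp)
for action in actions:
    print(action["connector"], "->", action["subject"])
```

Keeping the connection details in the connector rather than in each action mirrors the point made in the paragraph above: credentials are updated in one place, and every action that refers to the connector picks up the change.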
explore-analyze/alerts-cases/alerts/alerting-common-issues.md
Lines changed: 0 additions & 9 deletions
@@ -7,7 +7,6 @@ mapped_pages:

This page describes how to resolve common problems you might encounter with Alerting.

-
## Rules with small check intervals run late [rules-small-check-interval-run-late]

**Problem**
@@ -22,7 +21,6 @@ Either tweak the [{{kib}} Task Manager settings](https://www.elastic.co/guide/en

For more details, see [Tasks with small schedule intervals run late](../../../troubleshoot/kibana/task-manager.md#task-manager-health-scheduled-tasks-small-schedule-interval-run-late).

-
## Rules with an inconsistent cadence [scheduled-rules-run-late]

**Problem**
@@ -39,7 +37,6 @@ Alerting tasks always begin with `alerting:`. For example, the `alerting:.index-

For more details on monitoring and diagnosing tasks in Task Manager, refer to [Health monitoring](../../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md).

-
## Connectors have TLS errors when running actions [connector-tls-settings]

**Problem**
@@ -50,7 +47,6 @@ A connector gets a TLS socket error when connecting to the server to run an acti

Configuration options are available to specialize connections to TLS servers, including ignoring server certificate validation and providing certificate authority data to verify servers using custom certificates. For more details, see [Action settings](https://www.elastic.co/guide/en/kibana/current/alert-action-settings-kb.html#action-settings).

-
## Rules take a long time to run [rules-long-run-time]

**Problem**
@@ -62,7 +58,6 @@ By default, only users with a `superuser` role can query the [preview] {{kib}} e

::::

-
**Solution**

By default, rules have a `5m` timeout. Rules that run longer than this timeout are automatically canceled to prevent them from consuming too much of {{kib}}'s resources. Alerts and actions that may have been scheduled before the rule timed out are discarded. When a rule times out, you will see this error in the {{kib}} logs:
@@ -157,7 +152,6 @@ GET /.kibana-event-log*/_search
4. This interval buckets the `event.duration_in_seconds` runtime field into 1 second intervals. Update this value to change the granularity of the buckets. If you are unable to use runtime fields, make sure this aggregation targets `event.duration` and use nanoseconds for the interval.
5. This retrieves the top 10 rule ids for this duration interval. Update this value to retrieve more rule ids.

-
This query returns the following:

```json
@@ -232,10 +226,8 @@ This query returns the following:
1. Most run durations fall within the first bucket (0 - 1 seconds).
2. A single rule with id `41893910-6bca-11eb-9e0d-85d233e3ee35` took between 30 and 31 seconds to run.

-
Use the get rule API to retrieve additional information about rules that take a long time to run.
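Reviewer note: a minimal sketch of that lookup, assuming the get rule endpoint is `GET /api/alerting/rule/<id>` (prefixed with `/s/<space_id>` for a non-default space) and that the caller authenticates with an API key. The `build_get_rule_request` helper, the localhost URL, and the key value are placeholders for illustration. The function only builds the request and sends nothing.

```python
from urllib.request import Request


def build_get_rule_request(kibana_url, rule_id, api_key, space_id=None):
    """Build (but do not send) a request for a single rule's details."""
    # Rules live in a Kibana space; the default space needs no prefix.
    space_prefix = f"/s/{space_id}" if space_id else ""
    url = f"{kibana_url.rstrip('/')}{space_prefix}/api/alerting/rule/{rule_id}"
    return Request(
        url,
        headers={"Authorization": f"ApiKey {api_key}"},
        method="GET",
    )


# Rule id taken from the aggregation result above; URL and key are placeholders.
request = build_get_rule_request(
    "http://localhost:5601",
    "41893910-6bca-11eb-9e0d-85d233e3ee35",
    "placeholder-api-key",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a reachable {{kib}} instance. The response body should describe the rule, including its schedule, which helps when deciding whether a slow rule needs a longer check interval.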

-
## Rule cannot decrypt API key [rule-cannot-decrypt-api-key]

**Problem**:
@@ -252,7 +244,6 @@ This error happens when the `xpack.encryptedSavedObjects.encryptionKey` value us
| If another {{kib}} instance with a different encryption key connects to the cluster. | The other {{kib}} instance might be trying to run the rule using a different encryption key than what the rule was created with. Ensure the encryption keys among all the {{kib}} instances are the same, and set [decryption only keys](https://www.elastic.co/guide/en/kibana/current/security-settings-kb.html#xpack-encryptedSavedObjects-keyRotation-decryptionOnlyKeys) for previously used encryption keys. |
| If other scenarios don’t apply. | Generate a new API key for the rule. For example, in **{{stack-manage-app}} > {{rules-ui}}**, select **Update API key** from the action menu. |

-
## Rules stop running after upgrade [known-issue-upgrade-rule]
explore-analyze/alerts-cases/alerts/alerting-getting-started.md
Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ This section will clarify some of the important differences in the function and
Functionally, the {{alert-features}} differ in that:

* Scheduled checks are run on {{kib}} instead of {{es}}
-* {{kib}} [rules hide the details of detecting conditions](../../../explore-analyze/alerts-cases.md#alerting-concepts-conditions) through rule types, whereas watches provide low-level control over inputs, conditions, and transformations.
+* {{kib}} [rules hide the details of detecting conditions](../../../explore-analyze/alerts-cases/alerts/alerting-getting-started.md#alerting-concepts-conditions) through rule types, whereas watches provide low-level control over inputs, conditions, and transformations.
* {{kib}} rules track and persist the state of each detected condition through alerts. This makes it possible to mute and throttle individual alerts, and detect changes in state such as resolution.
* Actions are linked to alerts. Actions are fired for each occurrence of a detected condition, rather than for the entire rule.
-{{kib}} {alert-features} are automatically enabled, but might require some additional configuration.
-
+{{kib}} {{alert-features}} are automatically enabled, but might require some additional configuration.

## Prerequisites [alerting-prerequisites]

@@ -25,19 +21,16 @@ If you are using an **on-premises** {{stack}} deployment with [**security**](../

The alerting framework uses queries that require the `search.allow_expensive_queries` setting to be `true`. See the scripts [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html#_allow_expensive_queries_4).

-
## Production considerations and scaling guidance [alerting-setup-production]

When relying on alerting and actions as mission critical services, make sure you follow the [alerting production considerations](../../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md).

For more information on the scalability of {{alert-features}}, go to [Scaling guidance](../../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md#alerting-scaling-guidance).

-
## Security [alerting-security]

To use {{alert-features}} in a {{kib}} app, you must have the appropriate feature privileges:

-
### Give full access to manage alerts, connectors, and rules in **{{stack-manage-app}}**[_give_full_access_to_manage_alerts_connectors_and_rules_in_stack_manage_app]

**{{kib}} privileges**
@@ -57,8 +50,6 @@ The rule type also affects the privileges that are required. For example, to cre

::::

-
-
### Give view-only access to alerts, connectors, and rules in **{{stack-manage-app}}**[_give_view_only_access_to_alerts_connectors_and_rules_in_stack_manage_app]

**{{kib}} privileges**
@@ -72,15 +63,12 @@ The rule type also affects the privileges that are required. For example, to vie

::::

-
-
### Give view-only access to alerts in **Discover** or **Dashboards**[_give_view_only_access_to_alerts_in_discover_or_dashboards]

**{{kib}} privileges**

* `Read` index privileges for the `.alerts-*` system indices.

-
### Revoke all access to alerts, connectors, and rules in **{{stack-manage-app}}**, **Discover**, or **Dashboards**[_revoke_all_access_to_alerts_connectors_and_rules_in_stack_manage_app_discover_or_dashboards]

**{{kib}} privileges**
@@ -90,12 +78,10 @@ The rule type also affects the privileges that are required. For example, to vie
* `None` for the **Management > {{connectors-feature}}** feature.
* No index privileges for the `.alerts-*` system indices.

-
### More details [_more_details]

For more information on configuring roles that provide access to features, go to [Feature privileges](../../../deploy-manage/users-roles/cluster-or-deployment-auth/kibana-privileges.md#kibana-feature-privileges).

-
### API keys [alerting-authorization]

Rules are authorized using an API key. Its credentials are used to run all background tasks associated with the rule, including condition checks like {{es}} queries and triggered actions.
@@ -113,18 +99,14 @@ If a rule requires certain privileges, such as index privileges, to run and a us
For security reasons you may wish to limit the extent to which {{kib}} can connect to external services. You can use [Action settings](https://www.elastic.co/guide/en/kibana/current/alert-action-settings-kb.html#action-settings) to disable certain [*Connectors*](../../../deploy-manage/manage-connectors.md) and allowlist the hostnames that {{kib}} can connect with.

-
## Space isolation [alerting-spaces]

Rules and connectors are isolated to the {{kib}} space in which they were created. A rule or connector created in one space will not be visible in another.

-
## {{ccs-cap}} [alerting-ccs-setup]

If you want to use alerting rules with {{ccs}}, you must configure privileges for {{ccs-init}} and {{kib}}. Refer to [Remote clusters](../../../deploy-manage/remote-clusters.md).