11 changes: 0 additions & 11 deletions explore-analyze/alerts-cases.md
@@ -6,17 +6,6 @@ mapped_urls:

# Alerts and cases

% What needs to be done: Write from scratch

% Use migrated content from existing pages that map to this page:

% - [ ] ./raw-migrated-files/kibana/kibana/alerting-getting-started.md
% - [ ] ./raw-migrated-files/docs-content/serverless/project-settings-alerts.md

$$$alerting-concepts-actions$$$

$$$alerting-concepts-conditions$$$

Alerting tools in Elasticsearch and Kibana provide functionality to monitor data and notify you about significant changes or events in real time. This page provides an overview of how the key components work.

## Alerts
112 changes: 100 additions & 12 deletions explore-analyze/alerts-cases/alerts.md
@@ -7,24 +7,112 @@ mapped_urls:

# Alerts

% What needs to be done: Align serverless/stateful
## {{rules-app}} [rules]

% Scope notes: connection to kibana connectors reference prod considerations
In general, a rule consists of three parts:

% Use migrated content from existing pages that map to this page:
* *Conditions*: what needs to be detected?
* *Schedule*: when/how often should detection checks run?
* *Actions*: what happens when a condition is detected?

% - [ ] ./raw-migrated-files/kibana/kibana/alerting-getting-started.md
% - [ ] ./raw-migrated-files/docs-content/serverless/rules.md
% - [ ] ./raw-migrated-files/cloud/cloud/ec-organizations-notifications-domain-allowlist.md
For example, when monitoring a set of servers, a rule might:

% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
* Check for average CPU usage > 0.9 on each server for the last two minutes (condition).
* Check every minute (schedule).
* Send a warning email message via SMTP with subject `CPU on {{server}} is high` (action).
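To make the three parts concrete, the following Python sketch models the server-monitoring rule above as plain data. The `Rule` class, its field names, and the `evaluate` helper are hypothetical and are not the {{kib}} rule schema or API. The sketch only shows how a condition is checked separately for each server.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rule:
    """Hypothetical model of a rule: condition, schedule, and action."""
    cpu_threshold: float      # condition: average CPU must exceed this value
    window_minutes: int       # condition: look-back window for the average
    interval_seconds: int     # schedule: how often the check runs
    message_template: str     # action: subject line for the email


def evaluate(rule: Rule, samples: dict[str, list[float]]) -> list[str]:
    """Return the servers whose average CPU over the window exceeds the threshold.

    `samples` maps a server name to per-minute CPU readings, newest last.
    """
    breaching = []
    for server, readings in samples.items():
        window = readings[-rule.window_minutes:]
        if window and mean(window) > rule.cpu_threshold:
            breaching.append(server)
    return breaching


rule = Rule(cpu_threshold=0.9, window_minutes=2, interval_seconds=60,
            message_template="CPU on {{server}} is high")
samples = {"X123": [0.5, 0.95, 0.97], "Y456": [0.99, 0.4, 0.6]}
print(evaluate(rule, samples))  # ['X123']
```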

$$$alerting-getting-started$$$
### Conditions [rules-conditions]

$$$rules-alerts$$$
Each project type supports a specific set of rule types. Each *rule type* provides its own way of defining the conditions to detect, but an expression formed by a series of clauses is a common pattern. For example, in an {{es}} query rule, you specify an index, a query, and a threshold, which uses a metric aggregation operation (`count`, `average`, `max`, `min`, or `sum`):

$$$alerting-concepts-conditions$$$
:::{image} ../../images/serverless-es-query-rule-conditions.png
:alt: UI for defining rule conditions in an {{es}} query rule
:class: screenshot
:::

$$$alerting-concepts-scheduling$$$
### Schedule [rules-schedule]

$$$alerting-concepts-actions$$$
All rules must have a check interval, which defines how often to evaluate the rule conditions. Checks are queued; they run as close to the defined value as capacity allows.

::::{important}
The intervals of rule checks in {{kib}} are approximate. Their timing is affected by factors such as the frequency at which tasks are claimed and the task load on the system. Refer to [Alerting production considerations](../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md) for more information.

::::
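A simplified model helps explain why check timing is approximate. The sketch below assumes that due checks are picked up only on a fixed polling cadence and that each poll can claim a limited number of checks. The function name, the 3-second poll interval, and the capacity of 2 are illustrative assumptions, not {{kib}} defaults or internals.

```python
import math


def actual_start_times(due_times: list[float], poll_interval: float,
                       capacity: int) -> list[float]:
    """Hypothetical, simplified model of when queued checks actually start.

    A check becomes eligible at its due time but only starts on the next
    poll tick, and each tick can claim at most `capacity` checks. Checks
    that do not fit wait for a later tick. Results are in due-time order.
    """
    starts = []
    tick_load: dict[float, int] = {}
    for due in sorted(due_times):
        # First poll tick at or after the due time.
        tick = math.ceil(due / poll_interval) * poll_interval
        # Defer to later ticks while the current one is already full.
        while tick_load.get(tick, 0) >= capacity:
            tick += poll_interval
        tick_load[tick] = tick_load.get(tick, 0) + 1
        starts.append(tick)
    return starts


# Four checks due around t=60s, polling every 3s, room for 2 checks per tick.
print(actual_start_times([60.0, 60.0, 60.0, 61.0], poll_interval=3.0, capacity=2))
# [60.0, 60.0, 63.0, 63.0]
```

In this model, the third check starts 3 seconds late because the tick at 60 seconds is already full, and the fourth starts late because its due time falls between ticks.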

### Actions [rules-actions]

You can add one or more actions to your rule to generate notifications when its conditions are met. Recovery actions likewise run when rule conditions are no longer met.

When defining actions in a rule, you specify:

* A connector
* An action frequency
* A mapping of rule values to properties exposed for that type of action

Each action uses a connector, which provides connection information for a {{kib}} service or third-party integration, depending on where you want to send the notifications. The specific list of connectors that you can use in your rule varies by project type. Refer to [{{connectors-app}}](../../deploy-manage/manage-connectors.md).

After you select a connector, set the *action frequency*. If you want to reduce the number of notifications you receive without affecting their timeliness, some rule types support alert summaries. For example, if you create an {{es}} query rule, you can set the action frequency such that you receive summaries of the new, ongoing, and recovered alerts on a custom interval:

:::{image} ../../images/serverless-es-query-rule-action-summary.png
:alt: UI for defining an alert summary action in an {{es}} query rule
:class: screenshot
:::
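The three categories in a summary can be derived by comparing which alerts were active on consecutive checks. This Python sketch uses a hypothetical `summarize` helper and compares only two checks; a summary sent on a custom interval would accumulate changes across several checks, which the sketch does not model.

```python
def summarize(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Group alerts into new, ongoing, and recovered (hypothetical helper).

    `previous` and `current` hold the alert IDs (here, server names) that
    met the condition on the previous and the current check.
    """
    return {
        "new": sorted(current - previous),
        "ongoing": sorted(current & previous),
        "recovered": sorted(previous - current),
    }


print(summarize({"X123", "Y456"}, {"Y456", "Z789"}))
# {'new': ['Z789'], 'ongoing': ['Y456'], 'recovered': ['X123']}
```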

Alternatively, you can set the action frequency such that the action runs for each alert. If the rule type does not support alert summaries, this is your only available option. You must choose when the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which affects whether the action runs. Each rule type has a specific set of valid action groups. For example, you can set *Run when* to `Query matched` or `Recovered` for the {{es}} query rule:

:::{image} ../../images/serverless-es-query-rule-recovery-action.png
:alt: UI for defining a recovery action
:class: screenshot
:::

Each connector supports a specific set of actions for each action group and enables different action properties. For example, you can have actions that create an {{opsgenie}} alert when rule conditions are met and recovery actions that close the {{opsgenie}} alert.

Some types of rules enable you to further refine the conditions under which actions run. For example, you can specify that actions run only when an alert occurs within a specific time frame or when it matches a KQL query.
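A time-frame restriction amounts to checking an alert's timestamp against allowed days and hours. The following sketch uses a hypothetical `in_time_frame` helper, ignores time zones, and assumes the window does not cross midnight; the KQL-based filter is not modeled.

```python
from datetime import datetime, time


def in_time_frame(ts: datetime, days: set[int], start: time, end: time) -> bool:
    """Return True if `ts` falls on an allowed weekday within [start, end).

    `days` uses ISO weekday numbers (1 = Monday ... 7 = Sunday).
    Assumes start < end, so windows that cross midnight are not supported.
    """
    return ts.isoweekday() in days and start <= ts.time() < end


business_days = {1, 2, 3, 4, 5}
# 2024-03-05 is a Tuesday; 2024-03-09 is a Saturday.
print(in_time_frame(datetime(2024, 3, 5, 10, 30), business_days, time(9), time(17)))  # True
print(in_time_frame(datetime(2024, 3, 9, 10, 30), business_days, time(9), time(17)))  # False
```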

::::{tip}
If you are not using alert summaries, actions are triggered per alert and a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9, and the action frequency is `On check intervals`:

* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.

In this example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often, it’s desirable to suppress these re-notifications. If you set the action frequency to `On custom action intervals` with an interval of 5 minutes, you reduce noise by getting emails only every 5 minutes for servers that continue to exceed the threshold:

* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *One email* is sent for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *One email* is sent for Z789.

To get notified only once when a server exceeds the threshold, you can set the action frequency to `On status changes`. Alternatively, if the rule type supports alert summaries, consider using them to reduce the volume of notifications.

::::
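The notification counts in the tip above can be reproduced with a small simulation. The `notifications` function and its frequency names are hypothetical stand-ins for the UI options, and recovery actions are not modeled; this is a sketch of the counting logic, not of how {{kib}} schedules actions.

```python
def notifications(checks: list[set[str]], frequency: str,
                  throttle_minutes: int = 5) -> list[tuple[int, str]]:
    """Simulate per-alert notifications (hypothetical, simplified model).

    `checks[i]` is the set of servers breaching the threshold at minute i + 1.
    `frequency` is one of:
      "check_intervals" - notify for every active alert on every check
      "custom_interval" - notify at most once per `throttle_minutes` per alert
      "status_change"   - notify only when an alert becomes active
    Recovery notifications are not modeled.
    """
    sent = []
    last_sent: dict[str, int] = {}
    previous: set[str] = set()
    for minute, active in enumerate(checks, start=1):
        for server in sorted(active):
            if frequency == "check_intervals":
                notify = True
            elif frequency == "custom_interval":
                last = last_sent.get(server)
                notify = last is None or minute - last >= throttle_minutes
            elif frequency == "status_change":
                notify = server not in previous
            else:
                raise ValueError(f"unknown frequency: {frequency}")
            if notify:
                sent.append((minute, server))
                last_sent[server] = minute
        previous = active
    return sent


checks = [{"X123"}, {"X123", "Y456"}, {"X123", "Y456", "Z789"}]
print(notifications(checks, "check_intervals"))
# [(1, 'X123'), (2, 'X123'), (2, 'Y456'), (3, 'X123'), (3, 'Y456'), (3, 'Z789')]
print(notifications(checks, "custom_interval"))
# [(1, 'X123'), (2, 'Y456'), (3, 'Z789')]
```

With these inputs, `On check intervals` produces six notifications (three of them for X123), while a 5-minute custom interval produces three. The `status_change` mode also gives three here, because each server becomes active only once.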

#### Action variables [rules-action-variables]

You can pass rule values to an action at the time a condition is detected. To view the list of variables available for your rule, click the "add rule variable" button:

:::{image} ../../images/serverless-es-query-rule-action-variables.png
:alt: Passing rule values to an action
:class: screenshot
:::

For more information about common action variables, refer to [Rule action variables](../../explore-analyze/alerts-cases/alerts/rule-action-variables.md).
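{{kib}} fills in action variables using Mustache templates. The sketch below is a minimal stand-in written for illustration: the `render` helper is hypothetical, supports only `{{name}}` and dotted `{{a.b}}` lookups, and does not implement Mustache sections or HTML escaping. The `server` value mirrors the example on this page, and the other values are made up.

```python
import re

# Matches {{name}} or {{a.b}}, allowing spaces inside the braces.
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace {{name}} or {{a.b}} placeholders with values from `variables`.

    Unknown names render as empty strings, as they do in Mustache.
    """
    def lookup(match: re.Match) -> str:
        value = variables
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return ""
            value = value[part]
        return str(value)

    return _VAR.sub(lookup, template)


alert_values = {"server": "X123", "rule": {"name": "High CPU"}}
print(render("[{{rule.name}}] CPU on {{server}} is high", alert_values))
# [High CPU] CPU on X123 is high
```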

### Alerts [rules-alerts]

When checking for a condition, a rule might identify multiple occurrences of the condition. {{kib}} tracks each of these alerts separately. Depending on the action frequency, an action occurs per alert or at the specified alert summary interval.

Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert. This means a separate email is sent for each server that exceeds the threshold whenever the alert status changes.

### Putting it all together [rules-putting-it-all-together]

A rule consists of conditions, actions, and a schedule. When conditions are met, alerts are created, and those alerts render and invoke actions. To make action setup and updates easier, actions use connectors that centralize the information used to connect with {{kib}} services and third-party integrations. The following example ties these concepts together:

:::{image} ../../images/serverless-rule-concepts-summary.svg
:alt: Rules
:class: screenshot
:::

1. Any time a rule’s conditions are met, an alert is created. This example checks for servers with average CPU > 0.9. Three servers meet the condition, so three alerts are created.
2. Alerts create actions according to the action frequency, as long as they are not muted or throttled. When actions are created, their properties are filled with actual values. In this example, three actions are created when the threshold is met, and the template string `{{server}}` is replaced with the appropriate server name for each alert.
3. {{kib}} runs the actions, sending notifications by using a third-party integration like an email service.
4. If the third-party integration has connection parameters or credentials, {{kib}} fetches these from the appropriate connector.
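The flow above, in which actions reference a connector instead of carrying their own credentials, can be sketched in Python. The `Connector` and `Action` classes, the `run_actions` helper, and the SMTP host and credentials are hypothetical, and sending is simulated by returning strings.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    """Hypothetical connector: shared connection details, stored once."""
    host: str
    user: str
    password: str


@dataclass
class Action:
    """Hypothetical action: references a connector by ID plus a template."""
    connector_id: str
    subject_template: str


def run_actions(alerts: list[str], action: Action,
                connectors: dict[str, Connector]) -> list[str]:
    """Create one action per alert, fill in the template, and simulate sending.

    Connection details are looked up from the connector at run time,
    not stored on the action itself.
    """
    connector = connectors[action.connector_id]
    sent = []
    for server in alerts:
        subject = action.subject_template.replace("{{server}}", server)
        sent.append(f"{connector.user}@{connector.host}: {subject}")
    return sent


connectors = {"smtp-1": Connector("smtp.example.com", "alerts", "secret")}
action = Action("smtp-1", "CPU on {{server}} is high")
for line in run_actions(["X123", "Y456", "Z789"], action, connectors):
    print(line)
# alerts@smtp.example.com: CPU on X123 is high
# alerts@smtp.example.com: CPU on Y456 is high
# alerts@smtp.example.com: CPU on Z789 is high
```

Because the credentials live only on the connector, rotating the password means updating one `Connector` entry rather than every action that uses it.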
@@ -7,7 +7,6 @@ mapped_pages:

This page describes how to resolve common problems you might encounter with Alerting.


## Rules with small check intervals run late [rules-small-check-interval-run-late]

**Problem**
@@ -22,7 +21,6 @@ Either tweak the [{{kib}} Task Manager settings](https://www.elastic.co/guide/en

For more details, see [Tasks with small schedule intervals run late](../../../troubleshoot/kibana/task-manager.md#task-manager-health-scheduled-tasks-small-schedule-interval-run-late).


## Rules with inconsistent cadence [scheduled-rules-run-late]

**Problem**
@@ -39,7 +37,6 @@ Alerting tasks always begin with `alerting:`. For example, the `alerting:.index-

For more details on monitoring and diagnosing tasks in Task Manager, refer to [Health monitoring](../../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md).


## Connectors have TLS errors when running actions [connector-tls-settings]

**Problem**
@@ -50,7 +47,6 @@ A connector gets a TLS socket error when connecting to the server to run an acti

Configuration options are available to specialize connections to TLS servers, including ignoring server certificate validation and providing certificate authority data to verify servers using custom certificates. For more details, see [Action settings](https://www.elastic.co/guide/en/kibana/current/alert-action-settings-kb.html#action-settings).


## Rules take a long time to run [rules-long-run-time]

**Problem**
@@ -62,7 +58,6 @@ By default, only users with a `superuser` role can query the [preview] {{kib}} e

::::


**Solution**

By default, rules have a `5m` timeout. Rules that run longer than this timeout are automatically canceled to prevent them from consuming too much of {{kib}}'s resources. Alerts and actions that may have been scheduled before the rule timed out are discarded. When a rule times out, you will see this error in the {{kib}} logs:
@@ -157,7 +152,6 @@ GET /.kibana-event-log*/_search
4. This interval buckets the `event.duration_in_seconds` runtime field into 1 second intervals. Update this value to change the granularity of the buckets. If you are unable to use runtime fields, make sure this aggregation targets `event.duration` and use nanoseconds for the interval.
5. This retrieves the top 10 rule ids for this duration interval. Update this value to retrieve more rule ids.
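To check your understanding of the aggregation described in callouts 4 and 5, the following Python sketch computes the same shape of result over in-memory data: durations are converted from nanoseconds to seconds, grouped into fixed-width buckets, and the most frequent rule IDs are listed per bucket. The rule IDs and durations are made up, and `duration_histogram` is a hypothetical helper, not part of any Elastic API.

```python
from collections import Counter, defaultdict

NS_PER_SECOND = 1_000_000_000


def duration_histogram(events: list[tuple[str, int]], interval_s: int = 1,
                       top_n: int = 10) -> dict[int, list[tuple[str, int]]]:
    """Bucket rule run durations and list the most frequent rule IDs per bucket.

    `events` holds (rule_id, event.duration) pairs, with event.duration in
    nanoseconds. Bucket keys are the lower bound of each interval in seconds.
    """
    buckets: dict[int, Counter] = defaultdict(Counter)
    for rule_id, duration_ns in events:
        seconds = duration_ns / NS_PER_SECOND
        key = int(seconds // interval_s) * interval_s
        buckets[key][rule_id] += 1
    return {key: counts.most_common(top_n) for key, counts in sorted(buckets.items())}


events = [
    ("rule-a", 120_000_000),        # 0.12 s
    ("rule-b", 450_000_000),        # 0.45 s
    ("rule-a", 300_000_000),        # 0.30 s
    ("rule-slow", 30_500_000_000),  # 30.5 s
]
print(duration_histogram(events))
# {0: [('rule-a', 2), ('rule-b', 1)], 30: [('rule-slow', 1)]}
```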


This query returns the following:

```json
@@ -232,10 +226,8 @@ This query returns the following:
1. Most run durations fall within the first bucket (0 - 1 seconds).
2. A single rule with id `41893910-6bca-11eb-9e0d-85d233e3ee35` took between 30 and 31 seconds to run.


Use the get rule API to retrieve additional information about rules that take a long time to run.


## Rule cannot decrypt API key [rule-cannot-decrypt-api-key]

**Problem**:
@@ -252,7 +244,6 @@ This error happens when the `xpack.encryptedSavedObjects.encryptionKey` value us
| If another {{kib}} instance with a different encryption key connects to the cluster. | The other {{kib}} instance might be trying to run the rule using a different encryption key than what the rule was created with. Ensure the encryption keys among all the {{kib}} instances are the same, and set [decryption-only keys](https://www.elastic.co/guide/en/kibana/current/security-settings-kb.html#xpack-encryptedSavedObjects-keyRotation-decryptionOnlyKeys) for previously used encryption keys. |
| If other scenarios don’t apply. | Generate a new API key for the rule. For example, in **{{stack-manage-app}} > {{rules-ui}}**, select **Update API key** from the action menu. |


## Rules stop running after upgrade [known-issue-upgrade-rule]

**Problem**:
@@ -112,7 +112,7 @@ This section will clarify some of the important differences in the function and
Functionally, the {{alert-features}} differ in that:

* Scheduled checks are run on {{kib}} instead of {{es}}.
* {{kib}} [rules hide the details of detecting conditions](../../../explore-analyze/alerts-cases.md#alerting-concepts-conditions) through rule types, whereas watches provide low-level control over inputs, conditions, and transformations.
* {{kib}} [rules hide the details of detecting conditions](../../../explore-analyze/alerts-cases/alerts/alerting-getting-started.md#alerting-concepts-conditions) through rule types, whereas watches provide low-level control over inputs, conditions, and transformations.
* {{kib}} rules track and persist the state of each detected condition through alerts. This makes it possible to mute and throttle individual alerts, and detect changes in state such as resolution.
* Actions are linked to alerts. Actions are fired for each occurrence of a detected condition, rather than for the entire rule.

20 changes: 1 addition & 19 deletions explore-analyze/alerts-cases/alerts/alerting-setup.md
@@ -4,13 +4,9 @@ mapped_pages:
- https://www.elastic.co/guide/en/kibana/current/alerting-setup.html
---



# Set up [alerting-setup]


{{kib}} {alert-features} are automatically enabled, but might require some additional configuration.

{{kib}} {{alert-features}} are automatically enabled, but might require some additional configuration.

## Prerequisites [alerting-prerequisites]

@@ -25,19 +21,16 @@ If you are using an **on-premises** {{stack}} deployment with [**security**](../

The alerting framework uses queries that require the `search.allow_expensive_queries` setting to be `true`. See the scripts [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html#_allow_expensive_queries_4).


## Production considerations and scaling guidance [alerting-setup-production]

When relying on alerting and actions as mission-critical services, make sure you follow the [alerting production considerations](../../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md).

For more information on the scalability of {{alert-features}}, go to [Scaling guidance](../../../deploy-manage/production-guidance/kibana-alerting-production-considerations.md#alerting-scaling-guidance).


## Security [alerting-security]

To use {{alert-features}} in a {{kib}} app, you must have the appropriate feature privileges:


### Give full access to manage alerts, connectors, and rules in **{{stack-manage-app}}** [_give_full_access_to_manage_alerts_connectors_and_rules_in_stack_manage_app]

**{{kib}} privileges**
@@ -57,8 +50,6 @@ The rule type also affects the privileges that are required. For example, to cre

::::



### Give view-only access to alerts, connectors, and rules in **{{stack-manage-app}}** [_give_view_only_access_to_alerts_connectors_and_rules_in_stack_manage_app]

**{{kib}} privileges**
@@ -72,15 +63,12 @@ The rule type also affects the privileges that are required. For example, to vie

::::



### Give view-only access to alerts in **Discover** or **Dashboards** [_give_view_only_access_to_alerts_in_discover_or_dashboards]

**{{kib}} privileges**

* `Read` index privileges for the `.alerts-*` system indices.


### Revoke all access to alerts, connectors, and rules in **{{stack-manage-app}}**, **Discover**, or **Dashboards** [_revoke_all_access_to_alerts_connectors_and_rules_in_stack_manage_app_discover_or_dashboards]

**{{kib}} privileges**
@@ -90,12 +78,10 @@ The rule type also affects the privileges that are required. For example, to vie
* `None` for the **Management > {{connectors-feature}}** feature.
* No index privileges for the `.alerts-*` system indices.


### More details [_more_details]

For more information on configuring roles that provide access to features, go to [Feature privileges](../../../deploy-manage/users-roles/cluster-or-deployment-auth/kibana-privileges.md#kibana-feature-privileges).


### API keys [alerting-authorization]

Rules are authorized using an API key. Its credentials are used to run all background tasks associated with the rule, including condition checks like {{es}} queries and triggered actions.
@@ -113,18 +99,14 @@ If a rule requires certain privileges, such as index privileges, to run and a us

::::



### Restrict actions [alerting-restricting-actions]

For security reasons, you might want to limit the extent to which {{kib}} can connect to external services. You can use [Action settings](https://www.elastic.co/guide/en/kibana/current/alert-action-settings-kb.html#action-settings) to disable certain [*Connectors*](../../../deploy-manage/manage-connectors.md) and allowlist the hostnames that {{kib}} can connect to.


## Space isolation [alerting-spaces]

Rules and connectors are isolated to the {{kib}} space in which they were created. A rule or connector created in one space will not be visible in another.


## {{ccs-cap}} [alerting-ccs-setup]

If you want to use alerting rules with {{ccs}}, you must configure privileges for {{ccs-init}} and {{kib}}. Refer to [Remote clusters](../../../deploy-manage/remote-clusters.md).