29 changes: 19 additions & 10 deletions reference/fleet/alert-templates.md
@@ -1,6 +1,4 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/fleet/current/data-streams.html
applies_to:
stack: ga 9.2
serverless: ga
@@ -17,23 +15,34 @@ navigation_title: Built-in alerts and templates
When you install or upgrade {{agent}}, new alert rules are created automatically. You can configure and customize out-of-the-box alerts to get them up and running quickly.

::::{note}
The built-in alerts feature for {{agent}} is available only for some subscription levels. The license (or a trial license) must be in place before you install or upgrade {{agent}} before this feature is available.
The built-in alerts feature for {{agent}} is available only for some subscription levels. The license (or a trial license) must be in place _before_ you install or upgrade {{agent}} for the alert rules to be available.

Refer [Elastic subscriptions](https://www.elastic.co/subscriptions) for more information.
Refer to [Elastic subscriptions](https://www.elastic.co/subscriptions) for more information.
::::

In {{kib}}, you can enable out-of-the-box rules pre-configured with reasonable defaults to provide immediate value for managing agents.
You can use [ES|QL](/explore-analyze/discover/try-esql.md) to author conditions for each rule.

Connectors are not added to rules automatically, but you can attach a connector to route alerts to your platform of choice -- Slack or email, for example.
In addition, you can add filters for policies, tags, or hostnames to scope alerts to specific sets of agents
You can use [{{esql}}](/explore-analyze/discover/try-esql.md) to author conditions for each rule.

You can find these rules in **Stack Management** > **Alerts and Insights** > **Rules**.

### Available rules [available-alert-rules]
Contributor Author

Suggested change
### Available rules [available-alert-rules]
### Available alert rules [available-alert-rules]


| Alert | Description |
| -------- | -------- |
| [Elastic Agent] CPU usage spike| Checks if {{agent}} or any of its processes were pegged at a high CPU for a specified window of time. This could signal a bug in an application and warrant further investigation.<br> - Condition: `system.process.cpu.total.time.ms` > 80% for 5 minutes<br>- Default: Enabled |
| [Elastic Agent] Dropped events | Checks if percentage of events dropped to acked events from the pipeline are greater than or equal to 5%. Rows are distinct by agent id and component id. |
Contributor

Suggested change
| [Elastic Agent] Dropped events | Checks if percentage of events dropped to acked events from the pipeline are greater than or equal to 5%. Rows are distinct by agent id and component id. |
| [Elastic Agent] Dropped events | Checks if the percentage of events dropped to acked events from the pipeline is greater than or equal to 5%. Rows are distinguished by agent ID and component ID. |

IDK what "events dropped to acked events from the pipeline are" but if we're talking about "the percentage", we want an "is" not an "are" :)

| [Elastic Agent] Excessive memory usage| Checks if {{agent}} or any of its processes have a high memory usage or memory usage that is trending higher. This could signal a memory leak in an application and warrant further investigation.<br>- Condition: Alert on `system.process.memory.rss.pct` > 80%<br>- Default: Enabled (perhaps the threshold should be higher if this is on by default) |
Contributor Author (@karenzone, Oct 22, 2025)

80%
- Default: Enabled (perhaps the threshold should be higher if this is on by default)

What did we decide?

I believe the threshold here is currently set to 50% - @MichelLosier is that correct?

| [Elastic Agent] Excessive restarts| Checks if excessive restarts on a host which require further investigation. Some of these restarts could have a business impact and getting an alert for them would allow us to act quickly to mitigate.<br>- Condition: Alert on (not sure) > 10 times in a 5 minute window<br>- Default: Enabled |
Contributor Author

Alert on (not sure) > 10 times in a 5 minute window

What did we decide?

this is correct. currently set to greater than 10 restarts in the 5 min window

Contributor

Suggested change
| [Elastic Agent] Excessive restarts| Checks if excessive restarts on a host which require further investigation. Some of these restarts could have a business impact and getting an alert for them would allow us to act quickly to mitigate.<br>- Condition: Alert on (not sure) > 10 times in a 5 minute window<br>- Default: Enabled |
| [Elastic Agent] Excessive restarts| Checks for excessive restarts on a host which require further investigation. Some restarts can have business impacts, and getting alerts for them can enable timely mitigation efforts.<br>- Condition: Alert on (not sure) > 10 times in a 5 minute window<br>- Default: Enabled |

| [Elastic Agent] High pipeline queue | Checks if max of `beat.stats.libbeat.pipeline.queue.filled.pct` exceeds 90%. Rows are distinct by agent id and component id. |
Contributor

Suggested change
| [Elastic Agent] High pipeline queue | Checks if max of `beat.stats.libbeat.pipeline.queue.filled.pct` exceeds 90%. Rows are distinct by agent id and component id. |
| [Elastic Agent] High pipeline queue | Checks if max of `beat.stats.libbeat.pipeline.queue.filled.pct` exceeds 90%. Rows are distinguished by agent ID and component ID. |

Should we specify the exact names of the agent ID and component ID fields? e.g. agent_ID (not sure that's accurate just an example)

| [Elastic Agent] Output errors | Checks if the errors per minute from an agent component is greater than 5. Rows are distinct by agent id and component id. |
Contributor

Suggested change
| [Elastic Agent] Output errors | Checks if the errors per minute from an agent component is greater than 5. Rows are distinct by agent id and component id. |
| [Elastic Agent] Output errors | Checks if the errors per minute from an agent component is greater than 5. Rows are distinguished by agent ID and component ID. |

| [Elastic Agent] Unhealthy status | Checks for log occurrence of an agent status change to `error` using the new elastic_agent.status_change datastreams. |
Contributor

Suggested change
| [Elastic Agent] Unhealthy status | Checks for log occurrence of an agent status change to `error` using the new elastic_agent.status_change datastreams. |
| [Elastic Agent] Unhealthy status | Checks logs for an agent status change to `error` using the new `elastic_agent.status_change` datastreams. |


**Connectors** are not added to rules automatically, but you can attach a connector to route alerts to your Slack, email, or other notification platforms.
In addition, you can add filters for policies, tags, or hostnames to scope alerts to specific sets of agents.
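
As a rough illustration of how a threshold condition like the Dropped events rule in the table might be evaluated, here is a minimal Python sketch. It assumes the rule compares dropped events to acked events as a percentage and alerts at 5% or more, with one row per agent ID and component ID. The class, field, and function names are hypothetical, and this is not the rule's actual {{esql}} query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentCounters:
    """Hypothetical per-component event counters (names are illustrative only)."""
    agent_id: str
    component_id: str
    dropped: int
    acked: int


def dropped_pct(c: ComponentCounters) -> float:
    """Dropped events as a percentage of acked events (assumed interpretation)."""
    if c.acked == 0:
        # No acked events: treat any drop as 100% to avoid dividing by zero.
        return 100.0 if c.dropped > 0 else 0.0
    return 100.0 * c.dropped / c.acked


def breaching(rows: list[ComponentCounters], threshold_pct: float = 5.0) -> list[tuple[str, str]]:
    """Return the (agent_id, component_id) pairs at or above the threshold."""
    return [(r.agent_id, r.component_id) for r in rows if dropped_pct(r) >= threshold_pct]


# Made-up sample data: one component at the threshold, one below it.
rows = [
    ComponentCounters("agent-1", "filestream-default", dropped=5, acked=100),
    ComponentCounters("agent-1", "log-default", dropped=1, acked=100),
]
print(breaching(rows))
```

The sketch keeps the threshold as a parameter because the rule conditions are described as customizable. How the real rule handles a component with no acked events is not stated here, so the zero-division branch is a guess.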

## Alert templates assets for integrations [alert-templates]
## Alert template assets for integrations [alert-templates]

Some integration packages include alerting rule template assets that provide pre-made definitions of alerting rules. You can use the templates to create your own custom alerting rules that you can enable and fine tune.
Some integration packages include alerting rule template assets that provide pre-made definitions of alerting rules. You can use the templates to create your own custom alerting rules that you can enable and fine-tune.

When you click a template, you get a pre-filled rule creation form. You can define and adjust values, set up connectors, and define rule actions to create your custom alerting rule.

2 changes: 1 addition & 1 deletion reference/fleet/manage-integrations.md
@@ -46,4 +46,4 @@ You can perform a variety of actions in the **Integrations** app in {{kib}}. Som

## Customize integrations [customize-integrations]

After you've started using integrations to ingest data, you can customize how the data is managed over time. Refer to [Index lifecycle management](/reference/fleet/data-streams.md#data-streams-ilm) to learn more.
After you've started using integrations to ingest data, you can customize how the data is managed over time. Refer to [{{ilm-cap}}](/reference/fleet/data-streams.md#data-streams-ilm) to learn more.