Add list of agent OOB alert rules with descriptions #3608
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,6 +1,4 @@ | ||||||
| --- | ||||||
| mapped_pages: | ||||||
| - https://www.elastic.co/guide/en/fleet/current/data-streams.html | ||||||
| applies_to: | ||||||
| stack: ga 9.2 | ||||||
| serverless: ga | ||||||
|
|
@@ -17,23 +15,34 @@ navigation_title: Built-in alerts and templates | |||||
| When you install or upgrade {{agent}}, new alert rules are created automatically. You can configure and customize out-of-the-box alerts to get them up and running quickly. | ||||||
|
|
||||||
| ::::{note} | ||||||
| The built-in alerts feature for {{agent}} is available only for some subscription levels. The license (or a trial license) must be in place before you install or upgrade {{agent}} before this feature is available. | ||||||
| The built-in alerts feature for {{agent}} is available only for some subscription levels. The license (or a trial license) must be in place _before_ you install or upgrade {{agent}} for the alert rules to be available. | ||||||
|
|
||||||
| Refer [Elastic subscriptions](https://www.elastic.co/subscriptions) for more information. | ||||||
| Refer to [Elastic subscriptions](https://www.elastic.co/subscriptions) for more information. | ||||||
| :::: | ||||||
|
|
||||||
| In {{kib}}, you can enable out-of-the-box rules pre-configured with reasonable defaults to provide immediate value for managing agents. | ||||||
| You can use [ES|QL](/explore-analyze/discover/try-esql.md) to author conditions for each rule. | ||||||
|
|
||||||
| Connectors are not added to rules automatically, but you can attach a connector to route alerts to your platform of choice -- Slack or email, for example. | ||||||
| In addition, you can add filters for policies, tags, or hostnames to scope alerts to specific sets of agents | ||||||
| You can use [{{esql}}](/explore-analyze/discover/try-esql.md) to author conditions for each rule. | ||||||
|
|
||||||
| You can find these rules in **Stack Management** > **Alerts and Insights** > **Rules**. | ||||||
|
|
||||||
| ### Available rules [available-alert-rules] | ||||||
|
|
||||||
| | Alert | Description | | ||||||
| | -------- | -------- | | ||||||
| | [Elastic Agent] CPU usage spike| Checks if {{agent}} or any of its processes were pegged at a high CPU for a specified window of time. This could signal a bug in an application and warrant further investigation.<br> - Condition: `system.process.cpu.total.time.ms` > 80% for 5 minutes<br>- Default: Enabled | | ||||||
| | [Elastic Agent] Dropped events | Checks if percentage of events dropped to acked events from the pipeline are greater than or equal to 5%. Rows are distinct by agent id and component id. | | ||||||
|
IDK what "events dropped to acked events from the pipeline" means, but if we're talking about "the percentage", we want an "is", not an "are" :) |
||||||
| | [Elastic Agent] Excessive memory usage| Checks if {{agent}} or any of its processes have a high memory usage or memory usage that is trending higher. This could signal a memory leak in an application and warrant further investigation.<br>- Condition: Alert on `system.process.memory.rss.pct` > 80%<br>- Default: Enabled (perhaps the threshold should be higher if this is on by default) | | ||||||
|
What did we decide?
I believe the threshold here is currently set to 50% - @MichelLosier is that correct? |
||||||
| | [Elastic Agent] Excessive restarts| Checks if excessive restarts on a host which require further investigation. Some of these restarts could have a business impact and getting an alert for them would allow us to act quickly to mitigate.<br>- Condition: Alert on (not sure) > 10 times in a 5 minute window<br>- Default: Enabled | | ||||||
|
What did we decide?
This is correct. It is currently set to greater than 10 restarts in the 5-minute window.
|
||||||
| | [Elastic Agent] High pipeline queue | Checks if max of `beat.stats.libbeat.pipeline.queue.filled.pct` exceeds 90%. Rows are distinct by agent id and component id. | | ||||||
|
Should we specify the exact names of the agent ID and component ID fields? e.g. |
||||||
| | [Elastic Agent] Output errors | Checks if the errors per minute from an agent component is greater than 5. Rows are distinct by agent id and component id. | | ||||||
|
|
||||||
| | [Elastic Agent] Unhealthy status | Checks for log occurrence of an agent status change to `error` using the new elastic_agent.status_change datastreams. | | ||||||
|
|
||||||
|
|
||||||
| **Connectors** are not added to rules automatically, but you can attach a connector to route alerts to your Slack, email, or other notification platforms. | ||||||
| In addition, you can add filters for policies, tags, or hostnames to scope alerts to specific sets of agents. | ||||||
|
|
||||||
| ## Alert templates assets for integrations [alert-templates] | ||||||
| ## Alert template assets for integrations [alert-templates] | ||||||
|
|
||||||
| Some integration packages include alerting rule template assets that provide pre-made definitions of alerting rules. You can use the templates to create your own custom alerting rules that you can enable and fine tune. | ||||||
| Some integration packages include alerting rule template assets that provide pre-made definitions of alerting rules. You can use the templates to create your own custom alerting rules that you can enable and fine-tune. | ||||||
|
|
||||||
| When you click a template, you get a pre-filled rule creation form. You can define and adjust values, set up connectors, and define rule actions to create your custom alerting rule. | ||||||
|
|
||||||
|
|
||||||
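To illustrate the "Excessive restarts" condition confirmed in the review thread (more than 10 restarts within a 5-minute window), here is a minimal Python sketch. The function name, its parameters, and the sliding-window approach are assumptions made for illustration only. The actual rule is defined in {{kib}}, and this is not its implementation.

```python
from datetime import datetime, timedelta


def exceeds_restart_threshold(restarts, window=timedelta(minutes=5), threshold=10):
    """Return True if more than `threshold` restarts fall within any span of length `window`.

    `restarts` is an iterable of datetime objects (hypothetical input; the real
    rule reads agent data in Kibana rather than a Python list).
    """
    times = sorted(restarts)
    start = 0
    for end, current in enumerate(times):
        # Move the left edge forward until the span fits inside the window.
        while current - times[start] > window:
            start += 1
        # Count of restarts between the left edge and the current restart, inclusive.
        if end - start + 1 > threshold:
            return True
    return False


base = datetime(2025, 1, 1, 12, 0, 0)
# Eleven restarts, 10 seconds apart: all fall inside one 5-minute window.
burst = [base + timedelta(seconds=10 * i) for i in range(11)]
# Eleven restarts, 1 minute apart: no 5-minute window holds more than 6 of them.
spread = [base + timedelta(minutes=i) for i in range(11)]

print(exceeds_restart_threshold(burst))
print(exceeds_restart_threshold(spread))
```

The default `threshold` and `window` mirror the values stated in the review thread. Adjust them if the shipped rule uses different defaults.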