
Commit effa440

Add docs for advanced monitoring options (#1361)
* Add docs for advanced monitoring options
* Remove the 'override the default monitoring port' section
* Address Craig's comments
1 parent ac94c1d commit effa440

3 files changed: +56 -27 lines changed

docs/en/ingest-management/agent-policies.asciidoc

Lines changed: 49 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -96,7 +96,7 @@ The following table illustrates the {fleet} user actions available to different
9696
|{y}
9797
|{n}
9898

99-
|<<change-policy-enable-agent-monitoring,Enable agent monitoring>>
99+
|<<change-policy-enable-agent-monitoring,Configure agent monitoring>>
100100
|{y}
101101
|{n}
102102

@@ -116,10 +116,6 @@ The following table illustrates the {fleet} user actions available to different
116116
|{y}
117117
|{n}
118118

119-
|<<agent-policy-http-monitoring>>
120-
|{y}
121-
|{n}
122-
123119
|<<agent-policy-log-level>>
124120
|{y}
125121
|{n}
@@ -310,19 +306,63 @@ Note that adding custom tags is not supported for a small set of inputs:
310306

311307
[discrete]
312308
[[change-policy-enable-agent-monitoring]]
313-
== Enable agent monitoring
309+
== Configure agent monitoring
314310

315-
Use this setting to collect monitoring logs and metrics from {agent}. All monitoring data will be written to the specified **Default namespace**.
311+
Use these settings to collect monitoring logs and metrics from {agent}. All monitoring data will be written to the specified **Default namespace**.
316312

317313
. In {fleet}, click **Agent policies**.
318314
Select the name of the policy you want to edit.
319315

320-
. Click the **Settings** tab and scroll to **Enable agent monitorings**.
316+
. Click the **Settings** tab and scroll to **Agent monitoring**.
321317

322318
. Select whether to collect agent logs, agent metrics, or both, from the {agents} that use the policy.
323-
319+
+
324320
When this setting is enabled, an {agent} integration is created automatically.
325321

322+
. Expand the **Advanced monitoring options** section to access <<advanced-agent-monitoring-settings,advanced settings>>.
323+
324+
. Save your changes for the updated monitoring settings to take effect.
325+
326+
[discrete]
327+
[[advanced-agent-monitoring-settings]]
328+
=== Advanced agent monitoring settings
329+
330+
**HTTP monitoring endpoint**
331+
332+
Enabling this setting exposes a `/liveness` API endpoint that you can use to monitor {agent} health according to the following HTTP codes:
333+
334+
* `200`: {agent} is healthy. The endpoint returns a `200` OK status as long as {agent} is responsive and can process configuration changes.
335+
* `500`: A component or unit is in a failed state.
336+
* `503`: The agent coordinator is unresponsive.
337+
338+
You can pass a `failon` parameter to the `/liveness` endpoint to determine what component state will result in a `500` status. For example, `curl 'localhost:6792/liveness?failon=degraded'` will return `500` if a component is in a degraded state.
339+
340+
The possible values for `failon` are:
341+
342+
* `degraded`: Return an error if a component is in a degraded state or failed state, or if the agent coordinator is unresponsive.
343+
* `failed`: Return an error if a unit is in a failed state, or if the agent coordinator is unresponsive.
344+
* `heartbeat`: Return an error only if the agent coordinator is unresponsive.
345+
346+
If no `failon` parameter is provided, the default `failon` behavior is `heartbeat`.
347+
348+
The HTTP monitoring endpoint can also be link:https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request[used with Kubernetes], for example to restart the container when the liveness check fails.
349+
350+
When you enable this setting, you need to provide the host URL and port where the endpoint can be accessed. Using the default `localhost` is recommended.
351+
352+
When the HTTP monitoring endpoint is enabled, you can also select **Enable profiling at `/debug/pprof`**. This controls whether {agent} exposes the `/debug/pprof/` endpoints together with the monitoring endpoints.
353+
354+
The heap profiles available from `/debug/pprof/` are included in <<elastic-agent-diagnostics-command,{agent} diagnostics>> by default. CPU profiles are also included when the `--cpu-profile` option is included. For full details about the profiles exposed by `/debug/pprof/` refer to the link:https://pkg.go.dev/net/http/pprof[pprof package documentation].
355+
356+
Profiling at `/debug/pprof` is disabled by default. Data produced by these endpoints can be useful for debugging but present a security risk. It's recommended to leave this option disabled if the monitoring endpoint is accessible over a network.
357+
358+
**Diagnostics rate limiting**
359+
360+
You can set a rate limit for the action handler for diagnostics requests coming from {fleet}. The setting affects only {fleet}-managed {agents}. By default, requests are limited to an interval of `1m` and a burst value of `1`. This setting does not affect diagnostics collected through the CLI.
361+
362+
**Diagnostics file upload**
363+
364+
This setting configures retries for the file upload client handling diagnostics requests coming from {fleet}. The setting affects only {fleet}-managed {agents}. By default, a maximum of `10` retries are allowed with an initial duration of `1s` and a backoff duration of `1m`. The client may retry failed requests with exponential backoff.
365+
326366
[discrete]
327367
[[change-policy-output]]
328368
== Change the output of a policy
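
The `/liveness` status codes and `failon` values added in the hunk above can be exercised from a small client. The following is a minimal sketch, not part of the commit: the helper names (`liveness_url`, `describe_status`, `check_liveness`) are hypothetical, the `http` scheme is an assumption, and the port `6792` is taken from the `curl` example in the added text.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen

# Status meanings as listed in the added documentation.
LIVENESS_STATUS = {
    200: "healthy",
    500: "component or unit failed",
    503: "coordinator unresponsive",
}

# Accepted `failon` values as listed in the added documentation.
FAILON_VALUES = ("degraded", "failed", "heartbeat")


def liveness_url(host="localhost", port=6792, failon=None):
    """Build the /liveness URL, optionally with a failon parameter."""
    if failon is not None and failon not in FAILON_VALUES:
        raise ValueError(f"unknown failon value: {failon!r}")
    url = f"http://{host}:{port}/liveness"  # scheme is an assumption
    if failon is not None:
        url += "?" + urlencode({"failon": failon})
    return url


def describe_status(code):
    """Map an HTTP status code to the documented meaning."""
    return LIVENESS_STATUS.get(code, f"unexpected status {code}")


def check_liveness(host="localhost", port=6792, failon=None, timeout=5.0):
    """Query the endpoint and return (status_code, description).

    Returns (None, reason) if the endpoint cannot be reached at all.
    """
    try:
        with urlopen(liveness_url(host, port, failon), timeout=timeout) as resp:
            code = resp.status
    except HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        code = err.code
    except (URLError, OSError) as err:
        return None, f"endpoint unreachable: {err}"
    return code, describe_status(code)
```

Calling `check_liveness(failon="degraded")` on a host where the endpoint is enabled would mirror the `curl` example from the docs. Validating `failon` on the client side is a design choice of this sketch, not documented agent behavior.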
@@ -414,22 +454,6 @@ Select the name of the policy you want to edit.
414454

415455
. Set **Limit CPU usage** as needed. For example, to limit Go processes supervised by {agent} to two operating system threads each, set this value to `2`.
416456

417-
[discrete]
418-
[[agent-policy-http-monitoring]]
419-
== Override the default monitoring port
420-
421-
You can override the default port that {agent} uses to send monitoring data. It's useful to be able to adjust this setting if you have an application running on the machine on which the agent is deployed, and that is using the same port.
422-
423-
. In {fleet}, click **Agent policies**.
424-
Select the name of the policy you want to edit.
425-
426-
. Click the **Settings** tab and scroll to **Advanced settings**.
427-
428-
//. Set **Agent HTTP monitoring** setting to enabled, and then specify a host and port for the monitoring data output.
429-
. Specify a host and port for the monitoring data output.
430-
431-
//. Enable **buffer.enabled** if you'd like {agent} and {beats} to collect metrics into an in-memory buffer and expose these through a `/buffer` endpoint. This data can be useful for debugging or if the {agent} has issues communicating with {es}. Enabling this option may slightly increase process memory usage.
432-
433457
[discrete]
434458
[[agent-policy-log-level]]
435459
== Set the {agent} log level
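
To make the **Diagnostics file upload** defaults added in this file more concrete (`10` retries, an initial duration of `1s`, a backoff duration of `1m`), here is a rough sketch of what an exponential backoff schedule with those numbers could look like. It rests on two assumptions that the docs do not state: that the delay doubles after each attempt, and that the backoff duration acts as an upper bound on the delay. The function name `backoff_schedule` is hypothetical, and the real client may behave differently, for example by adding jitter.

```python
from datetime import timedelta


def backoff_schedule(max_retries=10,
                     initial=timedelta(seconds=1),
                     max_backoff=timedelta(minutes=1),
                     factor=2):
    """Return the wait before each retry.

    Assumptions (not confirmed by the docs): the delay grows by `factor`
    after every attempt and is capped at `max_backoff`.
    """
    delays = []
    delay = initial
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff))
        delay = delay * factor
    return delays


if __name__ == "__main__":
    for attempt, wait in enumerate(backoff_schedule(), start=1):
        print(f"retry {attempt}: wait {int(wait.total_seconds())}s")
```

Under these assumptions the waits grow from one second up to the one-minute cap and stay there for the remaining retries.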

docs/en/ingest-management/commands.asciidoc

Lines changed: 5 additions & 2 deletions
@@ -77,7 +77,7 @@ This command is intended for debugging purposes only. The output format and stru
7777
[source,shell]
7878
----
7979
elastic-agent diagnostics [--file <string>]
80-
[-p]
80+
[--cpu-profile]
8181
[--exclude-events]
8282
[--help]
8383
[global-flags]
@@ -92,9 +92,12 @@ Specifies the output archive name. Defaults to `elastic-agent-diagnostics-<times
9292
`--help`::
9393
Show help for the `diagnostics` command.
9494

95-
`-p`::
95+
`--cpu-profile`::
9696
Additionally runs a 30-second CPU profile on each running component. This will generate an additional `cpu.pprof` file for each component.
9797

98+
`-p`::
99+
Alias for `--cpu-profile`.
100+
98101
`--exclude-events`::
99102
Exclude the events log files from the diagnostics archive.
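
For scripted collection, the flags shown in this hunk can be assembled programmatically. The sketch below only builds the argument list. The helper name `diagnostics_command` and the archive name `diag.zip` are hypothetical, and it assumes `elastic-agent` is on the `PATH`.

```python
def diagnostics_command(cpu_profile=False, exclude_events=False, file=None):
    """Build the argv for `elastic-agent diagnostics` from the documented flags."""
    argv = ["elastic-agent", "diagnostics"]
    if file is not None:
        # --file sets the output archive name.
        argv += ["--file", file]
    if cpu_profile:
        # Runs an additional 30-second CPU profile on each running component.
        argv.append("--cpu-profile")
    if exclude_events:
        # Leaves the events log files out of the archive.
        argv.append("--exclude-events")
    return argv
```

The resulting list can be passed to `subprocess.run(diagnostics_command(cpu_profile=True, file="diag.zip"), check=True)`. Because `--cpu-profile` runs a 30-second profile per component, the call blocks for at least that long.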
100103

docs/en/ingest-management/fleet/monitor-elastic-agent.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -226,6 +226,8 @@ monitoring settings for all agents enrolled in a specific agent policy:
226226
. Under **Agent monitoring**, deselect (or select) one or both of these
227227
settings: **Collect agent logs** and **Collect agent metrics**.
228228

229+
. Under **Advanced monitoring options** you can configure additional settings including an HTTP monitoring endpoint, diagnostics rate limiting, and diagnostics file upload limits. Refer to <<change-policy-enable-agent-monitoring,configure agent monitoring>> for details.
230+
229231
. Save your changes.
230232

231233
To turn off agent monitoring when creating a new agent policy:
