From e3e0d307237cb1da27776cbe0529f47c0170b4dd Mon Sep 17 00:00:00 2001
From: David Kilfoyle <david.kilfoyle@elastic.co>
Date: Tue, 1 Oct 2024 12:24:51 -0400
Subject: [PATCH 1/3] Add docs for advanced monitoring options

---
 .../ingest-management/agent-policies.asciidoc | 31 ++++++++++++++++---
 .../fleet/monitor-elastic-agent.asciidoc      |  2 ++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/docs/en/ingest-management/agent-policies.asciidoc b/docs/en/ingest-management/agent-policies.asciidoc
index 4590a3502..e0da21d3a 100644
--- a/docs/en/ingest-management/agent-policies.asciidoc
+++ b/docs/en/ingest-management/agent-policies.asciidoc
@@ -96,7 +96,7 @@ The following table illustrates the {fleet} user actions available to different
 |{y}
 |{n}
 
-|<<change-policy-enable-agent-monitoring,Enable agent monitoring>>
+|<<change-policy-enable-agent-monitoring,Configure agent monitoring>>
 |{y}
 |{n}
 
@@ -310,19 +310,40 @@ Note that adding custom tags is not supported for a small set of inputs:
 
 [discrete]
 [[change-policy-enable-agent-monitoring]]
-== Enable agent monitoring
+== Configure agent monitoring
 
-Use this setting to collect monitoring logs and metrics from {agent}. All monitoring data will be written to the specified **Default namespace**.
+Use these settings to collect monitoring logs and metrics from {agent}. All monitoring data will be written to the specified **Default namespace**.
 
 . In {fleet}, click **Agent policies**.
 Select the name of the policy you want to edit.
 
-. Click the **Settings** tab and scroll to **Enable agent monitorings**.
+. Click the **Settings** tab and scroll to **Agent monitoring**.
 
 . Select whether to collect agent logs, agent metrics, or both, from the {agents} that use the policy.
-
++
 When this setting is enabled an {agent} integration is created automatically.
 
+. Expand the **Advanced monitoring options** section to access <<advanced-agent-monitoring-settings,other monitoring settings>>.
+
+. Save your changes for the updated monitoring settings to take effect.
+
+[discrete]
+[[advanced-agent-monitoring-settings]]
+=== Advanced agent monitoring settings
+
+**HTTP monitoring endpoint**::
+Enabling this setting exposes a `/liveness` API endpoint that you can use to monitor {agent} health. By default, the endpoint returns a `200` OK status as long as {agent}'s internal main loop is responsive and can process configuration changes. It can be configured to also monitor the component states and return an error if anything is degraded or has failed. This endpoint can be used by Kubernetes to restart the container, for example.
++
+When you enable this setting, you need to also provide the host URL and port where the endpoint can be accessed. Using the default `localhost` is recommended.
++
+You can also enable profiling at `/debug/pprof` to control whether the {agent} exposes the `/debug/pprof/` endpoints with the monitoring endpoints. This is disabled by default. Data produced by these endpoints can be useful for debugging but present a security risk. It's recommended to leave this option disabled if the monitoring endpoint is accessible over a network.
+
+**Diagnostics rate limiting**::
+You can set a rate limit for the request diagnostics action handler. By default requests are limited to an interval of `1m` and a burst value of `1`. This setting does not affect diagnostics collected through the CLI.
+
+**Diagnostics file upload**::
+This setting configures retries for the file upload client. By default, a maximum of `10` retries are allowed with an initial duration of `1s` and a backoff duration of `1m`. The client may retry failed requests with exponential backoff.
+
 [discrete]
 [[change-policy-output]]
 == Change the output of a policy
diff --git a/docs/en/ingest-management/fleet/monitor-elastic-agent.asciidoc b/docs/en/ingest-management/fleet/monitor-elastic-agent.asciidoc
index 087133a9e..a06cce345 100644
--- a/docs/en/ingest-management/fleet/monitor-elastic-agent.asciidoc
+++ b/docs/en/ingest-management/fleet/monitor-elastic-agent.asciidoc
@@ -226,6 +226,8 @@ monitoring settings for all agents enrolled in a specific agent policy:
 . Under **Agent monitoring**, deselect (or select) one or both of these
 settings: **Collect agent logs** and **Collect agent metrics**.
 
+. Under **Advanced monitoring options**, you can configure additional settings, including an HTTP monitoring endpoint, diagnostics rate limiting, and diagnostics file upload retries. Refer to <<change-policy-enable-agent-monitoring,Configure agent monitoring>> for details.
+
 . Save your changes.
 
 To turn off agent monitoring when creating a new agent policy:

From 284bb2fbe597f944e15412c6128ebc9726f1443c Mon Sep 17 00:00:00 2001
From: David Kilfoyle <david.kilfoyle@elastic.co>
Date: Tue, 1 Oct 2024 13:08:05 -0400
Subject: [PATCH 2/3] Remove the 'override the default monitoring port' section

---
 .../ingest-management/agent-policies.asciidoc | 20 -------------------
 1 file changed, 20 deletions(-)

diff --git a/docs/en/ingest-management/agent-policies.asciidoc b/docs/en/ingest-management/agent-policies.asciidoc
index e0da21d3a..52447cb65 100644
--- a/docs/en/ingest-management/agent-policies.asciidoc
+++ b/docs/en/ingest-management/agent-policies.asciidoc
@@ -116,10 +116,6 @@ The following table illustrates the {fleet} user actions available to different
 |{y}
 |{n}
 
-|<<agent-policy-http-monitoring>>
-|{y}
-|{n}
-
 |<<agent-policy-log-level>>
 |{y}
 |{n}
@@ -435,22 +431,6 @@ Select the name of the policy you want to edit.
 
 . Set **Limit CPU usage** as needed. For example, to limit Go processes supervised by {agent} to two operating system threads each, set this value to `2`.
 
-[discrete]
-[[agent-policy-http-monitoring]]
-== Override the default monitoring port
-
-You can override the default port that {agent} uses to send monitoring data. It's useful to be able to adjust this setting if you have an application running on the machine on which the agent is deployed, and that is using the same port.
-
-. In {fleet}, click **Agent policies**.
-Select the name of the policy you want to edit.
-
-. Click the **Settings** tab and scroll to **Advanced settings**.
-
-//. Set **Agent HTTP monitoring** setting to enabled, and then specify a host and port for the monitoring data output.
-. Specify a host and port for the monitoring data output.
-
-//. Enable **buffer.enabled** if you'd like {agent} and {beats} to collect metrics into an in-memory buffer and expose these through a `/buffer` endpoint. This data can be useful for debugging or if the {agent} has issues communicating with {es}. Enabling this option may slightly increase process memory usage.
-
 [discrete]
 [[agent-policy-log-level]]
 == Set the {agent} log level

From fe8cf38e1c7307818d452e2506ce9a9c60125525 Mon Sep 17 00:00:00 2001
From: David Kilfoyle <david.kilfoyle@elastic.co>
Date: Wed, 2 Oct 2024 12:31:09 -0400
Subject: [PATCH 3/3] Address Craig's comments

---
 .../ingest-management/agent-policies.asciidoc | 45 ++++++++++++++-----
 docs/en/ingest-management/commands.asciidoc   |  7 ++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/docs/en/ingest-management/agent-policies.asciidoc b/docs/en/ingest-management/agent-policies.asciidoc
index 52447cb65..79dd2530e 100644
--- a/docs/en/ingest-management/agent-policies.asciidoc
+++ b/docs/en/ingest-management/agent-policies.asciidoc
@@ -319,7 +319,7 @@ Select the name of the policy you want to edit.
 +
 When this setting is enabled an {agent} integration is created automatically.
 
-. Expand the **Advanced monitoring options** section to access <<advanced-agent-monitoring-settings,other monitoring settings>>.
+. Expand the **Advanced monitoring options** section to access <<advanced-agent-monitoring-settings,advanced settings>>.
 
 . Save your changes for the updated monitoring settings to take effect.
 
@@ -327,18 +327,41 @@ When this setting is enabled an {agent} integration is created automatically.
 [[advanced-agent-monitoring-settings]]
 === Advanced agent monitoring settings
 
-**HTTP monitoring endpoint**::
-Enabling this setting exposes a `/liveness` API endpoint that you can use to monitor {agent} health. By default, the endpoint returns a `200` OK status as long as {agent}'s internal main loop is responsive and can process configuration changes. It can be configured to also monitor the component states and return an error if anything is degraded or has failed. This endpoint can be used by Kubernetes to restart the container, for example.
-+
-When you enable this setting, you need to also provide the host URL and port where the endpoint can be accessed. Using the default `localhost` is recommended.
-+
-You can also enable profiling at `/debug/pprof` to control whether the {agent} exposes the `/debug/pprof/` endpoints with the monitoring endpoints. This is disabled by default. Data produced by these endpoints can be useful for debugging but present a security risk. It's recommended to leave this option disabled if the monitoring endpoint is accessible over a network.
+**HTTP monitoring endpoint**
+
+Enabling this setting exposes a `/liveness` API endpoint that you can use to monitor {agent} health. The endpoint returns one of the following HTTP status codes:
+
+* `200`: {agent} is healthy, meaning that it is responsive and can process configuration changes.
+* `500`: A component or unit is in a failed or degraded state, depending on the `failon` parameter described below.
+* `503`: The agent coordinator is unresponsive.
+
+You can pass a `failon` query parameter to the `/liveness` endpoint to control which component states result in a `500` status. For example, `curl 'localhost:6792/liveness?failon=degraded'` returns `500` if any component is in a degraded state.
+
+The possible values for `failon` are:
+
+* `degraded`: Return an error if a component is in a degraded or failed state, or if the agent coordinator is unresponsive.
+* `failed`: Return an error if a unit is in a failed state, or if the agent coordinator is unresponsive.
+* `heartbeat`: Return an error only if the agent coordinator is unresponsive.
+
+If you don't provide a `failon` parameter, the endpoint uses `heartbeat`, so it returns an error only when the agent coordinator is unresponsive.
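+
+As a minimal sketch, a shell script can query the endpoint and interpret the status code. The `AGENT_MONITORING_URL` variable and the `check_liveness` and `describe_liveness_status` functions are hypothetical names used for illustration, and the default address assumes the `localhost:6792` example above. Set it to the host and port configured in your policy.
+
+[source,shell]
+----
+# Hypothetical helper names. AGENT_MONITORING_URL defaults to the example
+# address used in this section and should match your policy settings.
+AGENT_MONITORING_URL="${AGENT_MONITORING_URL:-http://localhost:6792}"
+
+# Translate a /liveness HTTP status code into a short description.
+describe_liveness_status() {
+  case "$1" in
+    200) echo "healthy" ;;
+    500) echo "a component matches the failon state" ;;
+    503) echo "agent coordinator unresponsive" ;;
+    000) echo "endpoint unreachable" ;;
+    *) echo "unexpected status $1" ;;
+  esac
+}
+
+# Query /liveness with the given failon value (default: heartbeat) and
+# return success only when the endpoint answers 200.
+check_liveness() {
+  failon="${1:-heartbeat}"
+  # curl prints 000 as the status code when it cannot connect.
+  code=$(curl -s -o /dev/null -w '%{http_code}' \
+    "${AGENT_MONITORING_URL}/liveness?failon=${failon}")
+  echo "${code}: $(describe_liveness_status "$code")"
+  [ "$code" = "200" ]
+}
+----
+
+For example, `check_liveness degraded` prints the status code with a short description and exits with a non-zero status unless the endpoint returns `200`.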
+
+You can also use the HTTP monitoring endpoint for a link:https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request[Kubernetes liveness probe], for example to restart the container when {agent} is unhealthy.
+
+When you enable this setting, you must also specify the host and port where the endpoint can be accessed. We recommend keeping the default host, `localhost`.
+
+When the HTTP monitoring endpoint is enabled, you can also select **Enable profiling at `/debug/pprof`**. This setting controls whether {agent} exposes the `/debug/pprof/` endpoints together with the monitoring endpoints.
+
+The heap profiles available from `/debug/pprof/` are included in <<elastic-agent-diagnostics-command,{agent} diagnostics>> by default. CPU profiles are also included when you run the diagnostics command with the `--cpu-profile` option. For full details about the profiles exposed by `/debug/pprof/`, refer to the link:https://pkg.go.dev/net/http/pprof[pprof package documentation].
+
+Profiling at `/debug/pprof` is disabled by default. Data produced by these endpoints can be useful for debugging, but it presents a security risk. We recommend leaving this option disabled if the monitoring endpoint is accessible over a network.
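+
+If you do enable profiling, for example on a test system, you can fetch profiles with standard tools. The following is a minimal sketch: the `fetch_agent_profiles` function and its `localhost:6792` default address are assumptions for illustration, while the `/heap` and `/profile?seconds=30` paths come from the Go `net/http/pprof` package.
+
+[source,shell]
+----
+# Hypothetical helper: download heap and CPU profiles from the agent's
+# /debug/pprof/ endpoints. The default base URL is an assumption; pass the
+# host and port configured for the HTTP monitoring endpoint.
+fetch_agent_profiles() {
+  base="${1:-http://localhost:6792}/debug/pprof"
+  # -f makes curl fail on HTTP errors, for example when profiling is disabled.
+  curl -sf -o heap.pprof "${base}/heap" || return 1
+  # net/http/pprof records a CPU profile for the requested number of seconds.
+  curl -sf -o cpu.pprof "${base}/profile?seconds=30" || return 1
+}
+----
+
+For example, `fetch_agent_profiles && go tool pprof -top heap.pprof` saves both profiles and prints a summary of the heap profile. The CPU profile request takes about 30 seconds to complete.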
+
+**Diagnostics rate limiting**
+
+You can set a rate limit for the action handler that processes diagnostics requests from {fleet}. This setting affects only {fleet}-managed {agents}, not diagnostics collected through the CLI. By default, requests are limited to an interval of `1m` and a burst value of `1`, which allows at most one {fleet} diagnostics request per minute.
 
-**Diagnostics rate limiting**::
-You can set a rate limit for the request diagnostics action handler. By default requests are limited to an interval of `1m` and a burst value of `1`. This setting does not affect diagnostics collected through the CLI.
+**Diagnostics file upload**
 
-**Diagnostics file upload**::
-This setting configures retries for the file upload client. By default, a maximum of `10` retries are allowed with an initial duration of `1s` and a backoff duration of `1m`. The client may retry failed requests with exponential backoff.
+This setting configures retries for the file upload client that sends diagnostics requested through {fleet}. This setting affects only {fleet}-managed {agents}. The client retries failed requests with exponential backoff. By default, it allows a maximum of `10` retries, with an initial duration of `1s` and a backoff duration of `1m`.
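+
+The exact retry timing is determined by {agent}. As a rough illustration only, assuming that the delay doubles after each failed attempt, starts at the initial duration, is capped at the backoff duration, and has no random jitter, the retry schedule can be sketched as follows. The `backoff_schedule` function is hypothetical.
+
+[source,shell]
+----
+# Hypothetical illustration of an exponential backoff schedule.
+# Assumes: delay doubles after each retry, capped at the backoff duration,
+# with no jitter. Arguments: max retries, initial seconds, cap in seconds.
+backoff_schedule() {
+  max_retries="${1:-10}"
+  delay="${2:-1}"   # initial duration in seconds (default 1s)
+  cap="${3:-60}"    # backoff duration in seconds (default 1m)
+  attempt=1
+  while [ "$attempt" -le "$max_retries" ]; do
+    echo "retry ${attempt}: wait ${delay}s"
+    delay=$((delay * 2))
+    if [ "$delay" -gt "$cap" ]; then
+      delay="$cap"
+    fi
+    attempt=$((attempt + 1))
+  done
+}
+----
+
+With the defaults, this sketch prints waits of 1, 2, 4, 8, 16, and 32 seconds followed by four 60-second waits, about five minutes in total.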
 
 [discrete]
 [[change-policy-output]]
diff --git a/docs/en/ingest-management/commands.asciidoc b/docs/en/ingest-management/commands.asciidoc
index a2818f3d6..6888adf82 100644
--- a/docs/en/ingest-management/commands.asciidoc
+++ b/docs/en/ingest-management/commands.asciidoc
@@ -77,7 +77,7 @@ This command is intended for debugging purposes only. The output format and stru
 [source,shell]
 ----
 elastic-agent diagnostics [--file <string>]
-                          [-p]
+                          [--cpu-profile]
                           [--exclude-events]
                           [--help]
                           [global-flags]
@@ -92,9 +92,12 @@ Specifies the output archive name. Defaults to `elastic-agent-diagnostics-<times
 `--help`::
 Show help for the `diagnostics` command.
 
-`-p`::
+`--cpu-profile`::
 Additionally runs a 30-second CPU profile on each running component. This will generate an additional `cpu.pprof` file for each component.
 
+`-p`::
+Alias for `--cpu-profile`.
+
 `--exclude-events`::
 Exclude the events log files from the diagnostics archive.