Commit 0d579b6

Merge branch 'main' into 303-query-rules-gui
2 parents 6ae8d76 + 201f98f commit 0d579b6

4 files changed (+97, -5)

reference/fleet/agent-policy.md

Lines changed: 16 additions & 0 deletions
@@ -2,6 +2,8 @@
 navigation_title: Policies
 mapped_pages:
   - https://www.elastic.co/guide/en/fleet/current/agent-policy.html
+applies_to:
+  stack: ga
 products:
   - id: fleet
   - id: elastic-agent
@@ -55,6 +57,7 @@ Hosted policies display a lock icon in the {{fleet}} UI, and actions are restric
 | [Edit or delete a policy](#policy-main-settings) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
 | [Add custom fields](#add-custom-fields) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
 | [Configure agent monitoring](#change-policy-enable-agent-monitoring) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
+| [Configure an automatic {{agent}} upgrade](#agent-policy-automatic-agent-upgrade) {applies_to}`stack: ga 9.1.0` | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
 | [Change the output of a policy](#change-policy-output) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
 | [Add a {{fleet-server}} to a policy](#add-fleet-server-to-policy) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
 | [Configure secret values in a policy](#agent-policy-secret-values) | ![yes](images/green-check.svg "") | ![no](images/red-x.svg "") |
@@ -260,6 +263,19 @@ You can set a rate limit for the action handler for diagnostics requests coming
 This setting configures retries for the file upload client handling diagnostics requests coming from {{fleet}}. The setting affects only {{fleet}}-managed {{agents}}. By default, a maximum of `10` retries are allowed with an initial duration of `1s` and a backoff duration of `1m`. The client may retry failed requests with exponential backoff.
 
 
+## Configure an automatic {{agent}} upgrade [agent-policy-automatic-agent-upgrade]
+
+```{applies_to}
+stack: ga 9.1.0
+```
+
+For a high-scale deployment of {{fleet}}, you can configure an automatic, gradual rollout of a new minor or patch version to a percentage of the {{agents}} in your policy. For more information, refer to [Auto-upgrade agents enrolled in a policy](/reference/fleet/upgrade-elastic-agent.md#auto-upgrade-agents).
+
+::::{note}
+This feature is only available for certain subscription levels. For more information, refer to [{{stack}} subscriptions](https://www.elastic.co/subscriptions).
+::::
+
+
 ## Change the output of a policy [change-policy-output]
 
 Assuming your [{{stack}} subscription level](https://www.elastic.co/subscriptions) supports per-policy outputs, you can change the output of a policy to send data to a different output.

reference/fleet/upgrade-elastic-agent.md

Lines changed: 67 additions & 3 deletions
@@ -2,6 +2,8 @@
 navigation_title: Upgrade {{agent}}s
 mapped_pages:
   - https://www.elastic.co/guide/en/fleet/current/upgrade-elastic-agent.html
+applies_to:
+  stack: ga
 products:
   - id: fleet
   - id: elastic-agent
@@ -44,7 +46,7 @@ These restrictions apply whether you are upgrading {{agents}} individually or in
 
 ## Upgrading {{agent}} [upgrade-agent]
 
-To upgrade your {{agent}}s, go to **Management > {{fleet}} > Agents** in {{kib}}. You can perform the following upgrade-related actions:
+To upgrade your {{agents}}, go to **Management** → **{{fleet}}** → **Agents** in {{kib}}. You can perform the following upgrade-related actions:
 
 | User action | Result |
 | --- | --- |
@@ -55,6 +57,8 @@ To upgrade your {{agent}}s, go to **Management > {{fleet}} > Agents** in {{kib}}
 | [Restart an upgrade for a single agent](#restart-upgrade-single) | Restart an upgrade process that has stalled for a single agent. |
 | [Restart an upgrade for multiple agents](#restart-upgrade-multiple) | Do a bulk restart of the upgrade process for a set of agents. |
 
+With the right [subscription level](https://www.elastic.co/subscriptions), you can also configure an automatic, gradual upgrade of a percentage of the {{agents}} enrolled in an {{agent}} policy. For more information, refer to [Auto-upgrade agents enrolled in a policy](#auto-upgrade-agents). {applies_to}`stack: ga 9.1.0`
+
 
 ## Upgrade a single {{agent}} [upgrade-an-agent]
 
@@ -84,7 +88,6 @@ To upgrade your {{agent}}s, go to **Management > {{fleet}} > Agents** in {{kib}}
 :::
 
 
-
 ## Do a rolling upgrade of multiple {{agent}}s [rolling-agent-upgrade]
 
 You can do rolling upgrades to avoid exhausting network resources when updating a large number of {{agent}}s.
@@ -182,7 +185,6 @@ If an upgrade fails, you can view the agent logs to find the reason:
 :::
 
 
-
 ## Restart an upgrade for a single agent [restart-upgrade-single]
 
 An {{agent}} upgrade process may sometimes stall. This can happen for various reasons, including, for example, network connectivity issues or a delayed shutdown.
@@ -217,6 +219,68 @@ When the upgrade process for multiple agents has been detected to have stalled,
 5. Restart the upgrades.
 
 
+## Auto-upgrade agents enrolled in a policy [auto-upgrade-agents]
+
+```{applies_to}
+stack: ga 9.1.0
+```
+
+::::{note}
+This feature is only available for certain subscription levels. For more information, refer to [{{stack}} subscriptions](https://www.elastic.co/subscriptions).
+::::
+
+To configure an automatic rollout of a new minor or patch version to a percentage of the agents enrolled in your {{agent}} policy, follow these steps:
+
+1. In {{kib}}, go to **Management** → **{{fleet}}** → **Agent policies**.
+2. Select the agent policy for which you want to configure an automatic agent upgrade.
+3. On the agent policy's details page, find **Auto-upgrade agents**, and select **Manage** next to it.
+4. In the **Manage auto-upgrade agents** window, click **Add target version**.
+5. From the **Target agent version** dropdown, select the minor or patch version to which you want to upgrade a percentage of your agents.
+6. In the **% of agents to upgrade** field, enter the percentage of active agents you want to upgrade to this target version.
+
+   Note that:
+   - Unenrolling, unenrolled, inactive, and uninstalled agents are not included in the count. For example, if you set the target upgrade percentage to 50% for a policy with 10 active agents and 10 inactive agents, the target is met when 5 active agents are upgraded.
+   - Rounding is applied, and the actual percentage of the upgraded agents may vary slightly. For example, if you set the target upgrade percentage to 30% for a policy with 25 active agents, the target is met when 8 active agents are upgraded (32%). The sketch after this procedure illustrates the arithmetic.
+
+7. You can then add a different target version and specify the percentage of agents you want upgraded to that version. The total percentage of agents to be upgraded cannot exceed 100%.
+8. Click **Save**.
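+
+To make the counting and rounding rules above concrete, here is a minimal illustrative sketch. It is not Fleet code; `upgrade_target` is a hypothetical helper, and round-half-up is an assumption inferred from the documented examples.
+
+```python
+import math
+
+def upgrade_target(active_agents: int, percentage: float) -> int:
+    """Number of active agents that satisfies a target percentage.
+
+    Hypothetical helper; assumes round-half-up, which matches the
+    documented examples. Inactive and unenrolled agents are excluded
+    before calling this.
+    """
+    return math.floor(active_agents * percentage / 100 + 0.5)
+
+print(upgrade_target(10, 50))  # 5  (50% of 10 active agents; inactive agents excluded)
+print(upgrade_target(25, 30))  # 8  (30% of 25 -> 7.5, rounded to 8, i.e. 32%)
+```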
+
+Once the configuration is saved, an asynchronous task runs every 30 minutes, gradually upgrading the agents in the policy to the specified target version.
+
+If an upgrade fails, it is retried with an exponential backoff mechanism until it succeeds or the maximum number of retries is reached. The maximum number of retries equals the number of [configured retry delays](#auto-upgrade-settings).
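+
+As an illustration of that retry schedule (not Fleet code; the duration parsing is a simplification that assumes only `m`/`h` suffixes), the default delays documented below allow up to 7 retries spread over roughly 55.5 hours:
+
+```python
+from datetime import timedelta
+
+# Default retryDelays from the Fleet settings documented below.
+RETRY_DELAYS = ['30m', '1h', '2h', '4h', '8h', '16h', '24h']
+
+def parse_duration(value: str) -> timedelta:
+    units = {'m': 'minutes', 'h': 'hours'}
+    return timedelta(**{units[value[-1]]: int(value[:-1])})
+
+# One retry per configured delay, so the list length caps the retries.
+total = timedelta()
+for attempt, delay in enumerate(RETRY_DELAYS, start=1):
+    total += parse_duration(delay)
+    print(f"retry {attempt}: waits {delay}, cumulative {total}")
+```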
+
+::::{note}
+Only active agents enrolled in the policy are considered for the automatic upgrade.
+
+If new agents are assigned to the policy, the number of {{agents}} to be upgraded is adjusted according to the set percentages.
+::::
+
+### Configure the auto-upgrade settings [auto-upgrade-settings]
+
+On self-managed and cloud deployments of {{stack}}, you can configure the default task interval and the retry delays of the automatic upgrade in the [{{kib}} {{fleet}} settings](kibana://reference/configuration-reference/fleet-settings.md). For example:
+
+```yml
+xpack.fleet.autoUpgrades.taskInterval: 15m <1>
+xpack.fleet.autoUpgrades.retryDelays: ['5m', '10m', '20m'] <2>
+```
+1. The time interval at which the auto-upgrade task should run. Defaults to `30m`.
+2. Array indicating how much time should pass before a failed auto-upgrade is retried. The array's length indicates the maximum number of retries. Defaults to `['30m', '1h', '2h', '4h', '8h', '16h', '24h']`.
+
+For more information, refer to the [Kibana configuration reference](kibana://reference/configuration-reference.md).
+
+### View the status of the automatic upgrade [auto-upgrade-view-status]
+
+You can view the status of the automatic upgrade in the following ways:
+
+- On the agent policy's details page, find **Auto-upgrade agents**, and select **Manage** to open the **Manage auto-upgrade agents** window.
+
+  The status of the upgrade is displayed next to the specified target version and percentage, and includes the percentage of agents that have already been upgraded.
+
+  To view any failed upgrades, hover over the **Upgrade failed** status, then click **Go to upgrade**.
+
+- On the **{{fleet}}** → **Agents** page, click **Agent activity** to open a flyout showing logs of the {{agent}} activity and the progress of the automatic agent upgrade.
+
+
 ## Upgrade RPM and DEB system packages [upgrade-system-packages]
 
 If you have installed and enrolled {{agent}} using either a DEB (for a Debian-based Linux distribution) or RPM (for a RedHat-based Linux distribution) install package, the upgrade cannot be managed by {{fleet}}. Instead, you can perform the upgrade using these steps.
220284
## Upgrade RPM and DEB system packages [upgrade-system-packages]
221285

222286
If you have installed and enrolled {{agent}} using either a DEB (for a Debian-based Linux distribution) or RPM (for a RedHat-based Linux distribution) install package, the upgrade cannot be managed by {{fleet}}. Instead, you can perform the upgrade using these steps.

solutions/observability/apm/tail-based-sampling.md

Lines changed: 13 additions & 1 deletion
@@ -85,6 +85,18 @@ Policies map trace events to a sample rate. Each policy must specify a sample ra
 | APM Server binary | `apm-server.sampling.tail.policies` |
 | Fleet-managed | `Policies` |
 
+### Discard On Write Failure [sampling-tail-discard-on-write-failure-ref]
+
+Defines the indexing behavior when trace events fail to be written to storage (for example, when the storage limit is reached). When set to `false`, traces bypass sampling and are always indexed, which significantly increases the indexing load. When set to `true`, traces are discarded, causing data loss that can result in broken traces.
+
+Default: `false`. (bool)
+
+| | |
+|------------------------------|------------------------------------------|
+| APM Server binary | `apm-server.sampling.tail.discard_on_write_failure` |
+| Fleet-managed {applies_to}`stack: ga 9.1` | `Discard On Write Failure` |
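+
+For the APM Server binary, the flattened key above corresponds to a nested entry in `apm-server.yml`. A minimal illustrative fragment (the nesting is assumed from the key path, not taken from this commit):
+
+```yml
+apm-server:
+  sampling:
+    tail:
+      # Documented default is false; true discards events that fail to be written.
+      discard_on_write_failure: true
+```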
+
+
 ### Storage limit [sampling-tail-storage_limit-ref]
 
 The amount of storage space allocated for trace events matching tail sampling policies. Caution: Setting this limit higher than the allowed space may cause APM Server to become unhealthy.
@@ -93,7 +105,7 @@ A value of `0GB` (or equivalent) does not set a concrete limit, but rather allow
 
 If this is not desired, a concrete `GB` value can be set for the maximum amount of disk used for tail-based sampling.
 
-If the configured storage limit is insufficient, it logs "configured limit reached". The event will bypass sampling and will always be indexed when storage limit is reached.
+If the configured storage limit is insufficient, it logs "configured limit reached". When the storage limit is reached, the event will be indexed or discarded based on the [Discard On Write Failure](#sampling-tail-discard-on-write-failure-ref) configuration.
 
 Default: `0GB`. (text)
 
solutions/observability/apm/transaction-sampling.md

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ Due to [OpenTelemetry tail-based sampling limitations](/solutions/observability/
 
 Tail-based sampling (TBS), by definition, requires storing events locally temporarily, such that they can be retrieved and forwarded when a sampling decision is made.
 
-In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, sampling will be bypassed.
+In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, trace events are indexed or discarded based on the [discard on write failure](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-discard-on-write-failure-ref) configuration.
 
 It is recommended to use fast disks, ideally Solid State Drives (SSD) with high I/O per second (IOPS), when enabling tail-based sampling. Disk throughput and I/O may become performance bottlenecks for tail-based sampling and APM event ingestion overall. Disk writes are proportional to the event ingest rate, while disk reads are proportional to both the event ingest rate and the sampling rate.
 