|[Configure an automatic {{agent}} upgrade](#agent-policy-automatic-agent-upgrade) {applies_to}`stack: ga 9.1.0`|||
|[Change the output of a policy](#change-policy-output)|||
|[Add a {{fleet-server}} to a policy](#add-fleet-server-to-policy)|||
|[Configure secret values in a policy](#agent-policy-secret-values)|||

You can set a rate limit for the action handler for diagnostics requests coming from {{fleet}}.

This setting configures retries for the file upload client handling diagnostics requests coming from {{fleet}}. The setting affects only {{fleet}}-managed {{agents}}. By default, a maximum of `10` retries are allowed with an initial duration of `1s` and a backoff duration of `1m`. The client may retry failed requests with exponential backoff.
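
As a sketch of how such retry settings might be expressed in an agent YAML configuration (the key names below are assumptions inferred from the defaults described above, not documented {{agent}} settings, so verify them against the {{agent}} configuration reference before use):

```yaml
# Illustrative sketch only: the key names are assumptions, not confirmed
# Elastic Agent settings. The values mirror the documented defaults above.
agent.diagnostics:
  uploader:
    max_retries: 10 # stop retrying a failed diagnostics upload after 10 attempts
    init_dur: 1s    # initial retry delay
    max_dur: 1m     # cap for the exponentially growing retry delay
```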
## Configure an automatic {{agent}} upgrade [agent-policy-automatic-agent-upgrade]
```{applies_to}
stack: ga 9.1.0
```
For a high-scale deployment of {{fleet}}, you can configure an automatic, gradual rollout of a new minor or patch version to a percentage of the {{agents}} in your policy. For more information, refer to [Auto-upgrade agents enrolled in a policy](/reference/fleet/upgrade-elastic-agent.md#auto-upgrade-agents).

::::{note}
This feature is only available for certain subscription levels. For more information, refer to [{{stack}} subscriptions](https://www.elastic.co/subscriptions).
::::
## Change the output of a policy [change-policy-output]
Assuming your [{{stack}} subscription level](https://www.elastic.co/subscriptions) supports per-policy outputs, you can change the output of a policy to send data to a different output.

`reference/fleet/upgrade-elastic-agent.md`

These restrictions apply whether you are upgrading {{agents}} individually or in bulk.
## Upgrading {{agent}} [upgrade-agent]
To upgrade your {{agents}}, go to **Management** → **{{fleet}}** → **Agents** in {{kib}}. You can perform the following upgrade-related actions:

| User action | Result |
| --- | --- |
|[Restart an upgrade for a single agent](#restart-upgrade-single)| Restart an upgrade process that has stalled for a single agent. |
|[Restart an upgrade for multiple agents](#restart-upgrade-multiple)| Do a bulk restart of the upgrade process for a set of agents. |

With the right [subscription level](https://www.elastic.co/subscriptions), you can also configure an automatic, gradual upgrade of a percentage of the {{agents}} enrolled in an {{agent}} policy. For more information, refer to [Auto-upgrade agents enrolled in a policy](#auto-upgrade-agents). {applies_to}`stack: ga 9.1.0`
## Upgrade a single {{agent}} [upgrade-an-agent]
## Do a rolling upgrade of multiple {{agents}} [rolling-agent-upgrade]

You can do rolling upgrades to avoid exhausting network resources when updating a large number of {{agents}}.
## Restart an upgrade for a single agent [restart-upgrade-single]
An {{agent}} upgrade process may sometimes stall. This can happen for various reasons, such as network connectivity issues or a delayed shutdown.
## Restart an upgrade for multiple agents [restart-upgrade-multiple]

When the upgrade process for multiple agents has been detected to have stalled:

5. Restart the upgrades.
## Auto-upgrade agents enrolled in a policy [auto-upgrade-agents]
```{applies_to}
stack: ga 9.1.0
```
::::{note}
This feature is only available for certain subscription levels. For more information, refer to [{{stack}} subscriptions](https://www.elastic.co/subscriptions).
::::
To configure an automatic rollout of a new minor or patch version to a percentage of the agents enrolled in your {{agent}} policy, follow these steps:
1. In {{kib}}, go to **Management** → **{{fleet}}** → **Agent policies**.
2. Select the agent policy for which you want to configure an automatic agent upgrade.
3. On the agent policy's details page, find **Auto-upgrade agents**, and select **Manage** next to it.
4. In the **Manage auto-upgrade agents** window, click **Add target version**.
5. From the **Target agent version** dropdown, select the minor or patch version to which you want to upgrade a percentage of your agents.
6. In the **% of agents to upgrade** field, enter the percentage of active agents you want to upgrade to this target version.

   Note that:

   - Unenrolling, unenrolled, inactive, and uninstalled agents are not included in the count. For example, if you set the target upgrade percentage to 50% for a policy with 10 active agents and 10 inactive agents, the target is met when 5 active agents are upgraded.
   - Rounding is applied, and the actual percentage of upgraded agents may vary slightly. For example, if you set the target upgrade percentage to 30% for a policy with 25 active agents, the target is met when 8 active agents are upgraded (32%).
7. You can then add a different target version and specify the percentage of agents you want upgraded to that version. The total percentage of agents to be upgraded cannot exceed 100%.
8. Click **Save**.

Once the configuration is saved, an asynchronous task runs every 30 minutes, gradually upgrading the agents in the policy to the specified target version.

Failed upgrades are retried with an exponential backoff mechanism until the upgrade succeeds or the maximum number of retries is reached. Note that the maximum number of retries equals the number of [configured retry delays](#auto-upgrade-settings).

::::{note}
Only active agents enrolled in the policy are considered for the automatic upgrade.

If new agents are assigned to the policy, the number of {{agents}} to be upgraded is adjusted according to the set percentages.
::::
### Configure the auto-upgrade settings [auto-upgrade-settings]
On self-managed and cloud deployments of {{stack}}, you can configure the default task interval and the retry delays of the automatic upgrade in the [{{kib}} {{fleet}} settings](kibana://reference/configuration-reference/fleet-settings.md). For example:
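
As a sketch, the relevant `kibana.yml` entries could look like the following; the `xpack.fleet.autoUpgrades.*` key names are assumptions, so confirm the exact keys in the {{fleet}} settings reference:

```yaml
# Assumed key names; verify them in the Kibana Fleet settings reference.
xpack.fleet.autoUpgrades.taskInterval: 30m <1>
xpack.fleet.autoUpgrades.retryDelays: ['30m', '1h', '2h', '4h', '8h', '16h', '24h'] <2>
```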
1. The time interval at which the auto-upgrade task should run. Defaults to `30m`.
2. Array indicating how much time should pass before a failed auto-upgrade is retried. The array's length indicates the maximum number of retries. Defaults to `['30m', '1h', '2h', '4h', '8h', '16h', '24h']`.

For more information, refer to the [Kibana configuration reference](kibana://reference/configuration-reference.md).
### View the status of the automatic upgrade [auto-upgrade-view-status]
You can view the status of the automatic upgrade in the following ways:
- On the agent policy's details page, find **Auto-upgrade agents**, and select **Manage** to open the **Manage auto-upgrade agents** window.

  The status of the upgrade is displayed next to the specified target version and percentage, and includes the percentage of agents that have already been upgraded.

  To view any failed upgrades, hover over the **Upgrade failed** status, then click **Go to upgrade**.
- On the **{{fleet}}** → **Agents** page, click **Agent activity** to open a flyout showing logs of the {{agent}} activity and the progress of the automatic agent upgrade.
## Upgrade RPM and DEB system packages [upgrade-system-packages]
If you have installed and enrolled {{agent}} using either a DEB (for a Debian-based Linux distribution) or RPM (for a RedHat-based Linux distribution) install package, the upgrade cannot be managed by {{fleet}}. Instead, you can perform the upgrade using these steps.

`solutions/observability/apm/tail-based-sampling.md`

Policies map trace events to a sample rate. Each policy must specify a sample rate.

| APM Server binary | `apm-server.sampling.tail.policies` |
| Fleet-managed | `Policies` |
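
For the APM Server binary, a minimal policy set in `apm-server.yml` might look like the following sketch; the service name and sample rates are illustrative values, not recommendations:

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Keep 10% of traces whose root transaction comes from this service.
        - service.name: my-frontend
          sample_rate: 0.1
        # Catch-all policy for traces that match none of the policies above.
        - sample_rate: 0.01
```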
### Discard on write failure [sampling-tail-discard-on-write-failure-ref]
Defines the indexing behavior when trace events fail to be written to storage (for example, when the storage limit is reached). When set to `false`, traces bypass sampling and are always indexed, which significantly increases the indexing load. When set to `true`, traces are discarded, causing data loss, which can result in broken traces. The default is `false`.
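
Assuming the APM Server binary exposes this option under the same `sampling.tail` namespace as the other options in this section (treat the exact key name as an assumption and confirm it in the APM Server reference), a sketch:

```yaml
apm-server:
  sampling:
    tail:
      # Assumed key name: discard, rather than index, trace events that fail
      # to be written to local storage (for example, at the storage limit).
      discard_on_write_failure: true
```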
### Storage limit [sampling-tail-storage_limit-ref]

The amount of storage space allocated for trace events matching tail sampling policies. Caution: Setting this limit higher than the allowed space may cause APM Server to become unhealthy.

A value of `0GB` (or equivalent) does not set a concrete limit. If this is not desired, a concrete `GB` value can be set for the maximum amount of disk used for tail-based sampling.
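
For example, a concrete cap for the APM Server binary might be expressed as in this sketch, where `storage_limit` follows the `sampling.tail` naming pattern used above:

```yaml
apm-server:
  sampling:
    tail:
      # Cap local tail-sampling storage at a concrete size instead of `0GB`.
      storage_limit: 5GB
```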
95
107
96
-
If the configured storage limit is insufficient, it logs "configured limit reached". The event will bypass sampling and will always be indexed when storage limit is reached.
108
+
If the configured storage limit is insufficient, APM Server logs "configured limit reached". When the storage limit is reached, the event is indexed or discarded based on the [Discard on write failure](#sampling-tail-discard-on-write-failure-ref) configuration.

`solutions/observability/apm/transaction-sampling.md`

Tail-based sampling (TBS), by definition, requires temporarily storing events locally so that they can be retrieved and forwarded once a sampling decision is made.

In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, trace events are indexed or discarded based on the [discard on write failure](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-discard-on-write-failure-ref) configuration.

It is recommended to use fast disks, ideally solid state drives (SSDs) with high I/O operations per second (IOPS), when enabling tail-based sampling. Disk throughput and I/O may become performance bottlenecks for tail-based sampling and APM event ingestion overall. Disk writes are proportional to the event ingest rate, while disk reads are proportional to both the event ingest rate and the sampling rate.
0 commit comments