Commit d8a9aff

isaacaflores2, carsonip, and florent-leborgne authored
apm: Document sampling.tail.discard_on_write_failure config (#1453)
Document `sampling.tail.discard_on_write_failure` config.

I sourced the config explanation from [here](https://github.com/elastic/apm-server/blob/613b774ba953d159584c666cd7d0753404374318/x-pack/apm-server/sampling/config.go#L115-L118) please let me know if the description is incorrect or unclear in any way.

Updated pages can be found in the docs preview here:

- https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1453/reference/apm/cloud/apm-settings
- https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1453/solutions/observability/apm/tail-based-sampling
- https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1453/solutions/observability/apm/configure-apm-server
- https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1453/solutions/observability/apm/transaction-sampling#_tail_based_sampling_performance_and_requirements

## Checklist

- [ ] Wait for PR #1269 to be merged and incorporate changes.

## Related issues

Part of elastic/apm-server#15330

---------

Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: florent-leborgne <[email protected]>
1 parent 15b592b commit d8a9aff

2 files changed: +14 -2 lines changed

solutions/observability/apm/tail-based-sampling.md

Lines changed: 13 additions & 1 deletion
@@ -85,6 +85,18 @@ Policies map trace events to a sample rate. Each policy must specify a sample ra
 | APM Server binary | `apm-server.sampling.tail.policies` |
 | Fleet-managed | `Policies` |

+### Discard On Write Failure [sampling-tail-discard-on-write-failure-ref]
+
+Defines the indexing behavior when trace events fail to be written to storage (for example, when the storage limit is reached). When set to `false`, traces bypass sampling and are always indexed, which significantly increases the indexing load. When set to `true`, traces are discarded, causing data loss which can result in broken traces. The default is `false`.
+
+Default: `false`. (bool)
+
+| | |
+|------------------------------|------------------------------------------|
+| APM Server binary | `apm-server.sampling.tail.discard_on_write_failure` |
+| Fleet-managed {applies_to}`stack: ga 9.1` | `Discard On Write Failure` |
+
+
 ### Storage limit [sampling-tail-storage_limit-ref]

 The amount of storage space allocated for trace events matching tail sampling policies. Caution: Setting this limit higher than the allowed space may cause APM Server to become unhealthy.
@@ -93,7 +105,7 @@ A value of `0GB` (or equivalent) does not set a concrete limit, but rather allow

 If this is not desired, a concrete `GB` value can be set for the maximum amount of disk used for tail-based sampling.

-If the configured storage limit is insufficient, it logs "configured limit reached". The event will bypass sampling and will always be indexed when storage limit is reached.
+If the configured storage limit is insufficient, it logs "configured limit reached". When the storage limit is reached, the event will be indexed or discarded based on the [Discard On Write Failure](#sampling-tail-discard-on-write-failure-ref) configuration.

 Default: `0GB`. (text)

solutions/observability/apm/transaction-sampling.md

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ Due to [OpenTelemetry tail-based sampling limitations](/solutions/observability/

 Tail-based sampling (TBS), by definition, requires storing events locally temporarily, such that they can be retrieved and forwarded when a sampling decision is made.

-In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, sampling will be bypassed.
+In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, trace events are indexed or discarded based on the [discard on write failure](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-discard-on-write-failure-ref) configuration.

 It is recommended to use fast disks, ideally Solid State Drives (SSD) with high I/O per second (IOPS), when enabling tail-based sampling. Disk throughput and I/O may become performance bottlenecks for tail-based sampling and APM event ingestion overall. Disk writes are proportional to the event ingest rate, while disk reads are proportional to both the event ingest rate and the sampling rate.
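
For orientation, the APM Server binary setting documented in this commit might appear in `apm-server.yml` roughly as follows. This is a minimal sketch and not part of the commit: only `apm-server.sampling.tail.policies` and `apm-server.sampling.tail.discard_on_write_failure` are named in the diff above, while the `enabled` and `storage_limit` keys and all values shown are illustrative assumptions.

```yaml
apm-server:
  sampling:
    tail:
      # Assumed key: turns tail-based sampling on.
      enabled: true
      # Each policy must specify a sample rate; 0.1 is an illustrative value.
      policies:
        - sample_rate: 0.1
      # Assumed key name and illustrative value; the documented default is 0GB.
      storage_limit: 10GB
      # false (default): events that fail to be written bypass sampling and are indexed.
      # true: such events are discarded, which can result in broken traces.
      discard_on_write_failure: false
```

Leaving the option at `false` trades higher indexing load for complete traces, while `true` protects the indexing path at the cost of data loss.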