# APM Server TBS FAQ #2330
### `apm-server.sampling.tail.storage.value_log_size` [sampling-tail-monitoring-storage-value-log-size-ref]

This metric tracks the storage size of the value log files used by the previous tail-based sampling implementation. It was deprecated in 9.0.0 and should always report `0`.

## Frequently Asked Questions (FAQ) [sampling-tail-faq-ref]

:::{dropdown} Why does the sampling rate shown in Storage Explorer not match the configured tail sampling rate?

In APM Server, the tail sampling policy applied to a distributed trace is determined by evaluating the configured policies in order against the root transaction (the transaction without a parent) and using the first policy that matches. In contrast, the APM UI Storage Explorer calculates the effective average sampling rate for each service using a different method. It considers both head-based and tail-based sampling, but does not account for root transactions. As a result, the sampling rate displayed in Storage Explorer may differ from the configured tail sampling rate, which can give the false impression that tail-based sampling is not functioning correctly.
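
As a minimal sketch of how policy order matters, consider the following `apm-server.sampling.tail` configuration (the service name and sample rates are hypothetical, chosen only for illustration). APM Server evaluates the policies top to bottom against the root transaction and applies the first match:

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Matches traces whose root transaction comes from this (hypothetical) service.
        - service.name: checkout-service
          sample_rate: 0.5
        # Catch-all default policy, applied when no earlier policy matches.
        - sample_rate: 0.1
```

Storage Explorer, by contrast, reports a per-service average that also factors in head-based sampling, so the number it shows will generally not equal either rate above.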

For more information, see the related [Kibana issue](https://github.com/elastic/kibana/issues/226600).

:::

:::{dropdown} Why do transactions disappear after enabling tail-based sampling?

If a transaction is consistently not sampled after enabling tail-based sampling, verify that your instrumentation is not missing root transactions (transactions without a parent). APM Server makes sampling decisions when a distributed trace ends, which occurs when the root transaction ends. If the root transaction never reaches APM Server, no sampling decision can be made, and APM Server will silently drop all associated trace events.

This issue often arises when it is assumed that a particular service (e.g., service A) always produces the root transaction, but in reality another service (e.g., service B) may precede it. If service B is not instrumented, or sends data to a different APM Server cluster, the root transaction will be missing. To resolve this, ensure that all relevant services are instrumented and send data to the same APM Server cluster, or adjust the trace continuation strategy accordingly.

To identify traces missing a root transaction, run the following ES|QL query during a period when tail-based sampling is disabled. Use a short time range to limit the number of results:
```esql
FROM "traces-apm-*"
| STATS total_docs = COUNT(*), total_child_docs = COUNT(parent.id) BY trace.id, transaction.id
| WHERE total_docs == total_child_docs
| KEEP trace.id, transaction.id
```
In this query, `COUNT(parent.id)` counts only the documents that have a parent, so a trace where `total_docs == total_child_docs` contains no root transaction.
:::

:::{dropdown} Why is the configured tail sampling rate ignored and every trace sampled, causing unexpected load on Elasticsearch?

When the storage limit for tail-based sampling is reached, APM Server logs "configured limit reached" (or "configured storage limit reached" in version 8), as it can no longer store new trace events for sampling. By default, traces then bypass sampling and are always indexed (the effective sampling rate becomes 100%). This can cause a sudden increase in indexing load, potentially overloading Elasticsearch, as it must process all incoming traces instead of only the sampled subset.

To mitigate this risk, enable the [`discard_on_write_failure`](#sampling-tail-discard-on-write-failure-ref) setting. When set to `true`, APM Server discards traces that cannot be written due to storage or indexing failures, rather than indexing them all. This helps protect Elasticsearch from excessive load. Note that enabling this option can result in data loss and broken traces, so it should be used with caution and only when system stability is a priority.
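
As an illustrative sketch, the setting sits alongside the rest of the tail sampling configuration (the storage limit value here is arbitrary, not a recommendation):

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      # Storage budget for pending trace events; an illustrative value only.
      storage_limit: 3GB
      # When true, drop trace events that cannot be stored
      # instead of indexing them unsampled.
      discard_on_write_failure: true
```

In other words, when the limit is hit the default behavior trades load for completeness (everything is indexed), while `discard_on_write_failure: true` bounds the load at the cost of lost traces.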

For more information, see the [Discard On Write Failure](#sampling-tail-discard-on-write-failure-ref) section.

:::