Add troubleshooting guides for SDK and Collector sampling configuration #3200
Merged
Commits:
88ef429 Add troubleshooting guides for SDK and Collector sampling configuration (alexandra5000)
7488569 Apply peer comments (alexandra5000)
38b5e8b update "applies to" tag (alexandra5000)
28d98c7 Fix intro on both pages (alexandra5000)
b17499b Merge branch 'main' into sampling (alexandra5000)
76 changes: 76 additions & 0 deletions
...leshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
--- | ||
navigation_title: Troubleshoot Collector sampling configuration | ||
description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration. | ||
applies_to: | ||
serverless: all | ||
product: | ||
edot_collector: ga | ||
products: | ||
- id: observability | ||
- id: edot-collector | ||
--- | ||
|
||
# Missing or incomplete traces due to Collector sampling | ||
|
||
If traces or spans are missing in Kibana, the issue might be related to the Collector’s sampling configuration. Tail-based sampling in the Collector can drop more traces than intended if its policies are too strict or misconfigured. | ||
|
||
|
||
Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information. | ||
|
||
## Symptoms | ||
|
||
- Only a small subset of traces reaches Elasticsearch/Kibana, even though SDKs are exporting spans. | ||
|
||
- Error traces are missing because no tail sampling policy under `policies` explicitly keeps them. | ||
- Collector logs show dropped spans. | ||
|
||
## Causes | ||
|
||
|
||
- Tail sampling policies in the Collector are too narrow or restrictive. | ||
- The default rule set excludes key transaction types (for example, long-running requests or non-error transactions). | ||
- Differences between head sampling (SDK) and tail sampling (Collector) can lead to fewer traces being available for evaluation. | ||
- Conflicting or overlapping rules under the tail sampling `policies` might result in unexpected drops. | ||
- High load: the Collector might drop traces if it can’t evaluate policies fast enough. | ||
|
||
## Resolution | ||
|
||
Follow these steps to resolve sampling configuration issues: | ||
|
||
::::{stepper} | ||
|
||
:::{step} Review the tail sampling `policies` configuration | ||
|
||
- Check the `tail_sampling` section under `processors` in your Collector configuration | ||
- Ensure policies are broad enough to capture the traces you need | ||
|
||
::: | ||
|
||
:::{step} Add explicit rules for critical traces | ||
|
||
- Create specific rules for important trace types | ||
- Example: keep all error traces, 100% of login requests, and 10% of everything else | ||
- Use attributes like `status_code`, `operation`, or `service.name` to fine-tune rules | ||
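
The example in this step could look like the following sketch. It writes a `tail_sampling` processor fragment to a scratch file so you can compare it with your own configuration. The policy names, the `http.route` attribute key, the `/login` value, the `decision_wait` value, and the file name `tail-sampling-example.yaml` are assumptions for illustration; adapt them to the attributes your services actually emit.

```bash
# Write an example tail_sampling fragment to a scratch file for review.
# Policy names, the attribute key/value, and decision_wait are illustrative assumptions.
cat > tail-sampling-example.yaml <<'EOF'
processors:
  tail_sampling:
    decision_wait: 10s          # how long to wait for a trace's spans before deciding
    policies:
      - name: keep-errors       # keep traces that contain a span with ERROR status
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-login        # keep traces that match the (assumed) login route
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/login"]
      - name: sample-the-rest   # keep roughly 10% of all traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
EOF
```

With policy types like these, a trace is kept as soon as any one policy decides to sample it, so the probabilistic rule acts as a baseline for everything the first two rules do not match. The fragment only takes effect once `tail_sampling` is also listed under the `processors` of your traces pipeline in `service.pipelines`.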
::: | ||
|
||
:::{step} Validate Collector logs | ||
|
||
- Review Collector logs for messages about dropped spans, and determine whether drops are due to sampling policy outcomes or resource limits | ||
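
As a starting point for this review, the sketch below counts log lines that mention drops. It assumes the Collector logs are available as a plain-text file; where they actually end up depends on your deployment. The helper name `count_drop_lines` and the search pattern `drop` are assumptions, not Collector features, so adjust the pattern to the wording you see in your own logs.

```bash
# Count log lines that mention dropped data in a Collector log file.
# The search pattern is an assumption; adjust it to the wording in your own logs.
count_drop_lines() {
  local log_file="$1"
  # -i: case-insensitive match, -c: print the number of matching lines
  grep -ic 'drop' "$log_file"
}

# Usage (the path is a placeholder):
#   count_drop_lines /path/to/collector.log
```

Running the same `grep` without `-c` prints the matching lines themselves, which helps when deciding whether a drop came from a sampling decision or from a resource limit.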
::: | ||
|
||
:::{step} Differentiate head vs. tail sampling | ||
|
||
|
||
- Check whether SDKs apply head sampling, which reduces the traces available for tail sampling | ||
|
||
- Consider setting SDKs to `always_on` and managing sampling centrally in the Collector for more flexibility | ||
::: | ||
|
||
:::{step} Test in staging | ||
|
||
- Adjust sampling policies incrementally in a staging environment | ||
- Monitor trace volume before and after changes | ||
- Validate that critical traces are captured as expected | ||
::: | ||
|
||
:::: | ||
|
||
## Resources | ||
|
||
- [Tail sampling processor (Collector)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) | ||
- [OpenTelemetry sampling concepts](https://opentelemetry.io/docs/concepts/sampling/) | ||
- [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) |
80 changes: 80 additions & 0 deletions
troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
--- | ||
navigation_title: Troubleshoot SDK sampling configuration | ||
description: Learn how to troubleshoot missing or incomplete traces in EDOT SDKs caused by head sampling configuration. | ||
applies_to: | ||
serverless: all | ||
product: | ||
elastic-otel-sdk: ga | ||
products: | ||
- id: observability | ||
- id: edot-sdk | ||
--- | ||
|
||
# Missing or incomplete traces due to SDK sampling | ||
|
||
If traces or spans are missing in Kibana, the issue might be related to SDK-level sampling configuration. By default, SDKs use head-based sampling, meaning the decision to record or drop a trace is made when the trace is first created. | ||
|
||
Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. See [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details. | ||
|
||
|
||
## Symptoms | ||
|
||
|
||
- Only a small subset of traces appears in Kibana, even under light traffic. | ||
|
||
- Transactions look incomplete because some spans are missing. | ||
- Trace volume is unexpectedly low compared to logs or metrics. | ||
|
||
## Causes | ||
|
||
- Head sampling at the SDK level drops traces before they're exported. | ||
- Default sampling rates (for example `1/100` or `1/1000`) might be too low for your workload. | ||
- Environment variables like `OTEL_TRACES_SAMPLER` or `OTEL_TRACES_SAMPLER_ARG` are not set, not recognized, or formatted in a way the SDK doesn't support. | ||
- Inconsistent configuration across services can lead to fragmented or incomplete traces. | ||
- Some SDKs enforce stricter formats for sampler arguments, which can cause values to be ignored if not matched precisely. | ||
|
||
## Resolution | ||
|
||
Follow these steps to resolve SDK sampling configuration issues: | ||
|
||
::::{stepper} | ||
|
||
:::{step} Check SDK environment variables | ||
|
||
- Confirm that `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` are set correctly | ||
- For testing, you can temporarily set: | ||
```bash | ||
export OTEL_TRACES_SAMPLER=always_on | ||
``` | ||
- In production, consider using `parentbased_traceidratio` with an explicit ratio | ||
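
To catch unset or malformed values before starting a service, a small pre-flight check along these lines can help. It is a sketch, not part of any SDK: `check_sampler_env` is a hypothetical helper, the sampler names are the ones listed in the OpenTelemetry environment variable specification, and the ratio check is deliberately strict (plain decimals only), so an individual SDK might accept formats that this check rejects.

```bash
# Check that OTEL_TRACES_SAMPLER holds a value listed in the OpenTelemetry
# specification and that a ratio argument, if present, is a number in [0, 1].
# check_sampler_env is a hypothetical helper name, not part of any SDK.
check_sampler_env() {
  local sampler="${OTEL_TRACES_SAMPLER:-}"
  local arg="${OTEL_TRACES_SAMPLER_ARG:-}"

  case "$sampler" in
    "")
      echo "OTEL_TRACES_SAMPLER is not set; the SDK default applies"
      return 0 ;;
    always_on|always_off|parentbased_always_on|parentbased_always_off)
      echo "sampler '$sampler' needs no argument"
      return 0 ;;
    traceidratio|parentbased_traceidratio)
      if [ -z "$arg" ]; then
        echo "sampler '$sampler' has no ratio set; the sampler's default ratio applies"
        return 0
      fi
      # Accept only plain decimals between 0 and 1, such as 0.2 (not 1/100 or 20%).
      if printf '%s' "$arg" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'; then
        echo "sampler '$sampler' with ratio $arg"
        return 0
      fi
      echo "invalid OTEL_TRACES_SAMPLER_ARG: '$arg' (expected a number from 0 to 1)"
      return 1 ;;
    jaeger_remote|parentbased_jaeger_remote|xray)
      echo "sampler '$sampler' uses sampler-specific arguments that are not checked here"
      return 0 ;;
    *)
      echo "OTEL_TRACES_SAMPLER value '$sampler' is not listed in the specification"
      return 1 ;;
  esac
}
```

Calling `check_sampler_env` in the same shell that launches the service shows which sampler the process will see. A non-zero exit status indicates a value that deserves a closer look.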
::: | ||
|
||
:::{step} Align configuration across services | ||
|
||
- Use consistent sampling configuration across all instrumented services to help avoid dropped child spans or fragmented traces | ||
::: | ||
|
||
:::{step} Adjust sampling ratios for your traffic | ||
|
||
- For low-traffic applications, avoid extremely low ratios (such as `1/1000`) | ||
|
||
For example, the following configuration samples ~20% of traces: | ||
|
||
```bash | ||
export OTEL_TRACES_SAMPLER=parentbased_traceidratio | ||
export OTEL_TRACES_SAMPLER_ARG=0.2 | ||
``` | ||
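
To see why very low ratios hurt low-traffic applications, it can help to estimate how many traces survive head sampling. The sketch below multiplies an hourly trace count by the sampling ratio; the figure of 50 traces per hour and the helper name `expected_sampled` are assumptions chosen for illustration.

```bash
# Estimate how many traces survive head sampling for a given traffic level.
# The traffic figure (50 traces per hour) is an illustrative assumption.
expected_sampled() {
  # $1 = traces started per hour, $2 = sampling ratio between 0 and 1
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.2f\n", n * r }'
}

expected_sampled 50 0.001   # ratio of 1/1000, prints 0.05
expected_sampled 50 0.2     # ratio used in the example above, prints 10.00
```

At 50 traces per hour, a ratio of `0.001` yields about one sampled trace every 20 hours on average, while `0.2` yields about ten per hour.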
::: | ||
|
||
:::{step} Use Collector tail sampling for advanced scenarios | ||
|
||
- Head sampling can't evaluate the full trace context before making a decision | ||
- For more control (for example "keep all errors, sample 10% of successes"), use Collector tail sampling | ||
|
||
|
||
For more information, refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md). | ||
::: | ||
|
||
:::: | ||
|
||
## Resources | ||
|
||
- [OTEL_TRACES_SAMPLER environment variable specifications](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#otel_traces_sampler) | ||
- [OpenTelemetry sampling concepts](https://opentelemetry.io/docs/concepts/sampling/) | ||
- [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) |