From 88ef4292770b94fb78cd4c839018c4b5e5b0ad54 Mon Sep 17 00:00:00 2001 From: Aleksandra Spilkowska Date: Fri, 26 Sep 2025 19:47:30 +0200 Subject: [PATCH 1/4] Add troubleshooting guides for SDK and Collector sampling configuration --- .../misconfigured-sampling-collector.md | 76 ++++++++++++++++++ .../edot-sdks/misconfigured-sampling-sdk.md | 80 +++++++++++++++++++ troubleshoot/ingest/opentelemetry/toc.yml | 2 + 3 files changed, 158 insertions(+) create mode 100644 troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md create mode 100644 troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md diff --git a/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md new file mode 100644 index 0000000000..b480058146 --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md @@ -0,0 +1,76 @@ +--- +navigation_title: Troubleshoot Collector sampling configuration +description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration. +applies_to: + serverless: all + product: + edot_collector: ga +products: + - id: observability + - id: edot-collector +--- + +# Missing or incomplete traces due to Collector sampling + +If traces or spans are missing in Kibana, the issue might be related to the Collector’s sampling configuration. Tail-based sampling in the Collector can reduce trace volume if policies are too strict or misconfigured. + +Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information. + +## Symptoms + +- Only a small subset of traces reaches Elasticsearch/Kibana, even though SDKs are exporting spans. 
+- Error traces are missing because they’re not explicitly included in the `sampling_policy`. +- Collector logs show dropped spans. + +## Causes + +- Tail sampling policies in the Collector are too narrow or restrictive. +- The default rule set excludes key transaction types (for example long-running requests, non-error transactions). +- Differences between head sampling (SDK) and tail sampling (Collector) can lead to fewer traces being available for evaluation. +- Conflicting or overlapping `sampling_policy` rules might result in unexpected drops. +- High load: the Collector might drop traces if it can’t evaluate policies fast enough. + +## Resolution + +Follow these steps to resolve sampling configuration issues: + +::::{stepper} + +:::{step} Review `sampling_policy` configuration + +- Check the `tail_sampling` entry under `processors` in your Collector configuration +- Ensure policies are broad enough to capture the traces you need +::: + +:::{step} Add explicit rules for critical traces + +- Create specific rules for important trace types +- Example: keep all error traces, 100% of login requests, and 10% of everything else +- Use attributes like `status_code`, `operation`, or `service.name` to fine-tune rules +::: + +:::{step} Validate Collector logs + +- Review Collector logs for messages about dropped spans, and determine whether drops are due to sampling policy outcomes or resource limits +::: + +:::{step} Differentiate head vs.
tail sampling + +- Review if SDKs apply head sampling, which reduces traces available for tail sampling +- Consider setting SDKs to `always_on` and managing sampling centrally in the Collector for more flexibility +::: + +:::{step} Test in staging + +- Adjust sampling policies incrementally in a staging environment +- Monitor trace volume before and after changes +- Validate that critical traces are captured as expected +::: + +:::: + +## Resources + +- [Tail sampling processor (Collector)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) +- [OpenTelemetry sampling concepts - contrib documentation](https://opentelemetry.io/docs/concepts/sampling/) +- [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) \ No newline at end of file diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md new file mode 100644 index 0000000000..96754db63b --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md @@ -0,0 +1,80 @@ +--- +navigation_title: Troubleshoot SDK sampling configuration +description: Learn how to troubleshoot missing or incomplete traces in EDOT SDKs caused by head sampling configuration. +applies_to: + serverless: all + product: + elastic-otel-sdk: ga +products: + - id: observability + - id: edot-sdk +--- + +# Missing or incomplete traces due to SDK sampling + +If traces or spans are missing in Kibana, the issue might be related to SDK-level sampling configuration. By default, SDKs use head-based sampling, meaning the decision to record or drop a trace is made when the trace is first created. + +Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. See [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details. 
+ +## Symptoms + +- Only a small subset of traces appears in Kibana, even under light traffic. +- Transactions look incomplete because some spans are missing. +- Trace volume is unexpectedly low compared to logs or metrics. + +## Causes + +- Head sampling at the SDK level drops traces before they're exported. +- Default sampling rates (for example `1/100` or `1/1000`) might be too low for your workload. +- Environment variables like `OTEL_TRACES_SAMPLER` or `OTEL_TRACES_SAMPLER_ARG` are not set, not recognized, or formatted in a way the SDK doesn't support. +- Inconsistent configuration across services can lead to fragmented or incomplete traces. +- Some SDKs enforce stricter formats for sampler arguments, which can cause values to be ignored if not matched precisely. + +## Resolution + +Follow these steps to resolve SDK sampling configuration issues: + +::::{stepper} + +:::{step} Check SDK environment variables + +- Confirm that `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` are set correctly +- For testing, you can temporarily set: + ```bash + export OTEL_TRACES_SAMPLER=always_on + ``` +- In production, consider using `parentbased_traceidratio` with an explicit ratio +::: + +:::{step} Align configuration across services + +- Use consistent sampling configuration across all instrumented services to help avoid dropped child spans or fragmented traces +::: + +:::{step} Adjust sampling ratios for your traffic + +- For low-traffic applications, avoid extremely low ratios (such as `1/1000`) + + For example, the following configuration samples ~20% of traces: + + ```bash + export OTEL_TRACES_SAMPLER=parentbased_traceidratio + export OTEL_TRACES_SAMPLER_ARG=0.2 + ``` +::: + +:::{step} Use Collector tail sampling for advanced scenarios + +- Head sampling can't evaluate the full trace context before making a decision +- For more control (for example "keep all errors, sample 10% of successes"), use Collector tail sampling + + For more information, refer to [Missing or 
incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md). +::: + +:::: + +## Resources + +- [OTEL_TRACES_SAMPLER environment variable specifications](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#otel_traces_sampler) +- [OpenTelemetry sampling concepts - contrib documentation](https://opentelemetry.io/docs/concepts/sampling/) +- [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) diff --git a/troubleshoot/ingest/opentelemetry/toc.yml b/troubleshoot/ingest/opentelemetry/toc.yml index 510779daf3..6c363e1417 100644 --- a/troubleshoot/ingest/opentelemetry/toc.yml +++ b/troubleshoot/ingest/opentelemetry/toc.yml @@ -9,6 +9,7 @@ toc: - file: edot-collector/metadata.md - file: edot-collector/enable-debug-logging.md - file: edot-collector/collector-not-starting.md + - file: edot-collector/misconfigured-sampling-collector.md - file: edot-sdks/index.md children: - file: edot-sdks/android/index.md @@ -23,5 +24,6 @@ toc: - file: edot-sdks/enable-debug-logging.md - file: edot-sdks/missing-app-telemetry.md - file: edot-sdks/proxy.md + - file: edot-sdks/misconfigured-sampling-sdk.md - file: no-data-in-kibana.md - file: contact-support.md From 7488569275cb4b75e4608a1a39c2a1bfb97ca0f9 Mon Sep 17 00:00:00 2001 From: Aleksandra Spilkowska Date: Tue, 30 Sep 2025 14:38:00 +0200 Subject: [PATCH 2/4] Apply peer comments --- .../misconfigured-sampling-collector.md | 14 ++++++---- .../edot-sdks/misconfigured-sampling-sdk.md | 27 ++++++++++++------- 2 files changed, 26 insertions(+), 15 deletions(-) diff --git a/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md index b480058146..2de52ca3d7 100644 --- a/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md +++ 
b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md @@ -1,5 +1,5 @@ --- -navigation_title: Troubleshoot Collector sampling configuration +navigation_title: Collector sampling issues description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration. applies_to: serverless: all @@ -12,18 +12,22 @@ products: # Missing or incomplete traces due to Collector sampling -If traces or spans are missing in Kibana, the issue might be related to the Collector’s sampling configuration. Tail-based sampling in the Collector can reduce trace volume if policies are too strict or misconfigured. +If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. Tail-based sampling in the Collector can reduce trace volume if policies are too strict or misconfigured. Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information. ## Symptoms -- Only a small subset of traces reaches Elasticsearch/Kibana, even though SDKs are exporting spans. +When Collector-based tail sampling is misconfigured or too restrictive, you might observe the following: + +- Only a small subset of traces reaches {{es}}/{{kib}}, even though SDKs are exporting spans. - Error traces are missing because they’re not explicitly included in the `sampling_policy`. - Collector logs show dropped spans. ## Causes +The following conditions can lead to missing or incomplete traces when using tail-based sampling in the Collector: + - Tail sampling policies in the Collector are too narrow or restrictive. - The default rule set excludes key transaction types (for example long-running requests, non-error transactions). 
- Differences between head sampling (SDK) and tail sampling (Collector) can lead to fewer traces being available for evaluation. @@ -54,9 +58,9 @@ Follow these steps to resolve sampling configuration issues: - Review Collector logs for messages about dropped spans, and determine whether drops are due to sampling policy outcomes or resource limits ::: -:::{step} Differentiate head vs. tail sampling +:::{step} Differentiate head and tail sampling -- Review if SDKs apply head sampling, which reduces traces available for tail sampling +- Review whether SDKs already apply head sampling, which reduces traces available for tail sampling in the Collector - Consider setting SDKs to `always_on` and managing sampling centrally in the Collector for more flexibility ::: diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md index 96754db63b..c89e0d7166 100644 --- a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md +++ b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md @@ -1,5 +1,5 @@ --- -navigation_title: Troubleshoot SDK sampling configuration +navigation_title: SDK sampling issues description: Learn how to troubleshoot missing or incomplete traces in EDOT SDKs caused by head sampling configuration. applies_to: serverless: all @@ -12,18 +12,24 @@ products: # Missing or incomplete traces due to SDK sampling -If traces or spans are missing in Kibana, the issue might be related to SDK-level sampling configuration. By default, SDKs use head-based sampling, meaning the decision to record or drop a trace is made when the trace is first created. +If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. -Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured.
See [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details. +{applies_to}`stack: 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped. + +Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. Refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details. ## Symptoms -- Only a small subset of traces appears in Kibana, even under light traffic. +You might notice one or more of the following behaviors when SDK-level sampling is impacting your traces: + +- Only a small subset of traces reaches {{es}} or {{kib}}, even though SDKs are exporting spans. - Transactions look incomplete because some spans are missing. - Trace volume is unexpectedly low compared to logs or metrics. ## Causes +These factors can result in missing spans or traces when sampling is configured at the SDK level: + - Head sampling at the SDK level drops traces before they're exported. - Default sampling rates (for example `1/100` or `1/1000`) might be too low for your workload. - Environment variables like `OTEL_TRACES_SAMPLER` or `OTEL_TRACES_SAMPLER_ARG` are not set, not recognized, or formatted in a way the SDK doesn't support. @@ -38,22 +44,23 @@ Follow these steps to resolve SDK sampling configuration issues: :::{step} Check SDK environment variables -- Confirm that `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` are set correctly +- Confirm that `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` are set correctly. 
- For testing, you can temporarily set: + ```bash export OTEL_TRACES_SAMPLER=always_on ``` -- In production, consider using `parentbased_traceidratio` with an explicit ratio +- In production, consider using `parentbased_traceidratio` with an explicit ratio. ::: :::{step} Align configuration across services -- Use consistent sampling configuration across all instrumented services to help avoid dropped child spans or fragmented traces +- Use consistent sampling configuration across all instrumented services to help avoid dropped child spans or fragmented traces. ::: :::{step} Adjust sampling ratios for your traffic -- For low-traffic applications, avoid extremely low ratios (such as `1/1000`) +- For low-traffic applications, avoid extremely low ratios (such as `1/1000`). For example, the following configuration samples ~20% of traces: @@ -65,8 +72,8 @@ Follow these steps to resolve SDK sampling configuration issues: :::{step} Use Collector tail sampling for advanced scenarios -- Head sampling can't evaluate the full trace context before making a decision -- For more control (for example "keep all errors, sample 10% of successes"), use Collector tail sampling +- Head sampling can't evaluate the full trace context before making a decision. +- For more control (for example "keep all errors, sample 10% of successes"), use Collector tail sampling. For more information, refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md). 
::: From 38b5e8b7708ab7766221d60ade1a9b852b3d23fb Mon Sep 17 00:00:00 2001 From: Aleksandra Spilkowska Date: Tue, 30 Sep 2025 14:44:53 +0200 Subject: [PATCH 3/4] update "applies to" tag --- .../opentelemetry/edot-sdks/misconfigured-sampling-sdk.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md index c89e0d7166..b13f9cec6d 100644 --- a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md +++ b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md @@ -14,7 +14,7 @@ products: If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. -{applies_to}`stack: 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped. +{applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped. Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. Refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details. 
From 28d98c772b8ca63e7da0d3b80785ccf11ba5ba63 Mon Sep 17 00:00:00 2001 From: Aleksandra Spilkowska Date: Wed, 1 Oct 2025 12:24:00 +0200 Subject: [PATCH 4/4] Fix intro on both pages --- .../edot-collector/misconfigured-sampling-collector.md | 4 +++- .../opentelemetry/edot-sdks/misconfigured-sampling-sdk.md | 4 +--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md index 2de52ca3d7..c8ddfd7614 100644 --- a/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md +++ b/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md @@ -12,7 +12,9 @@ products: # Missing or incomplete traces due to Collector sampling -If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. Tail-based sampling in the Collector can reduce trace volume if policies are too strict or misconfigured. +If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. + +{applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped. Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information. 
diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md index b13f9cec6d..e84db33de4 100644 --- a/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md +++ b/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md @@ -12,9 +12,7 @@ products: # Missing or incomplete traces due to SDK sampling -If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration. - -{applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped. +If traces or spans are missing in Kibana, the issue might be related to SDK-level sampling configuration. By default, SDKs use head-based sampling, meaning the decision to record or drop a trace is made when the trace is first created. Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. Refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details.
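
The environment-variable check that the SDK page in this series describes could be scripted roughly as follows. This is a minimal sketch and not part of the patch: the function name `check_sampler` and its messages are hypothetical, the sampler names come from the OpenTelemetry SDK environment variable specification, and individual SDKs might accept a narrower or wider set of values than this sketch assumes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EDOT or OpenTelemetry): reports whether the
# sampler environment variables hold values this sketch knows how to check.
check_sampler() {
  local sampler="${OTEL_TRACES_SAMPLER:-}"
  local arg="${OTEL_TRACES_SAMPLER_ARG:-}"
  # Assumption: a ratio is a plain decimal between 0 and 1, such as 0.2.
  local ratio_re='^(0(\.[0-9]+)?|1(\.0+)?)$'

  case "$sampler" in
    "")
      echo "unset: SDK default applies"
      ;;
    always_on|always_off|parentbased_always_on|parentbased_always_off)
      echo "ok: $sampler"
      ;;
    traceidratio|parentbased_traceidratio)
      if [[ -z "$arg" ]]; then
        echo "ok: $sampler (no ratio set)"
      elif [[ "$arg" =~ $ratio_re ]]; then
        echo "ok: $sampler ratio=$arg"
      else
        # Values such as 1/1000 or 20 end up here.
        echo "invalid ratio: $arg"
        return 1
      fi
      ;;
    *)
      # Other spec-defined samplers (for example jaeger_remote) are
      # deliberately not covered by this sketch.
      echo "not checked by this sketch: $sampler"
      return 1
      ;;
  esac
}
```

Running `OTEL_TRACES_SAMPLER=parentbased_traceidratio OTEL_TRACES_SAMPLER_ARG=0.2 check_sampler` prints `ok: parentbased_traceidratio ratio=0.2`, while an argument written as `1/1000` is reported as an invalid ratio and the function returns a non-zero status.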