---
navigation_title: Trace export errors
description: Learn how to resolve trace export failures caused by `sending_queue` overflow and Elasticsearch exporter timeouts in the EDOT Collector.
applies_to:
  serverless: all
  product:
    edot_collector: ga
products:
  - id: observability
  - id: edot-collector
---

# Trace export errors from the EDOT Collector

During periods of high traffic or load testing, the EDOT Collector might fail to export trace data to {{es}}. This typically happens when the internal queue for outgoing data fills up faster than it can be drained, resulting in timeouts and dropped data.

## Symptoms

You might see one or more of the following messages in the EDOT Collector logs:

* `bulk indexer flush error: failed to execute the request: context deadline exceeded`
* `Exporting failed. Rejecting data. sending queue is full`
* Repeated exporter errors tagged with `otelcol.signal: "traces"`

These errors indicate the Collector is overwhelmed and unable to export traces fast enough, leading to queue overflows and data loss.

## Causes

This issue typically occurs when the `sending_queue` configuration is misaligned with the incoming trace volume. Common contributing factors include:

* `sending_queue.block_on_overflow` is not enabled (it defaults to `false`), so data is dropped when the queue is full.
* `num_consumers` is too low to keep up with the incoming trace volume and drain the queue efficiently.
* The queue size (`queue_size`) is too small for the traffic load.
* Export batching is disabled, increasing processing overhead.
* EDOT Collector resources (CPU, memory) are not sufficient for the traffic volume.

:::{note}
Increasing the `timeout` value (for example from 30s to 90s) doesn't help if the queue itself is the bottleneck.
:::
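
For reference, the `timeout` setting is the exporter's per-request timeout and sits alongside `sending_queue`, not inside it. A minimal sketch, assuming the {{es}} exporter (values are illustrative):

```yaml
exporters:
  elasticsearch:
    timeout: 90s          # per-request timeout; raising it alone doesn't relieve queue pressure
    sending_queue:        # these settings control the queue that actually overflows
      enabled: true
      queue_size: 1000
      num_consumers: 10
```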

## Resolution

Update the EDOT Collector configuration as follows:

:::::{stepper}

::::{step} Enable `block_on_overflow`

Prevent silent trace drops by enabling blocking behavior when the queue is full:

```yaml
sending_queue:
  enabled: true
  queue_size: 1000
  num_consumers: 10
  block_on_overflow: true
```
::::

::::{step} Increase `num_consumers`

Raise the number of queue consumers to increase parallel processing of queued items. Start with 20–30 and adjust based on throughput and resource usage.
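
For example, a sketch with a higher consumer count (the value shown is illustrative; watch CPU usage after raising it):

```yaml
sending_queue:
  enabled: true
  queue_size: 1000
  num_consumers: 25    # raised from 10; more consumers drain the queue in parallel
  block_on_overflow: true
```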

::::

::::{step} Tune `queue_size`

Increase the queue size to handle spikes in trace volume. Ensure sufficient memory is allocated to support the larger buffer.
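
For example, a sketch with a larger queue (illustrative value; a bigger queue buffers more in-flight data, so budget additional memory for it):

```yaml
sending_queue:
  enabled: true
  queue_size: 5000     # raised from 1000 to absorb traffic spikes
  num_consumers: 25
  block_on_overflow: true
```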

::::

::::{step} Enable batching

If not already enabled, configure batching to reduce the per-span export cost and improve throughput.
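
One common approach is the standard `batch` processor in the traces pipeline. A minimal sketch, assuming an OTLP receiver and the {{es}} exporter are already defined elsewhere in your configuration (values are illustrative):

```yaml
processors:
  batch:
    send_batch_size: 8192   # spans per flush; align with your Elasticsearch bulk sizing
    timeout: 5s             # flush a partial batch after this interval

service:
  pipelines:
    traces:
      receivers: [otlp]             # use the receivers already defined in your config
      processors: [batch]
      exporters: [elasticsearch]    # use your existing Elasticsearch exporter name
```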

::::

::::{step} Check resource allocation

Verify the EDOT Collector pod has enough CPU and memory. Increase limits or scale out the deployment if necessary.
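
If the Collector runs on Kubernetes, a sketch of the pod's resource section (hypothetical values; size them from observed usage):

```yaml
# Fragment of the Collector Deployment or DaemonSet spec (hypothetical values)
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 4Gi
```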

::::

::::{step} Evaluate {{es}} performance

Check for indexing delays or errors on the {{es}} side. Bottlenecks here can also contribute to timeouts and queue buildup.
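
For example, rejected bulk requests on the `write` thread pool or a degraded cluster status point to an {{es}}-side bottleneck:

```console
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
GET _cluster/health
```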

::::

:::::


## Resources

* [Upstream documentation - OpenTelemetry Collector configuration](https://opentelemetry.io/docs/collector/configuration)
troubleshoot/ingest/opentelemetry/toc.yml (1 addition, 0 deletions)
@@ -10,6 +10,7 @@ toc:
- file: edot-collector/enable-debug-logging.md
- file: edot-collector/collector-not-starting.md
- file: edot-collector/misconfigured-sampling-collector.md
- file: edot-collector/trace-export-errors.md
- file: edot-sdks/index.md
  children:
    - file: edot-sdks/android/index.md