Commit 4931c18

Merge branch 'main' into 2025/10/16/s3-replication
2 parents 5dc96eb + d681794 commit 4931c18

2 files changed

+112
-0
lines changed

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
---
navigation_title: Export errors from the EDOT Collector
description: Learn how to resolve export failures caused by `sending_queue` overflow and Elasticsearch exporter timeouts in the EDOT Collector.
applies_to:
  serverless: all
  product:
    edot_collector: ga
products:
  - id: observability
  - id: edot-collector
---
12+
13+
# Export failures when sending telemetry data from the EDOT Collector
14+
15+
During high traffic or load testing scenarios, the EDOT Collector might fail to export telemetry data (traces, metrics, or logs) to {{es}}. This typically happens when the internal queue for outgoing data fills up faster than it can be drained, resulting in timeouts and dropped data.
16+
17+
## Symptoms
18+
19+
You might see one or more of the following messages in the EDOT Collector logs:
20+
21+
* `bulk indexer flush error: failed to execute the request: context deadline exceeded`
22+
* `Exporting failed. Rejecting data. sending queue is full`
23+
24+
These errors indicate the Collector is overwhelmed and unable to export data fast enough, leading to queue overflows and data loss.
25+
26+
## Causes
27+
28+
This issue typically occurs when the `sending_queue` configuration or the Elasticsearch cluster scaling is misaligned with the incoming telemetry volume.
29+
30+
:::{important}
31+
{applies_to}`stack: ga 9.0, deprecated 9.3`
32+
The sending queue is turned off by default. Verify that `enabled: true` is explicitly set. Otherwise, any queue configuration is ignored.
33+
:::
34+
35+
Common contributing factors include:
36+
37+
* An underscaled Elasticsearch cluster, which is the most frequent cause of persistent export failures. If Elasticsearch cannot index data fast enough, the Collector's queue fills up.
38+
* {applies_to}`stack: ga 9.0, deprecated 9.3` `sending_queue.block_on_overflow` is turned off (defaults to `false`), which can lead to data drops.
39+
* Sending queue is enabled but `num_consumers` is too low to keep up with the incoming data volume.
40+
* Sending queue size (`queue_size`) is too small for the traffic load.
41+
* Both internal and sending queue batching are disabled, increasing processing overhead.
42+
* EDOT Collector resources (CPU, memory) are insufficient for the traffic volume.
43+
44+
:::{note}
45+
Increasing the `timeout` value (for example from 30s to 90s) doesn't help if the queue itself or Elasticsearch throughput is the bottleneck.
46+
:::
47+
48+
## Resolution
49+
50+
The resolution approach depends on your {{stack}} version and Collector configuration.
51+
52+
### When the sending queue is not enabled by default
53+
{applies_to}`stack: ga 9.0, deprecated 9.3`
54+
55+
Enable the sending queue and block on overflow to prevent data drops:
56+
57+
```yaml
sending_queue:
  enabled: true
  queue_size: 1000
  num_consumers: 10
  block_on_overflow: true
```
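
The `sending_queue` settings above are nested under the exporter definition. The following is a minimal sketch of where they sit in a Collector configuration, assuming an OTLP receiver and a single traces pipeline. The endpoint URL and the `ELASTIC_API_KEY` environment variable are hypothetical placeholders, not values from this guide.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  elasticsearch:
    # Hypothetical endpoint and credentials; replace with your own.
    endpoints: ["https://elasticsearch.example.com:9200"]
    api_key: ${env:ELASTIC_API_KEY}
    sending_queue:
      enabled: true            # the queue is off by default in these versions
      queue_size: 1000
      num_consumers: 10
      block_on_overflow: true  # apply backpressure instead of dropping data

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [elasticsearch]
```

Blocking on overflow trades dropped data for backpressure on upstream components, so it is worth confirming that receivers and clients tolerate slower acknowledgements.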
64+
65+
### When the sending queue is enabled by default
66+
{applies_to}`stack: ga 9.3`
67+
68+
The Elasticsearch exporter provides default `sending_queue` parameters (including `block_on_overflow: true`), but these can, and often should, be tuned for specific workloads.
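
As a rough illustration of such tuning, the following sketch overrides the consumer count, queue size, and batching settings. All values are illustrative assumptions rather than recommendations, the endpoint is a hypothetical placeholder, and the `batch` keys follow the upstream exporter helper naming, which might differ between Collector versions.

```yaml
exporters:
  elasticsearch:
    endpoints: ["https://elasticsearch.example.com:9200"]  # hypothetical
    sending_queue:
      enabled: true
      block_on_overflow: true
      num_consumers: 20      # illustrative: more parallel bulk requests
      queue_size: 5000       # illustrative: more headroom for traffic bursts
      batch:
        flush_timeout: 5s    # illustrative: flush partial batches after 5s
        min_size: 1000000    # illustrative: lower bound before a flush
        max_size: 5000000    # illustrative: upper bound per batch
        sizer: bytes         # assumption: batch sizes are measured in bytes
```

Change one parameter at a time and compare the Collector's internal metrics before and after, because a larger queue or more consumers only helps when {{es}} can absorb the extra load.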
69+
70+
The following steps can help identify and resolve export bottlenecks:
71+
72+
:::::{stepper}
73+
74+
::::{step} Check the Collector's internal metrics
75+
76+
If internal telemetry is enabled, review these metrics:
77+
78+
* `otelcol.elasticsearch.bulk_requests.latency` — high tail latency suggests Elasticsearch is the bottleneck. Check Elasticsearch cluster metrics and scale if necessary.
79+
80+
* `otelcol.elasticsearch.bulk_requests.count` and `otelcol.elasticsearch.flushed.bytes`: these help assess whether the Collector is sending too many requests or requests that are too large. Tune `sending_queue.num_consumers` or the batching configuration to balance throughput.
81+
82+
* `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity`: if the queue runs near capacity but Elasticsearch is healthy, increase the queue size or the number of consumers.
83+
84+
* `otelcol_enqueue_failed_spans`, `otelcol_enqueue_failed_metric_points`, `otelcol_enqueue_failed_log_records` — persistent enqueue failures indicate undersized queues or slow consumers.
85+
86+
For a complete list of available metrics, refer to the upstream OpenTelemetry metadata files for the [Elasticsearch exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/elasticsearchexporter/metadata.yaml) and [exporter helper](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/metadata.yaml).
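
If internal telemetry is not yet exposed, a configuration along the following lines is one way to enable it. This is a sketch that assumes the upstream `service.telemetry` settings and a Prometheus pull endpoint; the host and port are illustrative choices.

```yaml
service:
  telemetry:
    metrics:
      # Assumption: a higher verbosity level is needed to emit
      # the exporter-specific metrics listed above.
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 127.0.0.1  # illustrative: bind to localhost only
                port: 8888       # illustrative port
```

With this in place, the metrics can be scraped from the configured host and port.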
87+
::::
88+
89+
::::{step} Scale the Collector's resources
90+
91+
* Ensure sufficient CPU and memory for the EDOT Collector.
92+
* Scale vertically (more resources) or horizontally (more replicas) as needed.
93+
::::
94+
95+
::::{step} Optimize Elasticsearch performance
96+
97+
Address indexing delays, rejected bulk requests, or shard imbalances that limit ingestion throughput.
98+
::::
99+
100+
:::::
101+
102+
:::{tip}
103+
{applies_to}`stack: ga 9.3`
104+
Focus tuning efforts on {{es}} performance, Collector resource allocation, and queue sizing informed by the internal telemetry metrics above.
105+
:::
106+
107+
108+
## Resources
109+
110+
* [Upstream documentation - OpenTelemetry Collector configuration](https://opentelemetry.io/docs/collector/configuration)
111+
* [Elasticsearch exporter configuration reference](elastic-agent://reference/edot-collector/components/elasticsearchexporter.md)

troubleshoot/ingest/opentelemetry/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ toc:
   - file: edot-collector/enable-debug-logging.md
   - file: edot-collector/collector-not-starting.md
   - file: edot-collector/misconfigured-sampling-collector.md
+  - file: edot-collector/trace-export-errors.md
   - file: edot-sdks/index.md
     children:
       - file: edot-sdks/android/index.md
