
Commit d34aa9d

Add troubleshooting guide for 429
1 parent 7fa8106 commit d34aa9d

2 files changed: +122 -0 lines changed
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
---
navigation_title: 429 errors when using the mOTLP endpoint
description: Resolve HTTP 429 `Too Many Requests` errors when sending data through the Elastic Cloud Managed OTLP (mOTLP) endpoint in Elastic Cloud Serverless or Elastic Cloud Hosted (ECH).
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_collector:
products:
  - id: cloud-serverless
  - id: cloud-hosted
  - id: observability
  - id: edot-collector
---

# 429 errors when using the Elastic Cloud Managed OTLP Endpoint

When sending telemetry data through the {{motlp}} (mOTLP), you might encounter HTTP `429 Too Many Requests` errors. These indicate that your ingest rate has temporarily exceeded the rate or burst limits configured for your Elastic Cloud project.

This issue can occur in both Elastic Cloud Serverless and {{ech}} (ECH) environments.

## Symptoms

You might see log messages similar to the following in your EDOT Collector output or SDK logs:

```json
{
  "code": 8,
  "message": "error exporting items, request to <ingest endpoint> responded with HTTP Status Code 429"
}
```

In this output, the status `code` of `8` corresponds to the gRPC `RESOURCE_EXHAUSTED` status, the gRPC equivalent of HTTP 429.

In some cases, you might also see warnings or backpressure metrics increase in your Collector’s internal telemetry (for example, queue length or failed send count).
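
To confirm backpressure from the Collector side, you can inspect the Collector's internal metrics, which include exporter queue sizes and failed-send counters; raising the internal telemetry level makes more of them available. This is a minimal sketch, assuming a recent EDOT or upstream Collector build where internal metrics are configured under `service::telemetry::metrics`; the exact schema and the port the metrics are served on can vary between versions:

```yaml
service:
  telemetry:
    metrics:
      level: detailed   # accepted values: none | basic | normal | detailed
```

A queue size that keeps climbing while 429 responses appear usually points to throttling rather than a transient network problem.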

## Causes

A 429 status means that the rate of requests sent to the Managed OTLP endpoint has exceeded the allowed thresholds. This can happen for several reasons:

* Your telemetry pipeline is sending data faster than the allowed ingest rate.
* Bursts of telemetry data exceed the short-term burst limit, even if your sustained rate is within limits.
* The {{es}} capacity for your Cloud deployment cannot handle the incoming data rate.
* Multiple Collectors or SDKs are sending data concurrently without load balancing or backoff mechanisms.

The specific limits depend on your environment:

| Deployment type | Rate limit | Burst limit |
|-----------------|------------|-------------|
| Serverless | 15 MB/s | 30 MB/s |
| ECH | Depends on deployment size and available {{es}} capacity | Depends on deployment size and available {{es}} capacity |

Refer to the [Rate limiting section](opentelemetry://reference/motlp.md#rate-limiting) in the mOTLP reference documentation for details.

## Resolution

To resolve 429 errors, identify whether the bottleneck is caused by ingest limits or {{es}} capacity.

### Reduce ingest rate or enable backpressure

Lower the telemetry export rate by enabling batching and retry mechanisms in your EDOT Collector or SDK configuration. For example:

```yaml
processors:
  batch:
    send_batch_size: 1000
    timeout: 5s

exporters:
  otlp:
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 300s
```

These settings help smooth out spikes and automatically retry failed exports after rate-limit responses.
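
Note that the `batch` processor only takes effect when it is referenced in a pipeline. This is a minimal sketch, assuming your configuration already defines `otlp` receivers and exporters; adjust the component names and signal types to match your setup:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]   # the processor must be listed here to apply
      exporters: [otlp]
```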

### Scale your deployment or request higher limits

If you’ve confirmed that your ingest configuration is stable but still encounter 429 errors:

* Elastic Cloud Serverless: [Contact Elastic Support](contact-support.md) to request an increase in ingest limits.
* {{ech}} (ECH): Increase your {{es}} capacity by scaling or resizing your deployment:
  * [Scaling considerations](../../../deploy-manage/production-guidance/scaling-considerations.md)
  * [Resize deployment](../../../deploy-manage/deploy/cloud-enterprise/resize-deployment.md)
  * [Autoscaling in ECE and ECH](../../../deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md)

After scaling, monitor your ingest metrics to verify that the rate of accepted requests increases and 429 responses stop appearing.

### Enable retry logic and queueing

To minimize data loss during temporary throttling, configure your exporter to use a sending queue and retry logic. For example:

```yaml
exporters:
  otlp:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
    retry_on_failure:
      enabled: true
```

This ensures the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling.
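
If you also want queued data to survive Collector restarts, the sending queue can optionally be backed by a storage extension. The following is a sketch, assuming the `file_storage` extension is included in your Collector distribution; the directory path is a hypothetical example:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue   # hypothetical path; must exist and be writable

exporters:
  otlp:
    sending_queue:
      enabled: true
      storage: file_storage   # persist queued batches to disk instead of memory

service:
  extensions: [file_storage]
```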

## Best practices

To prevent 429 errors and maintain reliable telemetry data flow, implement these best practices:

* Monitor internal Collector metrics (such as `otelcol_exporter_send_failed_spans` and `otelcol_exporter_queue_size`) to detect backpressure early.
* Distribute telemetry load evenly across multiple Collectors instead of sending all data through a single instance.
* When possible, enable batching and compression to reduce payload size (see the sketch after this list).
* Keep retry and backoff intervals conservative to avoid overwhelming the endpoint after a temporary throttle.
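
As a concrete illustration of the compression point above, the following sketch makes gzip compression explicit on the OTLP exporter. The `compression` setting is part of the standard OTLP exporter options, and gzip is commonly the default already, so treat this as documenting the choice rather than adding a new capability:

```yaml
exporters:
  otlp:
    compression: gzip   # compress payloads before sending to reduce ingest volume
```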

## Resources

* [{{motlp}} reference](opentelemetry://reference/motlp.md)
* [Quickstart: Send OTLP data to Elastic Serverless or {{ech}}](../../../solutions/observability/get-started/quickstart-elastic-cloud-otel-endpoint.md)

troubleshoot/ingest/opentelemetry/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -28,4 +28,5 @@ toc:
   - file: edot-sdks/misconfigured-sampling-sdk.md
   - file: no-data-in-kibana.md
   - file: connectivity.md
+  - file: 429-errors-motlp.md
   - file: contact-support.md
