Commit 888f7c6
Add troubleshooting guide for 429 errors when using the Elastic Cloud Managed OTLP Endpoint (#3669)
This PR adds a new troubleshooting topic that explains how to diagnose and resolve HTTP 429 Too Many Requests errors when sending data to the mOTLP endpoint in both Elastic Cloud Serverless and Elastic Cloud Hosted environments. It also updates the mOTLP quickstart page, replacing the inline "Error: too many requests" section with a link to the new troubleshooting guide. Closes [#6054](elastic/ingest-dev#6054)
1 parent e54e73d commit 888f7c6

File tree

3 files changed (+128 −1 lines)


solutions/observability/get-started/quickstart-elastic-cloud-otel-endpoint.md

Lines changed: 4 additions & 1 deletion
@@ -162,7 +162,10 @@ You must format your API key as `"Authorization": "ApiKey <api-key-value-here>"`
 ### Error: too many requests

-The Managed OTLP endpoint has per-project rate limits in place. If you reach this limit, reach out to our [support team](https://support.elastic.co). Refer to [Rate limiting](opentelemetry://reference/motlp.md#rate-limiting) for more information.
+If you see HTTP `429 Too Many Requests` errors when sending data through the Elastic Cloud Managed OTLP (mOTLP) endpoint, your project might be hitting ingest rate limits.
+
+Refer to the dedicated [429 errors when using the Elastic Cloud Managed OTLP Endpoint](/troubleshoot/ingest/opentelemetry/429-errors-motlp.md) troubleshooting guide for details on causes, rate limits, and solutions.

 ## Provide feedback

troubleshoot/ingest/opentelemetry/429-errors-motlp.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
navigation_title: 429 errors when using the mOTLP endpoint
description: Resolve HTTP 429 `Too Many Requests` errors when sending data through the Elastic Cloud Managed OTLP (mOTLP) endpoint in Elastic Cloud Serverless or Elastic Cloud Hosted (ECH).
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_collector:
products:
  - id: cloud-serverless
  - id: cloud-hosted
  - id: observability
  - id: edot-collector
---
# 429 errors when using the Elastic Cloud Managed OTLP Endpoint

When sending telemetry data through the {{motlp}} (mOTLP), you might encounter HTTP `429 Too Many Requests` errors. These errors indicate that your ingest rate has temporarily exceeded the rate or burst limits configured for your {{ecloud}} project.

This issue can occur in both {{serverless-full}} and {{ech}} (ECH) environments.
## Symptoms

You might see log messages similar to the following in your EDOT Collector output or SDK logs:

```json
{
  "code": 8,
  "message": "error exporting items, request to <ingest endpoint> responded with HTTP Status Code 429"
}
```

In some cases, you might also see warnings or backpressure metrics increase in your Collector's internal telemetry (for example, queue length or failed send count).
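One way to catch throttling early is to expose the Collector's internal telemetry and watch the exporter queue and failure metrics. A minimal sketch, assuming a recent Collector build that supports the `service::telemetry::metrics::readers` syntax; the host and port values are illustrative, and older Collector versions use a single `address` setting instead:

```yaml
# Expose internal Collector metrics in Prometheus format on port 8888,
# so queue length and failed-send counters can be scraped and alerted on.
service:
  telemetry:
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8888
```

You can then scrape `http://localhost:8888/metrics` and alert when failed-send counters grow at the same time 429 responses appear in the logs.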
## Causes

A 429 status means that the rate of requests sent to the Managed OTLP endpoint has exceeded the allowed thresholds. This can happen for several reasons:

* Your telemetry pipeline is sending data faster than the allowed ingest rate.
* Bursts of telemetry data exceed the short-term burst limit, even if your sustained rate is within limits.
* In {{ech}}, the {{es}} capacity for your deployment might be underscaled for the current ingest rate.
* In {{serverless-full}}, rate limiting should not result from {{es}} capacity, because the platform automatically scales ingest capacity. If you suspect a scaling issue, [contact Elastic Support](contact-support.md).
* Multiple Collectors or SDKs are sending data concurrently without load balancing or backoff mechanisms.

The specific limits depend on your environment:

| Deployment type | Rate limit | Burst limit |
|-----------------|------------|-------------|
| Serverless | 15 MB/s | 30 MB/s |
| ECH | Depends on deployment size and available {{es}} capacity | Depends on deployment size and available {{es}} capacity |

Exact limits also depend on your subscription tier. Refer to the [Rate limiting section](opentelemetry://reference/motlp.md#rate-limiting) in the mOTLP reference documentation for details.
## Resolution

To resolve 429 errors, first identify whether the bottleneck is caused by ingest rate limits or by {{es}} capacity.
### Scale your deployment or request higher limits

If you've confirmed that your ingest configuration is stable but you still encounter 429 errors:

* {{serverless-full}}: [Contact Elastic Support](contact-support.md) to request an increase in ingest limits.
* {{ech}} (ECH): Increase your {{es}} capacity by scaling or resizing your deployment:
  * [Scaling considerations](../../../deploy-manage/production-guidance/scaling-considerations.md)
  * [Resize deployment](../../../deploy-manage/deploy/cloud-enterprise/resize-deployment.md)
  * [Autoscaling in ECE and ECH](../../../deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md)

After scaling, monitor your ingest metrics to verify that the rate of accepted requests increases and 429 responses stop appearing.
### Reduce ingest rate or enable backpressure

Lower the telemetry export rate by enabling batching and retry mechanisms in your EDOT Collector or SDK configuration. For example:

```yaml
processors:
  batch:
    send_batch_size: 1000
    timeout: 5s

exporters:
  otlp:
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 300s
```

These settings help smooth out spikes and automatically retry failed exports after rate-limit responses.
### Enable retry logic and queueing

To minimize data loss during temporary throttling, configure your exporter to use a sending queue and retry logic. For example:

```yaml
exporters:
  otlp:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
    retry_on_failure:
      enabled: true
```

With this configuration, the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling.
## Best practices

To prevent 429 errors and maintain reliable telemetry data flow, follow these best practices:

* Monitor internal Collector metrics (such as `otelcol_exporter_send_failed` and `otelcol_exporter_queue_capacity`) to detect backpressure early.
* Distribute telemetry load evenly across multiple Collectors instead of sending all data through a single instance.
* When possible, enable batching and compression to reduce payload size.
* Keep retry and backoff intervals conservative to avoid overwhelming the endpoint after a temporary throttle.
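The batching and compression practices can be combined in the exporter configuration. A minimal sketch, assuming an OTLP/gRPC exporter; the endpoint value is a placeholder, not a real URL:

```yaml
processors:
  # Group telemetry into larger, less frequent requests
  batch:
    send_batch_size: 1000
    timeout: 5s

exporters:
  otlp:
    endpoint: https://<your-motlp-endpoint>:443
    # gzip-compress payloads to reduce bytes sent per request
    compression: gzip
```

Fewer, smaller requests lower both the request rate and the ingest byte rate, which are the two quantities the endpoint throttles on.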
## Resources

* [{{motlp}} reference](opentelemetry://reference/motlp.md)
* [Quickstart: Send OTLP data to Elastic Serverless or {{ech}}](../../../solutions/observability/get-started/quickstart-elastic-cloud-otel-endpoint.md)

troubleshoot/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -171,6 +171,7 @@ toc:
   - file: ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md
   - file: ingest/opentelemetry/no-data-in-kibana.md
   - file: ingest/opentelemetry/connectivity.md
+  - file: ingest/opentelemetry/429-errors-motlp.md
   - file: ingest/opentelemetry/contact-support.md
   - file: ingest/logstash.md
     children:
