23 changes: 21 additions & 2 deletions troubleshoot/elasticsearch/rejected-requests.md
@@ -30,9 +30,24 @@

`write` thread pool rejections frequently appear in the erring API and correlating log as `EsRejectedExecutionException` with either `QueueResizingEsThreadPoolExecutor` or `queue capacity`.

These errors are often related to [backlogged tasks](task-queue-backlog.md).
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

Check notice on line 33 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.EmDashes: Don't put a space before or after a dash.
Contributor

Suggested change
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (for example, sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

(we try to avoid Latin abbreviations: etc., e.g., i.e.,)

Contributor

Usually at this level of detail we would recommend Monitoring (highlighting AutoOps).

But also, these two troubleshooting docs are really beginning to overlap 😂 & I'm not fully sure what to do. Cross-linking for concurrent PR where I requested your review as it sits in the bigger task backlog picture where I think this maybe would make more sense to explain: https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR30


See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.
These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.

Check notice on line 35 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.WordChoice: Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'See', unless the term is in the UI.
Contributor

Suggested change
These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.
These errors are often related to [backlogged tasks](task-queue-backlog.md). See our [Threadpool Rejections video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a troubleshooting walkthrough.

This just gives people a bit more certainty about where the video link will take them, and also, just in case the link breaks, it gives them a title to search for.


Example 1 - Queue building (backlog), but still no rejections
```
node_name name   active queue rejected completed
es-02     search     13   240        0     90112
es-02     write       2    35        0     51003
```
Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects.

Check notice on line 43 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.Semicolons: Use semicolons judiciously.

Check notice on line 43 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.WordChoice: Consider using 'can, might' instead of 'may', unless the term is in the UI.
Contributor

Suggested change
Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects.
Requests are arriving faster than workers can drain them (queue > 0). If this state is transient, it may be fine; if the queue remains large or continues to grow, this commonly precedes rejects.

(just trying to avoid too many "it"s)


Example 2 — Rejections happening (queue at/near limit)

Check notice on line 45 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.EmDashes: Don't put a space before or after a dash.
```
node_name name   active queue rejected completed
es-02     search     13  1000      842     90510
```
The queue is saturated, and new work is being rejected (`rejected` increasing). Clients typically see HTTP 429 for those rejected requests.

Check notice on line 50 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.WordChoice: Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI.
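
The delta comparison that the added paragraph recommends can be checked against the two example snapshots above. The following is a minimal sketch in Python and is not part of the PR: the helper names `parse_cat_thread_pool` and `counter_deltas` are hypothetical, and it assumes that each sample is whitespace-separated text with the header row shown in the examples, treating Example 1 as the earlier sample and Example 2 as the later one.

```python
from typing import Dict, Tuple

# (node_name, pool name) -> counters; an illustrative type alias, not an Elasticsearch type
Snapshot = Dict[Tuple[str, str], Dict[str, int]]

COUNTERS = ("active", "queue", "rejected", "completed")


def parse_cat_thread_pool(text: str) -> Snapshot:
    """Parse header + rows of whitespace-separated text into a Snapshot."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    snapshot: Snapshot = {}
    for row in rows:
        rec = dict(zip(header, row))
        key = (rec["node_name"], rec["name"])
        snapshot[key] = {c: int(rec[c]) for c in COUNTERS}
    return snapshot


def counter_deltas(earlier: Snapshot, later: Snapshot) -> Snapshot:
    """Increase in the cumulative counters for pools present in both samples."""
    deltas: Snapshot = {}
    for key, new in later.items():
        old = earlier.get(key)
        if old is None:
            continue  # pool not sampled earlier, so there is no window to compare
        d_rejected = new["rejected"] - old["rejected"]
        d_completed = new["completed"] - old["completed"]
        if d_rejected < 0 or d_completed < 0:
            continue  # counters went backwards: the node restarted, skip this window
        deltas[key] = {"rejected": d_rejected, "completed": d_completed}
    return deltas


# Sample data copied from Example 1 (earlier) and Example 2 (later).
EARLIER = """
node_name name   active queue rejected completed
es-02     search     13   240        0     90112
es-02     write       2    35        0     51003
"""
LATER = """
node_name name   active queue rejected completed
es-02     search     13  1000      842     90510
"""

if __name__ == "__main__":
    result = counter_deltas(parse_cat_thread_pool(EARLIER), parse_cat_thread_pool(LATER))
    for (node, pool), d in result.items():
        print(f"{node} {pool}: +{d['rejected']} rejected, +{d['completed']} completed")
```

With these two samples, the `search` pool on `es-02` gains 842 rejections against 398 completions over the window, which is the sustained non-zero `rejected` delta the paragraph describes. The `write` pool is skipped because it appears in only one sample.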


## Check circuit breakers [check-circuit-breakers]
@@ -76,6 +91,10 @@

If {{es}} regularly rejects requests and other tasks, your cluster likely has high CPU usage or high JVM memory pressure. For tips, see [High CPU usage](high-cpu-usage.md) and [High JVM memory pressure](high-jvm-memory-pressure.md).

### Fix indexing/search request bursts and concurrency (HTTP 429)
Contributor

:spiderman-jinx: 😂

I was also looking to cover this on the task queue backlog side because it matters just that it queues even if it doesn't reject (but I do like your expanded wording better if you wouldn't mind some plagiarizing / cross-pollinating?): https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR39


Rejections (HTTP 429) typically indicate the cluster is receiving work faster than it can execute (for example, during traffic bursts or when indexing/search requests are not optimized). Prevent them by shaping workload—use client-side backoff/throttling for bursts and tune request patterns (bulk sizing, concurrency, query cost) to match the cluster’s sustained throughput.

Check notice on line 96 in troubleshoot/elasticsearch/rejected-requests.md

GitHub Actions / preview / vale

Elastic.WordChoice: Consider using 'run, start' instead of 'execute', unless the term is in the UI.
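
The client-side backoff that the added section mentions could look roughly like the following. This is a hedged sketch in Python, not taken from the PR or from any Elasticsearch client: `send` is a hypothetical callable that performs one request and returns its HTTP status code, and the retry count, base delay, and cap are illustrative defaults rather than recommended values.

```python
import random
import time
from typing import Callable, Iterator


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield jittered delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))


def send_with_backoff(
    send: Callable[[], int],  # hypothetical: sends one request, returns the HTTP status
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry while the response is HTTP 429, waiting longer before each retry."""
    status = send()
    for delay in backoff_delays(max_retries, base, cap):
        if status != 429:
            break  # accepted, or failed for a reason that backoff will not fix
        sleep(delay)
        status = send()
    return status  # still 429 here means the retries were exhausted
```

The random jitter keeps many clients from retrying in lockstep after the same burst. Backoff only smooths bursts: if rejections persist, the request patterns themselves (bulk sizing, concurrency, query cost) still need tuning, as the section says.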

### Fix for `semantic_text` ingestion issues [fix-semantic-text-ingestion-issues]
```{applies_to}
stack: ga 9.1