Expand "Rejected requests" documentation #4634
base: main
@@ -30,9 +30,24 @@

`write` thread pool rejections frequently appear in the erring API response and the corresponding logs as `EsRejectedExecutionException`, with either `QueueResizingEsThreadPoolExecutor` or `queue capacity` in the message.
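
To scan exported logs or client error messages for these rejections, you can match the two markers named above. The following is a minimal Python sketch. It assumes one log event per line and uses plain substring matching. The function names are hypothetical, and real log formats (plain text versus JSON) vary by version and deployment.

```python
from typing import Iterable, Iterator, Tuple

# Markers from the text above: the exception class plus one of the two
# phrases that identify a thread pool rejection.
_EXCEPTION = "EsRejectedExecutionException"
_POOL_MARKERS = ("QueueResizingEsThreadPoolExecutor", "queue capacity")


def is_thread_pool_rejection(line: str) -> bool:
    """Return True if a log line or error message looks like a thread pool rejection."""
    return _EXCEPTION in line and any(marker in line for marker in _POOL_MARKERS)


def find_rejections(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every matching line, numbered from 1."""
    for number, line in enumerate(lines, start=1):
        if is_thread_pool_rejection(line):
            yield number, line.rstrip("\n")
```

Requiring both the exception name and a pool marker helps avoid false positives from unrelated `EsRejectedExecutionException` messages.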

The `queue` and `active` columns are point-in-time values. The `rejected` and `completed` columns are cumulative counters on each node, and they reset when the node process restarts. Therefore, don't interpret a rejected/completed ratio from a single snapshot in isolation. Instead, compare deltas over a time window. For example, sample every 1 to 5 minutes and compare the increase in `rejected` with the increase in `completed`. Sustained non-zero `rejected` deltas indicate that requests are still being dropped. Clients typically see these drops as HTTP 429 responses.
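
The following Python sketch shows the delta comparison. It assumes that each sample is a list of row dictionaries, such as those returned by `GET _cat/thread_pool?format=json&h=node_name,name,active,queue,rejected,completed`, where values can arrive as strings. If a counter goes down, the sketch treats this as a node restart and counts the new value as the increase since the restart. Rows are keyed by node name and pool name. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple

Key = Tuple[str, str]  # (node_name, thread pool name)


def _index(rows: Iterable[Mapping[str, Any]]) -> Dict[Key, Tuple[int, int]]:
    """Map (node_name, name) -> (rejected, completed) as integers."""
    return {
        (str(r["node_name"]), str(r["name"])): (int(r["rejected"]), int(r["completed"]))
        for r in rows
    }


def _increase(before: int, after: int) -> int:
    """Counter increase. A drop means the node restarted and the counter reset to 0."""
    return after - before if after >= before else after


def rejection_deltas(
    earlier: Iterable[Mapping[str, Any]],
    later: Iterable[Mapping[str, Any]],
) -> List[Tuple[str, str, int, int]]:
    """Return (node_name, pool, rejected_delta, completed_delta) for pools in both samples."""
    before = _index(earlier)
    after = _index(later)
    result = []
    for key in sorted(after.keys() & before.keys()):
        (rej0, comp0), (rej1, comp1) = before[key], after[key]
        result.append((key[0], key[1], _increase(rej0, rej1), _increase(comp0, comp1)))
    return result
```

For example, take one sample, wait a few minutes, take another, and flag any pool whose `rejected` delta stays above zero across several consecutive windows. These columns alone can't reveal a restart if the new counter value has already grown past the old one.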

These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting thread pool rejections.

Contributor
Suggested change
This just gives people a bit more certainty about where the video link will take them, and also, just in case the link breaks, it gives them a title to search for.

Example 1: Queue building (backlog), but no rejections yet

```
node_name name   active queue rejected completed
es-02     search     13   240        0     90112
es-02     write       2    35        0     51003
```

Requests are arriving faster than the workers can drain them (`queue` > 0). A short-lived queue is usually harmless, but a queue that persists or keeps growing commonly precedes rejections.
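
If you capture the text output shown in these examples rather than JSON, a small parser can flag pools that have queued work. This Python sketch assumes a header row (the `v` parameter) and values that contain no spaces. Requesting `format=json` avoids text parsing altogether. The function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

NUMERIC = ("active", "queue", "rejected", "completed")


def parse_cat_output(text: str) -> List[Dict[str, Any]]:
    """Parse whitespace-aligned `_cat` output that starts with a header row."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row: Dict[str, Any] = dict(zip(header, line.split()))
        # Convert the counter and gauge columns to integers when present.
        for column in NUMERIC:
            if column in row:
                row[column] = int(row[column])
        rows.append(row)
    return rows


def backlogged(rows: List[Dict[str, Any]]) -> List[Tuple[str, str, int]]:
    """Return (node_name, pool, queue) for pools with queued work in this snapshot."""
    return [(r["node_name"], r["name"], r["queue"]) for r in rows if r["queue"] > 0]
```

A single snapshot only shows that a queue exists. To tell a transient queue from a persistent one, compare several snapshots.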

Check notice on line 43 in troubleshoot/elasticsearch/rejected-requests.md

Contributor
Suggested change
(just trying to avoid too many "it"s)

Example 2: Rejections happening (queue at or near its limit)

```
node_name name   active queue rejected completed
es-02     search     13  1000      842     90510
```

The queue is saturated and new work is being rejected (`rejected` increases between samples). Clients typically receive HTTP 429 responses for the rejected requests.
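
The two examples differ in whether `rejected` increases between samples. The following is a minimal classification sketch in Python. It assumes you already have the current `queue` value and the `rejected` increase over a sampling window, both as integers. The state names are illustrative, not Elasticsearch terminology.

```python
from enum import Enum


class PoolState(Enum):
    KEEPING_UP = "no queue, no new rejections"
    BACKLOG = "queue building, no new rejections"
    REJECTING = "new rejections (clients likely see HTTP 429)"


def classify(queue: int, rejected_delta: int) -> PoolState:
    """Classify one thread pool from its current queue and its rejected increase over a window."""
    if rejected_delta > 0:
        return PoolState.REJECTING  # like Example 2
    if queue > 0:
        return PoolState.BACKLOG  # like Example 1
    return PoolState.KEEPING_UP
```

New rejections are checked first because a saturated pool can show both a full queue and a rising `rejected` counter.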

## Check circuit breakers [check-circuit-breakers]

@@ -76,6 +91,10 @@

If {{es}} regularly rejects requests and other tasks, your cluster likely has high CPU usage or high JVM memory pressure. For tips, see [High CPU usage](high-cpu-usage.md) and [High JVM memory pressure](high-jvm-memory-pressure.md).

### Fix indexing/search request bursts and concurrency (HTTP 429)

Contributor
I was also looking to cover this on the task queue backlog side because it matters just that it queues even if it doesn't reject (but I do like your expanded wording better if you wouldn't mind some plagiarizing / cross-pollinating?): https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR39

Rejections (HTTP 429) typically indicate that the cluster is receiving work faster than it can execute it, for example during traffic bursts or when indexing or search requests aren't optimized. To prevent rejections, shape the workload. Use client-side backoff and throttling to absorb bursts, and tune request patterns (bulk request size, concurrency, and query cost) to match the cluster's sustained throughput.
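
Client-side backoff usually means retrying an HTTP 429 response after a randomized delay that grows exponentially. The following Python sketch assumes a hypothetical `send` callable that issues one request and returns its status code. The default delays and retry count are placeholders to tune for your workload, not recommended values.

```python
import random
import time
from typing import Callable, Optional


def send_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call `send` until it returns a status other than 429 or the retries run out.

    `send` is a hypothetical callable that issues one request and returns
    its HTTP status code. At most 1 + max_retries calls are made.
    """
    rng = rng or random.Random()
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            return status
        # Exponential backoff with full jitter: wait a random time in
        # [0, min(max_delay, base_delay * 2**attempt)] before retrying.
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng.uniform(0, cap))
        status = send()
    return status
```

Injecting `sleep` and `rng` keeps the function testable. For bulk requests, {{es}} can return a 200 response in which some individual items were rejected with status 429. A real client therefore also needs to inspect the per-item results and retry only the rejected items. Some official clients include bulk helpers with built-in retry for 429 responses, so check your client's documentation before writing your own.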

### Fix for `semantic_text` ingestion issues [fix-semantic-text-ingestion-issues]
```{applies_to}
stack: ga 9.1
(we try to avoid Latin abbreviations: etc., e.g., i.e.,)
Usually at this level of detail we would recommend Monitoring (highlighting AutoOps).
But also, these two troubleshooting docs are really beginning to overlap 😂 & I'm not fully sure what to do. Cross-linking for concurrent PR where I requested your review as it sits in the bigger task backlog picture where I think this maybe would make more sense to explain: https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR30