Expand "Rejected requests" documentation #4634
base: main
https://www.elastic.co/docs/troubleshoot/elasticsearch/rejected-requests

1 - Under `Check rejected tasks`, added sample JSON outputs so users can better interpret the results. Also added:

> The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

Also, `erring` is unclear wording to me, so I recommend rephrasing it to something unambiguous or omitting that word.

2 - Under `Prevent rejected requests`, added a `Fix indexing/search request bursts and concurrency (HTTP 429)` section:

> Rejections (HTTP 429) typically indicate the cluster is receiving work faster than it can execute (for example, during traffic bursts or when indexing/search requests are not optimized). Prevent them by shaping workload—use client-side backoff/throttling for bursts and tune request patterns (bulk sizing, concurrency, query cost) to match the cluster’s sustained throughput.
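The delta comparison in item 1 can be sketched in a few lines. This is a minimal illustration, not part of the docs change: it assumes two text snapshots of `GET _cat/thread_pool?h=node_name,name,active,queue,rejected,completed` taken a few minutes apart (without `v`, so there is no header row). The helper names, the "value went down means restart" rule, and the sample values are hypothetical; the code only parses text and does not call a cluster.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (node_name, thread pool name)


def parse_cat_thread_pool(text: str) -> Dict[Key, Tuple[int, int]]:
    """Parse `_cat/thread_pool` rows into {(node, pool): (rejected, completed)}.

    Assumes the columns were requested in this order, with no header row:
    node_name name active queue rejected completed
    """
    counters: Dict[Key, Tuple[int, int]] = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        node, pool, _active, _queue, rejected, completed = fields
        counters[(node, pool)] = (int(rejected), int(completed))
    return counters


def counter_delta(before: int, after: int) -> int:
    # Counters reset when the node process restarts. If the value went down,
    # assume a restart happened and count everything since then.
    return after if after < before else after - before


def rejection_deltas(before_text: str, after_text: str) -> Dict[Key, Tuple[int, int]]:
    """Return {(node, pool): (rejected_delta, completed_delta)} for pools in both samples."""
    before = parse_cat_thread_pool(before_text)
    after = parse_cat_thread_pool(after_text)
    deltas: Dict[Key, Tuple[int, int]] = {}
    for key, (rej_after, comp_after) in after.items():
        if key not in before:
            continue  # node or pool only present in one sample
        rej_before, comp_before = before[key]
        deltas[key] = (
            counter_delta(rej_before, rej_after),
            counter_delta(comp_before, comp_after),
        )
    return deltas


if __name__ == "__main__":
    # Hypothetical snapshots taken a few minutes apart.
    t0 = """
    es-01 write 8 120 500 40000
    es-02 write 2 0 10 51003
    """
    t1 = """
    es-01 write 8 200 650 41000
    es-02 write 1 0 10 52000
    """
    for (node, pool), (rejected, completed) in sorted(rejection_deltas(t0, t1).items()):
        status = "rejecting in this window" if rejected > 0 else "ok"
        print(f"{node} {pool}: +{rejected} rejected, +{completed} completed ({status})")
```

A single window only shows rejections in that interval; per the added paragraph, it is a sustained run of non-zero `rejected` deltas across several windows that indicates ongoing request drops.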
Vale Linting Results
Summary: 7 suggestions found (💡 Suggestions: 7)
The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to the Elastic style guide for Vale.
| `write` thread pool rejections frequently appear in the erring API and correlating log as `EsRejectedExecutionException` with either `QueueResizingEsThreadPoolExecutor` or `queue capacity`. |
| These errors are often related to [backlogged tasks](task-queue-backlog.md). |
| The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429). |
Suggested change, from:
| The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429). |
to:
| The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (for example, sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429). |
(We try to avoid Latin abbreviations such as etc., e.g., and i.e.)
Usually at this level of detail we would recommend Monitoring (highlighting AutoOps).
But also, these two troubleshooting docs are really beginning to overlap 😂 and I'm not fully sure what to do. Cross-linking the concurrent PR where I requested your review: it covers the bigger task backlog picture, where I think this explanation might make more sense: https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR30
| See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections. |
| These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections. |
Suggested change, from:
| These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections. |
to:
| These errors are often related to [backlogged tasks](task-queue-backlog.md). See our [Threadpool Rejections video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a troubleshooting walkthrough. |
This just gives people a bit more certainty about where the video link will take them, and if the link ever breaks, it gives them a title to search for.
| es-02 search 13 240 0 90112 |
| es-02 write 2 35 0 51003 |
| ``` |
| Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects. |
Suggested change, from:
| Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects. |
to:
| Requests are arriving faster than workers can drain them (queue > 0). If this state is transient, it may be fine; if the queue remains large or continues to grow, this commonly precedes rejects. |
(just trying to avoid too many "it"s)
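The queue guidance in this hunk (a non-zero `queue` may be fine if transient, but a queue that stays large or keeps growing commonly precedes rejections) could be checked against a short series of `queue` readings for one node and pool. This is a minimal sketch only; the category names and the `threshold` and `min_samples` values are hypothetical placeholders, not Elasticsearch settings or defaults, and would need tuning for a real cluster.

```python
from typing import List


def classify_queue(samples: List[int], threshold: int = 100, min_samples: int = 3) -> str:
    """Classify recent `queue` readings (oldest first) for one node and thread pool.

    `threshold` and `min_samples` are hypothetical placeholders, not
    Elasticsearch settings or defaults.
    """
    if len(samples) < min_samples:
        return "insufficient data"
    recent = samples[-min_samples:]
    if max(recent) == 0:
        return "idle"
    if min(recent) == 0:
        # The queue drained at least once in the window: likely a short burst.
        return "transient"
    if all(later > earlier for earlier, later in zip(recent, recent[1:])):
        # Non-zero and rising on every sample: the pattern that commonly precedes rejections.
        return "growing"
    if min(recent) >= threshold:
        # Never dropped below the threshold: workers are not catching up.
        return "persistent"
    return "elevated"


if __name__ == "__main__":
    for series in ([0, 0, 0], [0, 240, 0], [50, 120, 240], [300, 250, 280], [5, 3, 4]):
        print(series, classify_queue(series))
```

Looking at the trend rather than a single reading mirrors the distinction the suggested wording draws between a transient queue and one that remains large or continues to grow.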
kilfoyle left a comment
LGTM! 🚀
Thanks a lot @rodrigomadalozzo! I've left a few minor phrasing comments but apart from those all looks good. 👍
| If {{es}} regularly rejects requests and other tasks, your cluster likely has high CPU usage or high JVM memory pressure. For tips, see [High CPU usage](high-cpu-usage.md) and [High JVM memory pressure](high-jvm-memory-pressure.md). |
| ### Fix indexing/search request bursts and concurrency (HTTP 429) |
:spiderman-jinx: 😂
I was also planning to cover this on the task queue backlog side, because queuing matters even when it doesn't lead to rejections. I do like your expanded wording better, though; would you mind some plagiarizing / cross-pollinating? https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR39
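The new `Fix indexing/search request bursts and concurrency (HTTP 429)` section recommends client-side backoff for bursts. A minimal sketch of that idea, retrying with capped exponential backoff and full jitter while a request keeps returning HTTP 429, is shown below. `send` is a hypothetical callable standing in for whatever client call returns a status code, and the retry count and delays are illustrative values, not Elastic recommendations.

```python
import random
import time
from typing import Callable, Optional


def send_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call `send()` until it returns a status other than 429 or retries run out.

    `send` is a hypothetical stand-in for a client call that returns an HTTP
    status code. Full jitter spreads retries out so that many clients backing
    off at once do not re-create the same burst.
    """
    rng = rng or random.Random()
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            return status
        # Upper bound doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng.uniform(0, cap))
        status = send()
    return status  # may still be 429 once retries are exhausted


if __name__ == "__main__":
    # Simulated endpoint that rejects the first two attempts (hypothetical).
    replies = iter([429, 429, 200])
    print(send_with_backoff(lambda: next(replies), base_delay=0.01))
```

Backoff only smooths bursts; it does not add capacity. If rejections persist, the section's other advice still applies: bring bulk sizing, concurrency, and query cost in line with the cluster's sustained throughput.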
stefnestor left a comment
🙏
@stefnestor