@rodrigomadalozzo
Contributor

https://www.elastic.co/docs/troubleshoot/elasticsearch/rejected-requests

1 - Under `Check rejected tasks`, added sample JSON outputs so users can better interpret the results.

Also added

The queue and active pools are point-in-time values. rejected and completed are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in rejected vs the increase in completed). Sustained non-zero rejected deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).
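
A minimal sketch of the delta comparison described above, assuming two snapshots of the `rejected` and `completed` counters have already been collected per node and pool. `PoolSample` and `counter_deltas` are hypothetical names used only for illustration, the sample values are made up, and treating a counter that went backwards as a node restart is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    """One snapshot of a thread pool on one node (hypothetical helper type)."""
    node: str
    pool: str
    rejected: int   # cumulative since the node process started
    completed: int  # cumulative since the node process started


def counter_deltas(before, after):
    """Return {(node, pool): (rejected_delta, completed_delta)} between two snapshots.

    If a counter went backwards, the node is assumed to have restarted
    (counters reset to zero), so the later value is used as the delta.
    """
    earlier = {(s.node, s.pool): s for s in before}
    deltas = {}
    for s in after:
        prev = earlier.get((s.node, s.pool))
        if prev is None:
            continue  # no baseline for this node/pool yet
        if s.rejected < prev.rejected or s.completed < prev.completed:
            deltas[(s.node, s.pool)] = (s.rejected, s.completed)
        else:
            deltas[(s.node, s.pool)] = (
                s.rejected - prev.rejected,
                s.completed - prev.completed,
            )
    return deltas


# Made-up samples taken a few minutes apart
t0 = [PoolSample("es-01", "write", 120, 50_000)]
t1 = [PoolSample("es-01", "write", 135, 50_900)]
print(counter_deltas(t0, t1))  # {('es-01', 'write'): (15, 900)}
```

A non-zero `rejected` delta that repeats across consecutive windows is the signal to act on; the absolute counter values on their own say little about what is happening now.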

Also, `erring` is unclear wording to me, so I recommend rephrasing it to something unambiguous or omitting the word.

2 - Under `Prevent rejected requests`, added

Fix indexing/search request bursts and concurrency (HTTP 429)

Rejections (HTTP 429) typically indicate the cluster is receiving work faster than it can execute (for example, during traffic bursts or when indexing/search requests are not optimized). Prevent them by shaping workload—use client-side backoff/throttling for bursts and tune request patterns (bulk sizing, concurrency, query cost) to match the cluster’s sustained throughput.
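
As one way to picture the client-side backoff mentioned above, here is a minimal sketch of exponential backoff with full jitter. It is not tied to any Elasticsearch client: `TooManyRequests` is a hypothetical stand-in for whatever error a client raises on HTTP 429, and `call_with_backoff`, its parameters, and the default delays are assumptions chosen for illustration.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() and retry on TooManyRequests with exponential backoff.

    The wait before retry number n is a random value between 0 and
    min(max_delay, base_delay * 2**n), which spreads retries from many
    clients instead of letting them arrive in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the rejection to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Backoff only smooths bursts. If rejections continue after retries are spread out, the sustained request rate itself (bulk size, concurrency, query cost) is more likely the thing to tune.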

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No

@github-actions

Vale Linting Results

Summary: 7 suggestions found

💡 Suggestions (7)
| File | Line | Rule | Message |
|---|---|---|---|
| troubleshoot/elasticsearch/rejected-requests.md | 33 | Elastic.EmDashes | Don't put a space before or after a dash. |
| troubleshoot/elasticsearch/rejected-requests.md | 35 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'See', unless the term is in the UI. |
| troubleshoot/elasticsearch/rejected-requests.md | 43 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| troubleshoot/elasticsearch/rejected-requests.md | 43 | Elastic.Semicolons | Use semicolons judiciously. |
| troubleshoot/elasticsearch/rejected-requests.md | 45 | Elastic.EmDashes | Don't put a space before or after a dash. |
| troubleshoot/elasticsearch/rejected-requests.md | 50 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI. |
| troubleshoot/elasticsearch/rejected-requests.md | 96 | Elastic.WordChoice | Consider using 'run, start' instead of 'execute', unless the term is in the UI. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@github-actions

🔍 Preview links for changed docs

`write` thread pool rejections frequently appear in the erring API and correlating log as `EsRejectedExecutionException` with either `QueueResizingEsThreadPoolExecutor` or `queue capacity`.

These errors are often related to [backlogged tasks](task-queue-backlog.md).
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

Suggested change
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).
The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (for example, sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

(we try to avoid Latin abbreviations: etc., e.g., i.e.,)

Usually at this level of detail we would recommend Monitoring (highlighting AutoOps).

But also, these two troubleshooting docs are really beginning to overlap 😂 & I'm not fully sure what to do. Cross-linking for concurrent PR where I requested your review as it sits in the bigger task backlog picture where I think this maybe would make more sense to explain: https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR30

The `queue` and `active` pools are point-in-time values. `rejected` and `completed` are cumulative counters on each node that reset when the node process restarts. Therefore, avoid interpreting a single snapshot “rejected/completed ratio” in isolation — compare deltas over a time window (e.g., sample every 1–5 minutes and calculate the increase in `rejected` vs the increase in `completed`). Sustained non-zero `rejected` deltas indicate ongoing request drops (typically surfaced to clients as HTTP 429).

See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.
These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.

Suggested change
These errors are often related to [backlogged tasks](task-queue-backlog.md). See [this video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a walkthrough of troubleshooting threadpool rejections.
These errors are often related to [backlogged tasks](task-queue-backlog.md). See our [Threadpool Rejections video](https://www.youtube.com/watch?v=auZJRXoAVpI) for a troubleshooting walkthrough.

This just gives people a bit more certainty about where the video link will take them, and, in case the link breaks, it gives them a title to search for.

```
es-02 search 13 240 0 90112
es-02 write 2 35 0 51003
```
Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects.

Suggested change
Requests are arriving faster than workers can drain them (queue > 0). If it’s transient, it may be fine; if it persists/grows, it commonly precedes rejects.
Requests are arriving faster than workers can drain them (queue > 0). If this state is transient, it may be fine; if the queue remains large or continues to grow, this commonly precedes rejects.

(just trying to avoid too many "it"s)

@kilfoyle kilfoyle left a comment

LGTM! 🚀
Thanks a lot @rodrigomadalozzo! I've left a few minor phrasing comments but apart from those all looks good. 👍


If {{es}} regularly rejects requests and other tasks, your cluster likely has high CPU usage or high JVM memory pressure. For tips, see [High CPU usage](high-cpu-usage.md) and [High JVM memory pressure](high-jvm-memory-pressure.md).

### Fix indexing/search request bursts and concurrency (HTTP 429)

:spiderman-jinx: 😂

I was also looking to cover this on the task queue backlog side, because queuing matters even when nothing is rejected (but I do like your expanded wording better, if you wouldn't mind some plagiarizing / cross-pollinating?): https://github.com/elastic/docs-content/pull/4657/files#diff-2ba649b230c054cba45dd0cc2039a0e546b92f16f09405a8b65c84a0e2007b3aR39

@stefnestor stefnestor left a comment

🙏
