Allow larger write queues for large nodes #130061
Conversation
With the rise of nodes with larger CPU counts, our current write queue size might be too conservative. Indexing pressure will still protect against out-of-memory errors.
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Curtis, how does this apply to current clusters that are heavily CPU-allocated? For example, 7 nodes at 16 CPUs each. Is 10k still the best, or would this new setting open up that 10k limit to a larger value? I'm just trying to understand the logic here, because we have been told over and over again that increasing this 10k write queue is not advisable, so I'm wondering what has changed.
> With the rise of nodes with larger CPU counts, our current write queue size might be too conservative. Indexing pressure will still protect against out-of-memory errors.
This setting would make the queue size 12,000 on a 16-core node.
We saw some internal benchmarks in which 32-core machines exhausted the thread pool queue while still having manageable latency, which led us to make this tweak for larger machines. We would only advise changing it if a user is advanced and has a clear understanding of their load. I'm sure there are workloads out there with many, many small tasks that exhaust the queue without the node being backed up, so if a benchmark shows that scenario, then changing the queue might be a way to address it. But in the standard scenario, a consistently exhausted queue just means under-provisioning, and increasing the queue will only lead to high write latencies, timeouts, retries, etc.
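For reference, a minimal sketch of how an advanced user could override the write queue depth. `thread_pool.write.queue_size` is the existing static Elasticsearch setting for this thread pool; the value 12,000 here is just the 16-core figure mentioned above, used for illustration rather than as a recommendation:

```yaml
# elasticsearch.yml -- static setting, so it requires a node restart.
# Raises the write thread pool queue depth. Per the discussion above,
# only do this if you understand your load; a consistently full queue
# usually signals under-provisioning, not a queue that is too small.
thread_pool:
  write:
    queue_size: 12000
```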
Sounds great, thank you! This does help and I appreciate that breakdown!