
@nicktindall (Contributor) commented Nov 4, 2025

Previously, the queue latency histogram had 18 buckets, so the highest bounded bucket ended at 2^16 ms (65,536 ms, ~65 s). Any percentile above that fell into the final, unbounded bucket and was reported as Long.MAX_VALUE ms (292,471,208 years).

This change adds two more buckets to the histogram, raising the largest representable value to 2^18 ms (262,144 ms, or ~4 minutes 22 seconds), which should hopefully cover the vast majority of queue latencies we're likely to see.
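To make the bucket arithmetic concrete, here is a minimal sketch, assuming power-of-two millisecond buckets where bucket 0 holds [0, 1) ms, bucket i holds [2^(i-1), 2^i) ms, and the last bucket is unbounded, with percentiles reported as the upper bound of the bucket that contains them. The class `ExponentialHistogramSketch` and its methods are hypothetical illustrations, not the Elasticsearch API.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch (not the Elasticsearch class): a latency histogram with
 * power-of-two millisecond buckets. Bucket 0 holds [0, 1) ms, bucket i holds
 * [2^(i-1), 2^i) ms, and the last bucket is unbounded.
 */
public class ExponentialHistogramSketch {
    private final LongAdder[] counts;

    public ExponentialHistogramSketch(int bucketCount) {
        counts = new LongAdder[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            counts[i] = new LongAdder();
        }
    }

    /** Exclusive upper bound of a bucket: 2^bucket ms, or Long.MAX_VALUE for the last (unbounded) bucket. */
    public long upperBoundMillis(int bucket) {
        return bucket == counts.length - 1 ? Long.MAX_VALUE : 1L << bucket;
    }

    /** Largest finite upper bound; latencies at or above it land in the unbounded bucket. */
    public long maxBoundedMillis() {
        return upperBoundMillis(counts.length - 2);
    }

    /** 0 for values below 1 ms, otherwise the bit length of the value, capped at the last bucket. */
    int bucketFor(long millis) {
        if (millis <= 0) {
            return 0;
        }
        int bucket = Long.SIZE - Long.numberOfLeadingZeros(millis);
        return Math.min(bucket, counts.length - 1);
    }

    public void add(long millis) {
        counts[bucketFor(millis)].increment();
    }

    /** Reports the p-th percentile (0 < p <= 1) as the upper bound of the bucket that contains it. */
    public long percentileMillis(double p) {
        long total = 0;
        for (LongAdder c : counts) {
            total += c.sum();
        }
        if (total == 0) {
            return 0;
        }
        long rank = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i].sum();
            if (seen >= rank) {
                return upperBoundMillis(i);
            }
        }
        return upperBoundMillis(counts.length - 1);
    }

    public static void main(String[] args) {
        for (int buckets : new int[] {18, 20}) {
            ExponentialHistogramSketch h = new ExponentialHistogramSketch(buckets);
            h.add(70_000); // a single 70 s queue latency
            System.out.println(buckets + " buckets: max bounded = " + h.maxBoundedMillis()
                + " ms, p99 = " + h.percentileMillis(0.99) + " ms");
        }
        // Long.MAX_VALUE ms expressed in 365-day years
        System.out.println(Long.MAX_VALUE / 1000 / (365L * 24 * 60 * 60) + " years");
    }
}
```

With 18 buckets, a 70 s latency lands in the unbounded bucket, so p99 comes out as 9223372036854775807 ms; with 20 buckets it lands in the [65,536, 131,072) ms bucket and p99 is reported as 131072 ms, with 262144 ms as the new largest bounded value. The last line prints 292471208 years.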

@nicktindall requested a review from a team as a code owner November 4, 2025 23:11
@elasticsearchmachine added the v9.3.0 and needs:triage labels Nov 4, 2025
@nicktindall added the >non-issue and :Distributed Coordination/Allocation labels and removed the needs:triage and v9.3.0 labels Nov 4, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine added the Team:Distributed Coordination label Nov 4, 2025
@mhl-b (Contributor) commented Nov 5, 2025

Can we manually assign buckets? The current structure is very fine-grained at millisecond scale and coarse at second scale, while we mostly look at latencies in seconds. Something like [10ms, 100ms, 500ms, 1s, 2s, 4s, 8s, 10s, 15s, 20s, 40s, 60s, 60s+]?

@nicktindall (Contributor Author)

> Can we manually assign buckets? The current structure is very fine-grained at millisecond scale and coarse at second scale, while we mostly look at latencies in seconds. Something like [10ms, 100ms, 500ms, 1s, 2s, 4s, 8s, 10s, 15s, 20s, 40s, 60s, 60s+]?

Yeah, we'd need to implement a different type of histogram; the existing one uses exponentially sized buckets, so we can't pass arbitrary bucket boundaries.

@mhl-b (Contributor) left a comment

Ouch, right. I thought we could pass custom ranges. LGTM

@nicktindall (Contributor Author)

> Ouch, right. I thought we could pass custom ranges. LGTM

I did consider implementing an arbitrary-bucket histogram at the time, but the exponential one already existed. Going forward, I think it'd be good to have both, but not as part of this change.
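For comparison with the exponential layout, here is a minimal sketch of the arbitrary-bucket alternative discussed in this thread, using the bounds @mhl-b proposed, with the final "60s+" bucket unbounded. The class `ArbitraryBucketHistogramSketch` is hypothetical and not part of Elasticsearch; it finds a value's bucket by binary search over sorted, distinct, exclusive upper bounds instead of bit arithmetic.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: a histogram whose bucket upper bounds (in ms) are supplied explicitly. */
public class ArbitraryBucketHistogramSketch {
    private final long[] upperBoundsMillis; // sorted, distinct, exclusive upper bounds of the bounded buckets
    private final LongAdder[] counts;       // one extra, unbounded bucket for values >= the last bound

    public ArbitraryBucketHistogramSketch(long... upperBoundsMillis) {
        this.upperBoundsMillis = upperBoundsMillis.clone();
        Arrays.sort(this.upperBoundsMillis);
        counts = new LongAdder[this.upperBoundsMillis.length + 1];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = new LongAdder();
        }
    }

    /** Index of the first bucket whose upper bound is strictly greater than millis. */
    int bucketFor(long millis) {
        int idx = Arrays.binarySearch(upperBoundsMillis, millis);
        // An exact hit on a bound belongs to the next bucket because bounds are exclusive;
        // a miss returns -(insertionPoint) - 1, and the insertion point is the bucket index.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    /** Upper bound of a bucket, or Long.MAX_VALUE for the final unbounded bucket. */
    public long upperBoundMillis(int bucket) {
        return bucket < upperBoundsMillis.length ? upperBoundsMillis[bucket] : Long.MAX_VALUE;
    }

    public void add(long millis) {
        counts[bucketFor(millis)].increment();
    }

    /** Reports the p-th percentile (0 < p <= 1) as the upper bound of the bucket that contains it. */
    public long percentileMillis(double p) {
        long total = 0;
        for (LongAdder c : counts) {
            total += c.sum();
        }
        if (total == 0) {
            return 0;
        }
        long rank = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i].sum();
            if (seen >= rank) {
                return upperBoundMillis(i);
            }
        }
        return upperBoundMillis(counts.length - 1);
    }

    public static void main(String[] args) {
        // Bounds from the review comment: 10ms .. 60s, plus an implicit 60s+ bucket.
        ArbitraryBucketHistogramSketch h = new ArbitraryBucketHistogramSketch(
            10, 100, 500, 1_000, 2_000, 4_000, 8_000, 10_000, 15_000, 20_000, 40_000, 60_000);
        h.add(3_000);
        h.add(12_000);
        System.out.println("p50 = " + h.percentileMillis(0.5) + " ms, p99 = " + h.percentileMillis(0.99) + " ms");
    }
}
```

Here the two samples land in the [2s, 4s) and [10s, 15s) buckets, so the sketch prints p50 = 4000 ms and p99 = 15000 ms. The trade-off is a binary search per sample instead of a single bit operation, in exchange for resolution exactly where the latencies of interest fall.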

@nicktindall merged commit f7617d3 into elastic:main Nov 5, 2025
34 checks passed
@nicktindall deleted the increase_queue_latency_buckets branch November 5, 2025 02:36

Labels: :Distributed Coordination/Allocation, >non-issue, Team:Distributed Coordination, v9.3.0

3 participants