@nicktindall (Contributor) commented Jul 30, 2025

Amend the thread pool node stats to include a float value for utilization.

Currently I've configured it to return the last value calculated for APM, because there is work underway to provide a singular utilization value for thread pools (see #131898). If that work merges before this I'll use that instead.

Relates ES-12316


public float getUtilization() {
    return (float) apmUtilizationTracker.getLastValue();
}
Contributor Author

Just a stop-gap until we have the singular utilisation value
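
For readers unfamiliar with the pattern, here is a minimal sketch of a tracker that computes utilization once per polling interval and caches the result, so a getter like the one above can return the last value without recomputing it. The class name `UtilizationTrackerSketch`, its constructor, and the `poll` signature are assumptions made for illustration; only `getLastValue()` appears in this PR, and the real tracker may work differently.

```java
/**
 * Hypothetical sketch, not the Elasticsearch implementation.
 * Utilization = busy time in the interval / (interval length * max threads).
 */
final class UtilizationTrackerSketch {
    private final int maxThreads;
    private long lastTotalExecutionNanos;
    private long lastPollNanos;
    // NaN until the first poll has produced a value
    private volatile double lastValue = Double.NaN;

    UtilizationTrackerSketch(int maxThreads, long startNanos) {
        this.maxThreads = maxThreads;
        this.lastPollNanos = startNanos;
    }

    /** Called by a periodic poller with the cumulative execution time so far. */
    synchronized double poll(long totalExecutionNanos, long nowNanos) {
        long busy = totalExecutionNanos - lastTotalExecutionNanos;
        long capacity = (nowNanos - lastPollNanos) * maxThreads;
        lastTotalExecutionNanos = totalExecutionNanos;
        lastPollNanos = nowNanos;
        lastValue = capacity <= 0 ? 0.0 : (double) busy / capacity;
        return lastValue;
    }

    /** Returns the value computed by the most recent poll. */
    double getLastValue() {
        return lastValue;
    }
}
```

One consequence of this design is that the reported value is only as fresh as the last poll, which is part of why it reads as a stop-gap.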

Contributor

:'-)

  sumStat(firstStats.largest, secondStats.largest),
- sumStat(firstStats.completed, secondStats.completed)
+ sumStat(firstStats.completed, secondStats.completed),
+ NaN // Don't sum utilization, it makes no sense
Contributor Author

This is used by the get cluster info REST API to merge the stats for each thread pool received from each node. I don't think it makes sense to merge utilization here, and in any case we don't render it in toXContent. Perhaps if/when we start rendering it we can implement a sensible merge?
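
To illustrate the merge behaviour described here, a minimal sketch under stated assumptions: `PoolStatsSketch` and its fields are hypothetical stand-ins for the real stats class, and the plain addition below is a simplification of whatever `sumStat` actually does. Counters are summed across nodes, while utilization, being a per-node ratio, is set to `NaN` rather than added.

```java
/** Hypothetical stand-in for per-thread-pool stats; not the Elasticsearch class. */
final class PoolStatsSketch {
    final String name;
    final int largest;
    final long completed;
    final float utilization;

    PoolStatsSketch(String name, int largest, long completed, float utilization) {
        this.name = name;
        this.largest = largest;
        this.completed = completed;
        this.utilization = utilization;
    }

    /** Merges the stats reported by two nodes for the same thread pool. */
    static PoolStatsSketch merge(PoolStatsSketch first, PoolStatsSketch second) {
        return new PoolStatsSketch(
            first.name,
            first.largest + second.largest,     // simplified sum (assumption)
            first.completed + second.completed, // simplified sum (assumption)
            Float.NaN                           // a sum of per-node ratios is not meaningful
        );
    }
}
```

Callers would need `Float.isNaN(merged.utilization)` to detect the placeholder, since `NaN == NaN` is false in Java.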

Contributor

I think this is the indication that we don't really want to do this here. Although this might seem simpler than #131480 in terms of lines-of-code added, IMO that's because this is a bit of a short-cut. Fields in the transport messages which aren't represented in the REST API are tech debt.

I'd rather we moved towards having the allocator use dedicated messages for all these things rather than relying on stats APIs - the stats APIs we use today return a bunch of stuff about which we don't care.

Contributor Author

We have a few like this now for features that aren't settled, not because it was too hard, but to allow the freedom to change them before we commit to something that would be considered a breaking change.
Point taken about moving to dedicated messages, though. This approach was an attempt to remain consistent with existing code, but it sounds like we don't want to do that.

@nicktindall added the >non-issue and :Distributed Coordination/Allocation labels Jul 30, 2025
@nicktindall marked this pull request as ready for review July 30, 2025 07:43
@nicktindall requested a review from a team as a code owner July 30, 2025 07:43
@nicktindall changed the title from "Add thread pool utilization to the THREAD_POOL node stats" to "Add utilization to the THREAD_POOL node stats" Jul 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine added the Team:Distributed Coordination label Jul 30, 2025
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Contributor

@mhl-b mhl-b left a comment

LGTM

@nicktindall closed this Aug 1, 2025
@nicktindall deleted the utilization_in_node_stats branch September 3, 2025 04:20
Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.2.0

4 participants