- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Add utilization to the THREAD_POOL node stats #132156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|  | ||
| public float getUtilization() { | ||
| return (float) apmUtilizationTracker.getLastValue(); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a stop-gap until we have the singular utilisation value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:'-)
| sumStat(firstStats.largest, secondStats.largest), | ||
| sumStat(firstStats.completed, secondStats.completed) | ||
| sumStat(firstStats.completed, secondStats.completed), | ||
| NaN // Don't sum utilization, it makes no sense | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by the get cluster info REST api to merge the stats for each thread pool received from each node. I don't think it makes sense to merge utilization here, and in any case we don't render it in toXContent. Perhaps if/when we start rendering it we can implement a sensible merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the indication that we don't really want to do this here. Although this might seem simpler than #131480 in terms of lines-of-code added, IMO that's because this is a bit of a short-cut. Fields in the transport messages which aren't represented in the REST API are tech debt.
I'd rather we moved towards having the allocator use dedicated messages for all these things rather than relying on stats APIs - the stats APIs we use today return a bunch of stuff about which we don't care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few like this now for features that aren't settled. Not because it was too hard, but rather to allow the freedom to change them before we commit to something that would be considered a breaking change.
Point taken about moving to dedicated messages though, this approach was an attempt to remain consistent with existing code, but it sounds like we don't want to do that.
| Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) | 
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| sumStat(firstStats.largest, secondStats.largest), | ||
| sumStat(firstStats.completed, secondStats.completed) | ||
| sumStat(firstStats.completed, secondStats.completed), | ||
| NaN // Don't sum utilization, it makes no sense | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the indication that we don't really want to do this here. Although this might seem simpler than #131480 in terms of lines-of-code added, IMO that's because this is a bit of a short-cut. Fields in the transport messages which aren't represented in the REST API are tech debt.
I'd rather we moved towards having the allocator use dedicated messages for all these things rather than relying on stats APIs - the stats APIs we use today return a bunch of stuff about which we don't care.
Amend the thread pool node stats to include a float value for utilization.
Currently I've configured it to return the last value calculated for APM, because there is work underway to provide a singular utilization value for thread pools (see #131898). If that work merges before this I'll use that instead.
Relates ES-12316