-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Time framed thread-pool utilization #131898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
23fe464 to
8345da4
Compare
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
|
Hi @mhl-b, I've created a changelog YAML for you. |
…sticsearch into framed-thread-pool-utilization
.../java/org/elasticsearch/benchmark/common/util/concurrent/ThreadPoolUtilizationBenchmark.java
Show resolved
Hide resolved
Framed thread pool utilization benchmark hacking
| * @param trackExecutionTime Whether to track execution stats | ||
| * @param trackUtilization enables thread-pool utilization metrics | ||
| * @param utilizationInterval when utilization is enabled, specifies interval for measurement | ||
| * @param trackOngoingTasks Whether to track ongoing task execution time, not just finished tasks | ||
| * @param trackMaxQueueLatency Whether to track max queue latency. | ||
| * @param executionTimeEwmaAlpha The alpha seed for execution time EWMA (ExponentiallyWeightedMovingAverage). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit - mixed param description capitalization and ending periods. (What is the team preference here? It looks to be inconsistent across the codebase.)
.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java
Outdated
Show resolved
Hide resolved
.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java
Outdated
Show resolved
Hide resolved
| assert intervalNano > 0; | ||
| this.interval = intervalNano; | ||
| this.timeNow = timeNow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit - is it worth using a this(...) to call the other constructor and eliminate the duplicate code? (I know it is only a few lines, just wanted to mention it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truly nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other way around might be nice
FramedTimeTracker(long intervalNano) {
this(System::nanoTime);
}
🤷♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truly nit
I find this worth addressing, we would not want these constructors to get out of sync in the future.
Your comment here makes it hard to read if you are rejecting the proposal or accepting it, hence my comment here.
Another approach would be to remove this helper constructor, seems to be used only once and it is not unreasonable nor unusual to let client code pass in the time-tracking mechanism they want to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No offence to Jeremy about "truly nit" :) I did talked with him about this PR.
Of course I will change it, if it catches eye of a reader, I don't have strong preference. Time provider can be in tracking config with default System::nanoTime
.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java
Outdated
Show resolved
Hide resolved
…sticsearch into framed-thread-pool-utilization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this adds staleness, overhead and complexity. I am inclined to forego this for now and focus on the real balancing improvement work. I added a separate suggestion for how we could handle this instead, though I think we should simply live with what we have for now to ensure progress.
| */ | ||
| public synchronized long previousFrameTime() { | ||
| updateFrame0(timeNow.get()); | ||
| return previousTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is up to 30s stale. I think for this to work well we'd have to track it every second or so and then sum up all the prior frames. This adds even more overhead.
I think we should instead accept the original time-tracking's inaccuracy. I think the approach currently in code is good enough, but we can consider moving out the reset part into the consuming code. The node-usages stats could then itself track the prior value, either on the node or on the master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed staleness here 36374fc.
I measure time in a 1 second frame and use 30 seconds window. So it's at most 1 second stale, yet accurate for the last 30 seconds. Updated benchmark, overhead is still negligible.
|
@henningandersen there are few things that motivated this change rather than keeping existing logic
Also, time interval is configurable and can be set to lower values. It's easy to retain multiple frames for reporting purposes, for example 30 frames by 1 second interval. What I'm saying is that it would be really nice to put it into NodeStats, it will reduce scope of changes for the write-load balancing work. But to put it into node stats it should be independent from the callers. Also latest version has non-locking algorithm that has negligible overhead. edit:
it's done 36374fc |
| throw new IllegalStateException("No operation defined for [" + utilizationTrackingPurpose + "]"); | ||
| } | ||
| public double utilization() { | ||
| return (double) framedTimeTracker.totalTime() / (double) getMaximumPoolSize() / (double) framedTimeTracker.reportingInterval(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit of logic feels like it should be in the tracker itself perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it. I dont think frame-tracker has to know anything about threads and pool size.
| false, | ||
| false, | ||
| DEFAULT_EXECUTION_TIME_EWMA_ALPHA_FOR_TEST | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is peripheral but I think we should move to using the builder for DEFAULT and DO_NOT_TRACK. All those flags are meaningless without mouseovers. More-so now that we've added another one.
| final var now = nowTime / frameDuration; | ||
| final var frameWindow = getWindow(now); | ||
| frameWindow.frames[0].ongoingTasks.increment(); | ||
| frameWindow.frames[0].startEndDiff.add((now + 1) * frameDuration - nowTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit can this be frameDuration - (nowTime % frameDuration) ? not sure if that's more efficient but it's fewer operations.
| newFrames[i] = frame; | ||
| } | ||
| return new FrameWindow(newFrames); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like more object creation going on than what I would have hoped for, I know its probably necessary for the safety, but if I'm reading it right I think it means we create a new
FrameWindowFrame[]Framewhich each include 2 xLongAdder
every second for a 30s/1s tracker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, if I reuse window there is a risk of race condition. For reuse I would need to reset frames, it might happen that start/end gets window and try to update frame, and window update happens at the same moment.
So I would rather think of collapsing old frames into one big past frame, since it's unlikely to change. Only current and current-1 usually at race.
This PR changes Thread Pool utilization reporting intervals from dynamic (based on caller's frequency) into static (time frame based).
Originally there was a single consumer of utilization metric that runs on specific interval, so we would calculate utilization on the fly comparing previous invocation time and current. And it worked, kind of... With growing demand on utilization metrics, from shards balancing and allocation, we have new consumers, so this poll mechanism does not scale well. Thread pool needs to create a separate tracker for each caller.
This PR introduces total execution time measurements for thread-pool per fixed time frame. By default thread-pool will measure time in a 1 second frame and keep last 30 frames in memory. When we calculate utilization we look at past 30 frames. Utilization would be at most 1 second stale.
A new FramedTimeTracker incapsulates frame tracking logic, comes with own set of tests. Also added JMH benchmark.
Few options were considered for concurrent access to FramedTimeTracker - synchronized methods, read-write lock and non-locking.
Synchronized methods uses private long fields for tracking. Read-write lock uses write-lock when we update frame and read-lock for updating atomic fields during frame. Those are in commit history. Non-locking algorithm uses frame-windows, atomic flag for update and Thread.onSpinWait when update is happening.
Currently non-locking algorithm would be preferable, it takes a bit more memory than locking to keep track of extra frames, longAdders as counters, and few atomics, but it should be a good tradeoff for low performance overhead. Which is almost identical with baseline that just runs busy CPU cycles.
Baseline is 10000 busy CPU cycles, which is about 16 microseconds. Our write-thread-pool does have small, sub-millisecond, tasks.
Latest result from my machine