
Conversation

@nicktindall (Contributor):

Initial work after discussion around thread-pool utilization tracking.

@nicktindall nicktindall added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Jul 24, 2025
public final class TaskExecutionTimeTrackingEsThreadPoolExecutor extends EsThreadPoolExecutor {
public static final int QUEUE_LATENCY_HISTOGRAM_BUCKETS = 18;
private static final int[] LATENCY_PERCENTILES_TO_REPORT = { 50, 90, 99 };
private static final long UTILISATION_REFRESH_INTERVAL_NANOS = TimeValue.timeValueSeconds(45).nanos();
@nicktindall (Contributor, Author):
Perhaps this should be a setting, so we can experiment?

* Get the most recent utilization value calculated
*/
public double getUtilization() {
return lastUtilization;
@nicktindall (Contributor, Author):
Perhaps we should also call recalculateUtilizationIfDue in here, in case there is zero activity in the pool? Probably not an issue for the write pool.

A reviewer (Contributor):
I think we want that to avoid it being infinitely stale, but then you get into a task tracking issue?

Also, it seems like the value can now be 30s old instead of current?

@nicktindall (Contributor, Author):
I added a recalculate here, in case we end up using this. You're right: because we are recalculating independently of the polling, it's possible the utilisation is up to TaskTrackingConfig#utilizationRefreshInterval old. Given that it's an average over the same interval, and given our current goal of acting only on persistent hot-spots, I think that's probably OK, but hopefully we can get a better utilisation measure.
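The time-gated refresh pattern being discussed could look roughly like the sketch below. This is illustrative only, not the actual Elasticsearch implementation: the class name, field names, and the injected clock and recalculation suppliers are all hypothetical, chosen so the idea (the getter refreshes the cached value at most once per interval, even if the pool is otherwise idle) is testable in isolation.

```java
import java.util.function.DoubleSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of a time-gated utilization refresh. The getter
// recalculates at most once per refresh interval, so the cached value
// cannot become infinitely stale even when the pool sees no activity.
class UtilizationHolder {
    private final long refreshIntervalNanos;
    private final LongSupplier nanoClock;      // injected so tests can control time
    private final DoubleSupplier recalculate;  // the (potentially expensive) utilization computation
    private long lastRecalculatedAtNanos;
    private double lastUtilization;

    UtilizationHolder(long refreshIntervalNanos, LongSupplier nanoClock, DoubleSupplier recalculate) {
        this.refreshIntervalNanos = refreshIntervalNanos;
        this.nanoClock = nanoClock;
        this.recalculate = recalculate;
        // Backdate the timestamp so the very first read triggers a recalculation
        this.lastRecalculatedAtNanos = nanoClock.getAsLong() - refreshIntervalNanos;
    }

    /** Get the most recent utilization value, refreshing it first if a refresh is due. */
    synchronized double getUtilization() {
        long now = nanoClock.getAsLong();
        if (now - lastRecalculatedAtNanos >= refreshIntervalNanos) {
            lastUtilization = recalculate.getAsDouble();
            lastRecalculatedAtNanos = now;
        }
        return lastUtilization;
    }
}
```

Routing every reader through the gated getter means the value is never more than one refresh interval old, at the cost of the recalculation occasionally running on a caller's thread rather than on the polling thread.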

private boolean trackOngoingTasks = false;
private boolean trackMaxQueueLatency = false;
private double ewmaAlpha = DEFAULT_EXECUTION_TIME_EWMA_ALPHA_FOR_TEST;
private TimeValue utilizationRefreshInterval = TimeValue.timeValueSeconds(30);
@nicktindall (Contributor, Author):
I suspect that 30s might be too short given we saw > 1.0 utilization in the APM metrics which were calculated every 60s. Perhaps we should use 60 or split the difference and do 45?

@nicktindall nicktindall marked this pull request as ready for review July 24, 2025 07:29
@nicktindall nicktindall requested a review from a team as a code owner July 24, 2025 07:29
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jul 24, 2025
@mhl-b (Contributor) commented Jul 24, 2025:

I thought of a different approach when we talked outside. I'll try to summarize; happy to send a PR if you like it.

We can trade liveness for accuracy when measuring thread-pool utilization. That means reporting
utilization from the past interval (30 sec / 1 min) is enough, since there are no "real-time" actions based on this
metric.

A couple of terms I use below:

  • Interval - the time duration over which utilization is measured
  • Frame - the sequence number of an interval, so Frame * Interval is the frame's start time

The approach is simple: pollUtilization returns the execution time of the previous frame, since we know exactly
which tasks finished in, or were running during, the previous frame. When a task finishes, the current and
previous frame stats are updated.

We need to consider the following cases:

  • there are no tasks in frame
  • task started and finished in same frame
  • task started and still running
  • task started before current frame and finished in current
  • task started before current and still running

pseudocode:

Class variables:
currentFrame
currentFrameExecutionTime // for finished tasks
previousFrame
previousFrameExecutionTime // for finished tasks
currentTasks

afterExecute(task):
  endFrame = task.endTime / interval
  maybeResetFrame(endFrame)
  startFrame = task.startTime / interval
  if (startFrame == currentFrame):
    currentFrameExecutionTime += task.endTime - task.startTime # task started and finished in same frame
  else:
    currentFrameExecutionTime += task.endTime - currentFrame * interval
    if(startFrame == previousFrame):
      previousFrameExecutionTime += currentFrame * interval - task.startTime # task started in previous frame
    else:
      previousFrameExecutionTime += interval # task started before previous frame


# first time seeing this frame, or the frame hasn't been updated for a long time
maybeResetFrame(nowFrame):
  if (nowFrame == currentFrame):
    pass
  else if (nowFrame - currentFrame == 1):
    previousFrameExecutionTime = currentFrameExecutionTime
    currentFrameExecutionTime = 0
    previousFrame = currentFrame
    currentFrame = nowFrame
  else:
    previousFrameExecutionTime = 0
    currentFrameExecutionTime = 0
    currentFrame = nowFrame
    previousFrame = nowFrame -1

# returns utilization for the previous frame, covering completed and still-running tasks
pollUtilization():
  nowFrame = timeNow / interval
  maybeResetFrame(nowFrame)
  totalTime = previousFrameExecutionTime
  for (task in currentTasks):
      startFrame = task.startTime / interval
      if (startFrame == previousFrame):
        totalTime += currentFrame * interval - task.startTime # task started in previous frame, still running
      else if (startFrame < previousFrame):
        totalTime += interval # task started before previous frame, still running
  return totalTime / (threadPoolSize * interval) # can be cached by frame key

afterExecute is lightweight, just updating a few numbers.
pollUtilization is a bit heavier because currentTasks needs concurrent access, probably a copy. There is also still a race condition when a task finishes while we are looping through currentTasks; I need to work on that.
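For concreteness, the pseudocode above can be translated into a self-contained Java sketch like the one below. This is an illustration, not the eventual implementation: the class and field names are hypothetical, the clock is injected for testability, and whole-method synchronization sidesteps the race mentioned above at the cost of contention on the tracker.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical frame-based utilization tracker, following the pseudocode above.
// Utilization is reported for the *previous* frame, trading liveness for accuracy.
class FrameUtilizationTracker {
    private final long intervalNanos;
    private final int threadPoolSize;
    private final LongSupplier nanoClock; // injected so tests can control time

    private long currentFrame;
    private long previousFrame;
    private long currentFrameExecutionTime;  // finished tasks only
    private long previousFrameExecutionTime; // finished tasks only
    private final Map<Object, Long> currentTasks = new ConcurrentHashMap<>(); // task -> start time

    FrameUtilizationTracker(long intervalNanos, int threadPoolSize, LongSupplier nanoClock) {
        this.intervalNanos = intervalNanos;
        this.threadPoolSize = threadPoolSize;
        this.nanoClock = nanoClock;
        this.currentFrame = nanoClock.getAsLong() / intervalNanos;
        this.previousFrame = currentFrame - 1;
    }

    synchronized void beforeExecute(Object task) {
        currentTasks.put(task, nanoClock.getAsLong());
    }

    synchronized void afterExecute(Object task) {
        long startTime = currentTasks.remove(task);
        long endTime = nanoClock.getAsLong();
        maybeResetFrame(endTime / intervalNanos);
        long startFrame = startTime / intervalNanos;
        if (startFrame == currentFrame) {
            currentFrameExecutionTime += endTime - startTime; // started and finished in the same frame
        } else {
            currentFrameExecutionTime += endTime - currentFrame * intervalNanos;
            if (startFrame == previousFrame) {
                previousFrameExecutionTime += currentFrame * intervalNanos - startTime; // started in previous frame
            } else {
                previousFrameExecutionTime += intervalNanos; // ran through the whole previous frame
            }
        }
    }

    // First time seeing this frame, or the frame hasn't been updated for a long time
    private void maybeResetFrame(long nowFrame) {
        if (nowFrame == currentFrame) {
            return;
        } else if (nowFrame - currentFrame == 1) {
            previousFrameExecutionTime = currentFrameExecutionTime;
            currentFrameExecutionTime = 0;
            previousFrame = currentFrame;
            currentFrame = nowFrame;
        } else {
            previousFrameExecutionTime = 0;
            currentFrameExecutionTime = 0;
            currentFrame = nowFrame;
            previousFrame = nowFrame - 1;
        }
    }

    // Returns utilization for the previous frame, covering completed and still-running tasks
    synchronized double pollUtilization() {
        maybeResetFrame(nanoClock.getAsLong() / intervalNanos);
        long totalTime = previousFrameExecutionTime;
        for (long startTime : currentTasks.values()) {
            long startFrame = startTime / intervalNanos;
            if (startFrame == previousFrame) {
                totalTime += currentFrame * intervalNanos - startTime; // still running, started in previous frame
            } else if (startFrame < previousFrame) {
                totalTime += intervalNanos; // still running, spanned the whole previous frame
            }
        }
        return (double) totalTime / (threadPoolSize * intervalNanos);
    }
}
```

With a 100ns interval and a single thread, a task running from t=10 to t=60 yields a utilization of 0.5 once t advances into the next frame, since 50ns of work landed in a 100ns frame.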

@nicktindall (Contributor, Author):

> afterExecute is lightweight, just updating a few numbers. pollUtilization is a bit heavier because currentTasks needs concurrent access, probably a copy. There is also still a race condition when a task finishes while we are looping through currentTasks; I need to work on that.

I like it, but I think it's better to avoid the dependency on current tasks. Not all thread pools track running tasks, and to me it seems quite expensive to do (i.e. you wouldn't want to turn it on across the board). I'd be inclined to keep it simple and revisit if the accuracy is a problem?

@mhl-b (Contributor) commented Jul 25, 2025:

As discussed in Slack, I will work on my proposal, addressing the drawbacks of tracking ongoing tasks:
#131898


Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0


4 participants