Skip to content

Conversation

@Tim-Brooks
Copy link
Contributor

This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.

This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.
@Tim-Brooks Tim-Brooks requested a review from a team as a code owner June 14, 2025 21:16
@Tim-Brooks Tim-Brooks added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v9.1.0 labels Jun 14, 2025
@Tim-Brooks
Copy link
Contributor Author

Tim-Brooks commented Jun 14, 2025

WIP / opened for CI.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Jun 14, 2025
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this makes sense to me, left a comment, but otherwise think we can move forward with tests etc.

},
executor
// Use the appropriate write executor for actual ingest processing
isOnlySystem ? systemWriteExecutor : writeExecutor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this, but it seems like for any case where we have ingest processing, we would then still have the coordination happen behind any local write work. At least the PR then avoids some of the wait roundtrips.

We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.

Yes I would like to move ingest work to a non-WRITE thread pool in a follow-up as I think there might be a few things to discuss.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 17, 2025

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

This seems intuitive and simple enough that we can merge. But it seems worth keeping an eye on nightly benchmarks the following day or two, just to double check.

safeAwait(startBarrier);
}

private static void fillWriteQueue(ThreadPool threadPool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename to:

Suggested change
private static void fillWriteQueue(ThreadPool threadPool) {
private static void fillWriteCoordinationQueue(ThreadPool threadPool) {

}
}

private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to:

Suggested change
private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) {
private static void blockWriteCoordinationPool(ThreadPool threadPool, CountDownLatch finishLatch) {

@Tim-Brooks Tim-Brooks merged commit 9ac6576 into elastic:main Jun 18, 2025
28 checks passed
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants