- 
                Notifications
    
You must be signed in to change notification settings  - Fork 25.6k
 
Add thread pool for write coordination #129450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change adds a thread pool for write coordination to ensure that bulk coordination does not get stuck on an overloaded primary node.
| 
           
  | 
    
| 
           Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes sense to me, left a comment, but otherwise think we can move forward with tests etc.
| }, | ||
| executor | ||
| // Use the appropriate write executor for actual ingest processing | ||
| isOnlySystem ? systemWriteExecutor : writeExecutor | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with this, but it seems like for any case where we have ingest processing, we would then still have the coordination happen behind any local write work. At least the PR then avoids some of the wait roundtrips.
We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.
Yes I would like to move ingest work to a non-WRITE thread pool in a follow-up as I think there might be a few things to discuss.
| 
           🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This seems intuitive and simple enough that we can merge. But it seems worth keeping an eye on nightly benchmarks the following day or two, just to double check.
| safeAwait(startBarrier); | ||
| } | ||
| 
               | 
          ||
| private static void fillWriteQueue(ThreadPool threadPool) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename to:
| private static void fillWriteQueue(ThreadPool threadPool) { | |
| private static void fillWriteCoordinationQueue(ThreadPool threadPool) { | 
| } | ||
| } | ||
| 
               | 
          ||
| private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to:
| private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) { | |
| private static void blockWriteCoordinationPool(ThreadPool threadPool, CountDownLatch finishLatch) { | 
This change adds a thread pool for write coordination to ensure that bulk coordination does not get stuck on an overloaded primary node.
This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.