-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Rework cancellation test for batched query execution #133579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework cancellation test for batched query execution #133579
Conversation
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
public void onNewReaderContext(ReaderContext c) { | ||
if (runOnNewReaderContext.get() != null) { | ||
runOnNewReaderContext.get().accept(c); | ||
public void onPreQueryPhase(SearchContext c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change because SearchContext gave me easier access to the node ID.
Both the onNewReaderContext and onPreQueryPhase hooks run before query execution begins, so either will do the job for this test (the only user of SearchShardBlockingPlugin).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I am only wondering whether we should keep the old test and still run it without batched execution, maybe that's overkill, up to you @benchaplin
Good shout. I suppose the batched setting will be available for users to disable, so I think we should keep the old test. I've randomized the setting. |
💚 Backport successful
|
Before this PR: this test makes a search request with allow_partial_search_results=false then manipulates the execution:
Batched query execution handles cancellations in a slightly worse manner. Instead of retrying (or failing if there are no replicas) immediately, batched queries must wait for all shards in the batch to complete. Then the data node responds to the coordinating node with the batch result, which handles shard failures. Therefore, this test doesn't work for batched, because if only one shard is allowed to proceed and it's part of a batched request, the batch will never complete.
After this PR: this test makes a search request with allow_partial_search_results=false then manipulates the execution: