-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Fail request when all target shards fail in runtime #131177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
13e020e to
6c358a7
Compare
6c358a7 to
6c34c05
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java
Outdated
Show resolved
Hide resolved
|
@smalyshev Thanks for your quick review! |
If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API. Closes elastic#128994
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API. Closes elastic#128994 (cherry picked from commit 8f6f763) # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/EsqlNodeFailureIT.java
If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API. Closes #128994
If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API. Closes #128994 (cherry picked from commit 8f6f763) # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/EsqlNodeFailureIT.java
| // do not fail if any final result has results | ||
| if (finalResults.stream().anyMatch(p -> p.getPositionCount() > 0)) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this will behave correctly.
I am imagining a case with a single index and no rows in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The request will not fail as long as the single target shard does not fail.
| * regardless of the partial_results configuration or skip_unavailable setting. This behavior doesn't fully align with the search API, | ||
| * which doesn't consider the failures from the remote clusters when skip_unavailable is true. | ||
| */ | ||
| static void failIfAllShardsFailed(EsqlExecutionInfo execInfo, List<Page> finalResults) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method concerning only remote shards/failures or local too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method treats both local and remote failures uniformly. This is where the behavior in ES|QL differs slightly from the search API. Here, we iterate over all execution info in each cluster (both local and remotes) to accumulate successful shards (excluding skipped shards) and failed shards. The request only fails if there are no successful shards, some failed shards, and no rows produced in the final results.
I do not think I follow this. Does this mean that the entire query will fail when we query a single index with a single shard using allow_partial_results=true and that shard fails? |
Yes, your understanding is correct. That is the proposal, and it matches the existing behavior of the search API when querying only the local cluster. I believe we should fail the request, rather than return partial results, when all target shards have failed and no results (rows) are produced. |
If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API.
Closes #128994