Fail request when all target shards fail in runtime #131177

dnhatn · 2025-07-14T05:36:32Z

If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting. This behavior does not fully align with the search API, where skip_unavailable ignores all failures from remote clusters and only fails the request when all shards in the local cluster fail. However, we believe the proposed behavior is more sensible than the existing behavior in the search API.

Closes #128994
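
To make the difference from the search API concrete, here is a minimal sketch under a simplified model: each cluster reports only its successful and failed shard counts, skipped shards are already excluded, and the condition on produced rows is left out. The `Cluster` record and both method names are hypothetical and do not exist in Elasticsearch; the search-style rule is only a paraphrase of the description above.

```java
import java.util.List;

// Hypothetical model for illustration only; none of these types exist in Elasticsearch.
final class FailurePolicySketch {

    /** Shard counters for one cluster, with skipped shards already excluded. */
    record Cluster(String alias, boolean remote, boolean skipUnavailable, int successfulShards, int failedShards) {}

    /** Search-style rule as paraphrased above: remote clusters with skip_unavailable=true are ignored. */
    static boolean searchStyleFails(List<Cluster> clusters) {
        return allFailed(clusters.stream().filter(c -> !(c.remote() && c.skipUnavailable())).toList());
    }

    /** Proposed rule: local and remote clusters are counted together. */
    static boolean proposedFails(List<Cluster> clusters) {
        return allFailed(clusters);
    }

    /** True when at least one shard failed and none succeeded. */
    private static boolean allFailed(List<Cluster> clusters) {
        int successful = clusters.stream().mapToInt(Cluster::successfulShards).sum();
        int failed = clusters.stream().mapToInt(Cluster::failedShards).sum();
        return successful == 0 && failed > 0;
    }

    public static void main(String[] args) {
        // A remote-only query whose two shards both fail, with skip_unavailable=true.
        List<Cluster> remoteOnly = List.of(new Cluster("remote1", true, true, 0, 2));
        System.out.println(searchStyleFails(remoteOnly)); // false
        System.out.println(proposedFails(remoteOnly));    // true
    }
}
```

In this model the two rules diverge only when every non-ignored shard belongs to a remote cluster marked skip_unavailable; when the local cluster has at least one successful shard, neither rule fails the request.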

elasticsearchmachine · 2025-07-15T19:43:31Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

dnhatn · 2025-07-15T22:31:03Z

@smalyshev Thanks for your quick review!

elasticsearchmachine · 2025-07-15T22:33:01Z

💔 Backport failed

| Status | Branch | Result |
|--------|--------|--------|
| ✅ | 9.1 | |
| ❌ | 8.19 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 131177`

dnhatn · 2025-07-15T23:17:33Z

💚 All backports created successfully

| Status | Branch | Result |
|--------|--------|--------|
| ✅ | 8.19 | |

Questions? Please refer to the Backport tool documentation.

(cherry picked from commit 8f6f763)

Conflicts: x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/EsqlNodeFailureIT.java

idegtiarenko · 2025-07-16T06:20:09Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

+        // do not fail if any final result has results
+        if (finalResults.stream().anyMatch(p -> p.getPositionCount() > 0)) {
+            return;
+        }


I am not sure if this will behave correctly.
I am imagining a case with a single index and no rows in it.

The request will not fail as long as the single target shard does not fail.

idegtiarenko · 2025-07-16T06:21:22Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

+     * regardless of the partial_results configuration or skip_unavailable setting. This behavior doesn't fully align with the search API,
+     * which doesn't consider the failures from the remote clusters when skip_unavailable is true.
+     */
+    static void failIfAllShardsFailed(EsqlExecutionInfo execInfo, List<Page> finalResults) {


Is this method concerning only remote shards/failures or local too?

This method treats both local and remote failures uniformly. This is where the behavior in ES|QL differs slightly from the search API. Here, we iterate over all execution info in each cluster (both local and remotes) to accumulate successful shards (excluding skipped shards) and failed shards. The request only fails if there are no successful shards, some failed shards, and no rows produced in the final results.
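
The accumulation described above can be sketched roughly as follows. This is a simplified stand-in, not the code from the PR: `ClusterShards` and `ResultPage` are hypothetical records that replace `EsqlExecutionInfo` and `Page`, and the method returns a boolean instead of throwing.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the real EsqlExecutionInfo or Page classes.
final class AllShardsFailedSketch {

    /** Per-cluster shard counters; skipped shards count as neither successful nor failed. */
    record ClusterShards(String alias, int successfulShards, int skippedShards, int failedShards) {}

    /** A result page reduced to its row count. */
    record ResultPage(int positionCount) {}

    /** Returns true when the request should fail because every non-skipped target shard failed. */
    static boolean allShardsFailed(List<ClusterShards> clusters, List<ResultPage> finalResults) {
        // Do not fail if any final result page contains rows.
        if (finalResults.stream().anyMatch(p -> p.positionCount() > 0)) {
            return false;
        }
        int successful = 0;
        int failed = 0;
        // Local and remote clusters are treated uniformly.
        for (ClusterShards c : clusters) {
            successful += c.successfulShards();
            failed += c.failedShards();
        }
        return successful == 0 && failed > 0;
    }

    public static void main(String[] args) {
        // Single index, single shard, and that shard fails: the request fails.
        System.out.println(allShardsFailed(List.of(new ClusterShards("local", 0, 0, 1)), List.of())); // true
        // Single empty index whose only shard succeeds: no rows, but the request does not fail.
        System.out.println(allShardsFailed(List.of(new ClusterShards("local", 1, 0, 0)), List.of())); // false
    }
}
```

Under these assumptions, an empty index whose single shard succeeds does not fail the request, while a single failed shard with no rows does.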

idegtiarenko · 2025-07-16T06:25:03Z

> If all target shards, excluding skipped shards, fail, we should fail the entire query regardless of the partial_results configuration or skip_unavailable setting.

I do not think I follow this. Does this mean that the entire query will fail when we query a single index with a single shard using allow_partial_results=true and that shard fails?

dnhatn · 2025-07-18T15:57:09Z

> I do not think I follow this. Does this mean that the entire query will fail when we query a single index with a single shard using allow_partial_results=true and that shard fails?

Yes, your understanding is correct. That is the proposal, and it matches the existing behavior of the search API when querying only the local cluster. I believe we should fail the request, rather than return partial results, when all target shards have failed and no results (rows) are produced.

elasticsearchmachine added the v9.2.0 label Jul 14, 2025

dnhatn force-pushed the all-shards-failed branch from 13e020e to 6c358a7 Compare July 15, 2025 04:12

Fail request when all target shards fail in runtime

6c34c05

dnhatn force-pushed the all-shards-failed branch from 6c358a7 to 6c34c05 Compare July 15, 2025 19:31

dnhatn changed the title ~~All shards failed~~ Fail request when all target shards fail in runtime Jul 15, 2025

dnhatn added the v9.1.1, v8.19.1, :Analytics/ES|QL, and >non-issue labels Jul 15, 2025

dnhatn requested review from idegtiarenko, nik9000, quux00 and smalyshev July 15, 2025 19:42

dnhatn marked this pull request as ready for review July 15, 2025 19:43

elasticsearchmachine added the Team:Analytics label Jul 15, 2025

dnhatn added the auto-backport label Jul 15, 2025

smalyshev reviewed Jul 15, 2025

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java (resolved)

smalyshev reviewed Jul 15, 2025

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java (outdated, resolved)

dnhatn added 2 commits July 15, 2025 13:36

reword

a1f002b

assertion

b961e9b

dnhatn requested a review from smalyshev July 15, 2025 20:39

smalyshev approved these changes Jul 15, 2025

dnhatn merged commit 8f6f763 into elastic:main Jul 15, 2025
33 checks passed

dnhatn deleted the all-shards-failed branch July 15, 2025 22:31

dnhatn mentioned this pull request Jul 15, 2025

[9.1] Fail request when all target shards fail in runtime (#131177) #131337

Merged

elasticsearchmachine added the backport pending label Jul 15, 2025

dnhatn mentioned this pull request Jul 15, 2025

[8.19] Fail request when all target shards fail in runtime (#131177) #131339

Merged

idegtiarenko reviewed Jul 16, 2025

dnhatn removed the backport pending label Jul 17, 2025

dnhatn mentioned this pull request Jul 25, 2025

ESQL: 200 even if all shards fail due to bug - partial failure behavior inconsistent with _search #128311

Closed
