Fix test fail on unavailable shards resolver #125096

idegtiarenko · 2025-03-18T13:36:40Z

This enables testFailOnUnavailableShards.

The test was originally disabled with a pointer to an issue that is now closed.
After enabling it, I noticed that we were silently searching only the available shards.

This change fixes the test failure by surfacing the failures encountered during index resolution.

idegtiarenko · 2025-03-18T15:14:45Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+                        } else {
+                            l.onResponse(result.withIndexResolution(indexResolution));
+                        }
+                    })


This change completes the listener early: if index resolution reported any failures, the listener is completed with a failure immediately after resolution.
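
For readers outside the codebase, the control flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Listener` and `Resolution` are hypothetical stand-ins written for this example, not Elasticsearch's `ActionListener` or `IndexResolution`, and reporting the first failure with the rest attached as suppressed exceptions is a choice made here, not something the PR confirms.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyFailureSketch {

    /** Hypothetical stand-in for an async listener; not Elasticsearch's ActionListener. */
    interface Listener<T> {
        void onResponse(T value);

        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for the outcome of index resolution. */
    static final class Resolution {
        final List<String> resolvedIndices;
        final List<Exception> failures;

        Resolution(List<String> resolvedIndices, List<Exception> failures) {
            this.resolvedIndices = resolvedIndices;
            this.failures = failures;
        }
    }

    /**
     * Completes the listener right after resolution: with a failure when the
     * resolution reported any, otherwise with the resolution itself.
     */
    static void completeAfterResolution(Resolution resolution, Listener<Resolution> listener) {
        if (resolution.failures.isEmpty()) {
            listener.onResponse(resolution);
            return;
        }
        // Assumption of this sketch: report the first failure, keep the rest as suppressed.
        Exception first = resolution.failures.get(0);
        for (Exception other : resolution.failures.subList(1, resolution.failures.size())) {
            if (other != first) {
                first.addSuppressed(other);
            }
        }
        listener.onFailure(first);
    }

    public static void main(String[] args) {
        Listener<Resolution> printing = new Listener<Resolution>() {
            @Override
            public void onResponse(Resolution r) {
                System.out.println("resolved " + r.resolvedIndices);
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failed: " + e.getMessage());
            }
        };
        List<String> indices = new ArrayList<>();
        indices.add("logs");
        List<Exception> failures = new ArrayList<>();
        failures.add(new IllegalStateException("no shard available"));
        completeAfterResolution(new Resolution(indices, new ArrayList<>()), printing);
        completeAfterResolution(new Resolution(indices, failures), printing);
    }
}
```

The point of the early exit is that no later stage runs against a partial set of shards without the caller being told.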

dnhatn

The approach looks right to me. I think we are missing tracking and returning this failure to users when allow_partial_results=true, but we can address it in a follow-up. Can you minimize the format changes, please? Thanks @idegtiarenko.

dnhatn · 2025-03-19T16:46:09Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/IndexResolver.java

        );

+        FieldCapabilitiesFailure localResolutionFailure = null;
+        for (FieldCapabilitiesFailure failure : fieldCapsResponse.getFailures()) {


I think this can mistakenly use failures from the remote clusters. Can we extend determineUnavailableRemoteClusters to include local failures and extract from there instead?

I think you are right. I have replaced this with filtering for NoShardAvailableActionException.
I would like to merge this fix as it fixes the handling of local unavailable shards, but we will likely need one more follow-up iteration focused on CCS (does an unavailable remote cluster mean the result is partial? can we detect unavailable shards on remote clusters?)
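
As a rough illustration of what "filtering for NoShardAvailableActionException" means, the sketch below keeps only the failures whose exception is of one specific type. The `Failure` and `NoShardAvailable` classes are hypothetical stand-ins defined for this example so that it compiles on its own; they are not the Elasticsearch `FieldCapabilitiesFailure` or `NoShardAvailableActionException` classes, and the result is collected into a list here purely for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

class UnavailableShardFilterSketch {

    /** Hypothetical stand-in for NoShardAvailableActionException. */
    static final class NoShardAvailable extends RuntimeException {
        NoShardAvailable(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for one failure entry of a field-caps response. */
    static final class Failure {
        final String index;
        final Exception exception;

        Failure(String index, Exception exception) {
            this.index = index;
            this.exception = exception;
        }
    }

    /** Keeps only the failures caused by unavailable shards; every other failure is ignored. */
    static List<NoShardAvailable> unavailableShards(List<Failure> failures) {
        List<NoShardAvailable> result = new ArrayList<>();
        for (Failure failure : failures) {
            if (failure.exception instanceof NoShardAvailable) {
                result.add((NoShardAvailable) failure.exception);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Failure> failures = new ArrayList<>();
        failures.add(new Failure("logs", new NoShardAvailable("no shard available for [logs]")));
        failures.add(new Failure("metrics", new IllegalArgumentException("unrelated failure")));
        System.out.println(unavailableShards(failures).size() + " unavailable-shard failure(s)");
    }
}
```

A type-based filter like this is narrow by design: it catches the case the test exercises and lets any other failure type pass unnoticed.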

dnhatn · 2025-03-19T16:47:43Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java

    // all indices found by field-caps
    private final Set<String> resolvedIndices;
+    @Nullable
+    private final FieldCapabilitiesFailure localResolutionFailure;


Can we merge this with unavailableClusters?

I do not think so. I just updated the implementation to look for unavailable shards rather than generic failures and I would like to keep unavailableClusters remote only as documented:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java

Line 63 in d185356

// remote clusters included in the user's index expression that could not be connected to

elasticsearchmachine · 2025-03-21T08:39:28Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

dnhatn

I've left a comment, but LGTM as we can address it in a follow-up.

dnhatn · 2025-03-24T15:25:14Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/IndexResolver.java


+        Set<NoShardAvailableActionException> unavailableShards = new HashSet<>();
+        for (FieldCapabilitiesFailure failure : fieldCapsResponse.getFailures()) {
+            if (failure.getException() instanceof NoShardAvailableActionException e) {


I think this is restrictive and could be fragile. The field-caps API can fail for reasons other than a NoShardAvailableActionException. Should we handle all exceptions? I'm fine if you plan to address this in a follow-up.

This case is happening when running testFailOnUnavailableShards,
but I agree, we should expand this list as we find more cases.
For now, do you believe there are other cases I could add?

I think we should detect the index pattern for non-remote scenarios and handle all exceptions.
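
One way to read this suggestion is sketched below: decide from the index pattern whether the request is local-only, and if so surface every failure rather than one exception type. This is a hypothetical illustration, not the follow-up that was implemented. The class and method names are invented for this example, and the check for the `cluster:index` form is deliberately simplified: it does not account for date math or any other syntax that may contain a colon.

```java
import java.util.ArrayList;
import java.util.List;

class LocalPatternSketch {

    /** Separator of the cluster:index form used for remote index expressions. */
    static final char REMOTE_SEPARATOR = ':';

    /** Simplified check: the pattern is local-only when no comma-separated part names a cluster. */
    static boolean isLocalOnly(String indexPattern) {
        for (String part : indexPattern.split(",")) {
            if (part.indexOf(REMOTE_SEPARATOR) >= 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * For a local-only pattern, every failure is surfaced regardless of its type.
     * Patterns that involve remote clusters are out of scope for this sketch,
     * so nothing is returned for them here.
     */
    static List<Exception> localFailuresToSurface(String indexPattern, List<Exception> failures) {
        if (isLocalOnly(indexPattern)) {
            return new ArrayList<>(failures);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Exception> failures = new ArrayList<>();
        failures.add(new IllegalStateException("shard failure"));
        System.out.println(localFailuresToSurface("logs-*,metrics", failures).size());
        System.out.println(localFailuresToSurface("logs-*,remote1:logs-*", failures).size());
    }
}
```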

elasticsearchmachine · 2025-03-25T09:00:27Z

💔 Backport failed

Status	Branch	Result
❌	9.0	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125096

(cherry picked from commit 3b685cb)

idegtiarenko added 2 commits March 18, 2025 13:38

cleanup test

07be3f2

Keep failures to resolve indices

ce2b86a

idegtiarenko requested review from dnhatn and nik9000 March 18, 2025 13:36

elasticsearchmachine added the v9.1.0 label Mar 18, 2025

idegtiarenko commented Mar 18, 2025

dnhatn reviewed Mar 19, 2025

idegtiarenko mentioned this pull request Mar 20, 2025

Enable and fix testFailOnUnavailableShards #125020

Closed

idegtiarenko added 3 commits March 20, 2025 10:29

Merge branch 'main' into fix_testFailOnUnavailableShards_resolver

56fb6ce

look for unavailable shards exceptions

d185356

Merge branch 'main' into fix_testFailOnUnavailableShards_resolver

eec7176

idegtiarenko marked this pull request as ready for review March 21, 2025 08:29

elasticsearchmachine added the needs:triage label Mar 21, 2025

idegtiarenko added the >non-issue, Team:Analytics, :Analytics/ES|QL, auto-backport, v9.0.0, and v8.19.0 labels and removed the needs:triage label Mar 21, 2025

idegtiarenko requested a review from dnhatn March 21, 2025 15:28

nik9000 approved these changes Mar 24, 2025

dnhatn approved these changes Mar 24, 2025

Merge branch 'main' into fix_testFailOnUnavailableShards_resolver

aea8859

idegtiarenko merged commit 3b685cb into elastic:main Mar 25, 2025
17 checks passed

elasticsearchmachine added the backport pending label Mar 25, 2025

idegtiarenko mentioned this pull request Mar 25, 2025

[8.x] Fix test fail on unavailable shards resolver (#125096) #125569

Merged

elasticsearchmachine pushed a commit that referenced this pull request Mar 25, 2025

Fix test fail on unavailable shards resolver (#125096) (#125569)

2b268c3

(cherry picked from commit 3b685cb)

idegtiarenko removed the backport pending and v9.0.0 labels Mar 25, 2025

idegtiarenko deleted the fix_testFailOnUnavailableShards_resolver branch March 25, 2025 09:48

omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Fix test fail on unavailable shards resolver (elastic#125096)

8cba429

4 participants