Conversation

@idegtiarenko
Contributor

This enables testFailOnUnavailableShards.

The test originally pointed to an issue that has since been closed.
After enabling it, I noticed that we were silently searching only the available shards.

This change fixes the failure by surfacing the failures found during index resolution.

    } else {
        l.onResponse(result.withIndexResolution(indexResolution));
    }
})
Contributor Author

This change completes the listener early with a failure, immediately after index resolution, if the resolution reported any failures.
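
To illustrate the control flow described in this comment, here is a minimal sketch. It is not the Elasticsearch code: `Listener`, `Resolution`, and `afterResolution` are hypothetical stand-ins for the real listener and index resolution types, and passing only the first failure to the listener is an assumption made to keep the example short.

```java
import java.util.List;

// Hypothetical stand-ins; the real Elasticsearch types are richer than these.
final class EarlyFailureSketch {

    /** Minimal response/failure callback pair (assumed shape, not the real API). */
    interface Listener<T> {
        void onResponse(T value);

        void onFailure(Exception e);
    }

    /** Hypothetical resolution result: resolved index names plus any reported failures. */
    record Resolution(List<String> indices, List<Exception> failures) {}

    /** Completes the listener with a failure as soon as resolution reports one. */
    static void afterResolution(Resolution resolution, Listener<Resolution> listener) {
        if (resolution.failures().isEmpty() == false) {
            // Fail fast instead of continuing with a partial view of the data.
            listener.onFailure(resolution.failures().get(0));
        } else {
            listener.onResponse(resolution);
        }
    }

    public static void main(String[] args) {
        Listener<Resolution> printing = new Listener<>() {
            @Override
            public void onResponse(Resolution r) {
                System.out.println("resolved " + r.indices());
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failed: " + e.getMessage());
            }
        };
        afterResolution(new Resolution(List.of("logs"), List.of()), printing);
        afterResolution(
            new Resolution(List.of("logs"), List.of(new IllegalStateException("no shard available"))),
            printing
        );
    }
}
```

The point of the sketch is only that the failure branch runs before any later planning or execution step gets a chance to see the incomplete resolution.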

Member

@dnhatn left a comment

The approach looks right to me. I think we are missing tracking and returning this failure to users when allow_partial_results=true, but we can address it in a follow-up. Can you minimize the format changes, please? Thanks @idegtiarenko.
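
As a rough illustration of the follow-up mentioned here, the sketch below shows one way failures could be tracked and returned when partial results are allowed. Everything in it is an assumption: `Outcome`, `finish`, and the `allowPartialResults` flag are hypothetical names, and the actual follow-up may look quite different.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class PartialResultsSketch {

    /** Rows that were produced, plus the failures reported back to the user. */
    record Outcome(List<String> rows, List<Exception> reportedFailures) {}

    static Outcome finish(List<String> rows, List<Exception> failures, boolean allowPartialResults) {
        if (failures.isEmpty()) {
            return new Outcome(rows, List.of());
        }
        if (allowPartialResults == false) {
            // Strict mode: any failure aborts the whole request.
            throw new IllegalStateException("request failed", failures.get(0));
        }
        // Lenient mode: keep the rows, but do not drop the failures silently.
        return new Outcome(rows, List.copyOf(failures));
    }

    public static void main(String[] args) {
        List<Exception> failures = List.of(new RuntimeException("shard unavailable"));
        Outcome partial = finish(List.of("row-1"), failures, true);
        System.out.println(partial.rows() + " with " + partial.reportedFailures().size() + " reported failure(s)");
    }
}
```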

);

FieldCapabilitiesFailure localResolutionFailure = null;
for (FieldCapabilitiesFailure failure : fieldCapsResponse.getFailures()) {
Member

I think this can mistakenly use failures from the remote clusters. Can we extend determineUnavailableRemoteClusters to include local failures and extract from there instead?

Contributor Author

I think you are right. I have replaced this with filtering for NoShardAvailableActionException.
I would like to merge this fix because it corrects the handling of unavailable local shards, but we will likely need one more follow-up iteration focused on CCS (does an unavailable remote cluster mean the result is partial? can we detect unavailable shards on a remote cluster?)
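
The filtering described in this reply can be sketched in isolation as follows. The types are stand-ins so the example compiles on its own: `NoShardAvailableException` and `Failure` are hypothetical simplifications of `NoShardAvailableActionException` and `FieldCapabilitiesFailure`, and a `List` is used here purely for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types so the sketch is self-contained; they only mimic the shape of the real classes.
final class UnavailableShardFilterSketch {

    /** Hypothetical stand-in for NoShardAvailableActionException. */
    static final class NoShardAvailableException extends RuntimeException {
        NoShardAvailableException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for FieldCapabilitiesFailure: affected indices and the cause. */
    record Failure(List<String> indices, Exception exception) {}

    /** Keeps only the failures caused by unavailable shards and ignores all others. */
    static List<NoShardAvailableException> unavailableShards(List<Failure> failures) {
        List<NoShardAvailableException> result = new ArrayList<>();
        for (Failure failure : failures) {
            if (failure.exception() instanceof NoShardAvailableException e) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Failure> failures = List.of(
            new Failure(List.of("logs-1"), new NoShardAvailableException("no shard available")),
            new Failure(List.of("logs-2"), new IllegalArgumentException("some other failure"))
        );
        System.out.println(unavailableShards(failures).size());
    }
}
```

Note that any failure with a different cause is dropped by this filter, which is the limitation raised later in the review.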

// all indices found by field-caps
private final Set<String> resolvedIndices;
@Nullable
private final FieldCapabilitiesFailure localResolutionFailure;
Member

Can we merge this with unavailableClusters?

Contributor Author

I do not think so. I just updated the implementation to look for unavailable shards rather than generic failures and I would like to keep unavailableClusters remote only as documented:

// remote clusters included in the user's index expression that could not be connected to

@idegtiarenko marked this pull request as ready for review March 21, 2025 08:29
@elasticsearchmachine added the needs:triage label Mar 21, 2025
@idegtiarenko added the >non-issue, Team:Analytics, :Analytics/ES|QL, auto-backport, v9.0.0, and v8.19.0 labels and removed the needs:triage label Mar 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@idegtiarenko requested a review from dnhatn March 21, 2025 15:28
Member

@dnhatn left a comment

I've left a comment, but LGTM as we can address it in a follow-up.


Set<NoShardAvailableActionException> unavailableShards = new HashSet<>();
for (FieldCapabilitiesFailure failure : fieldCapsResponse.getFailures()) {
    if (failure.getException() instanceof NoShardAvailableActionException e) {
Member

I think this is restrictive and could be fragile. The field-caps API can fail for reasons other than a NoShardAvailableActionException. Should we handle all exceptions? I'm fine if you plan to address this in a follow-up.

Contributor Author

This case occurs when running testFailOnUnavailableShards,
but I agree, we should expand this list as we find more cases.
For now, do you believe there are other cases I could add?

Member

I think we should detect the index pattern for non-remote scenarios and handle all exceptions.
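
One possible reading of this suggestion is sketched below; it is only an assumption about what the follow-up could look like. The helper names `isLocalOnly` and `fatalFailure` are hypothetical, and the check relies on the `cluster:index` convention for remote index expressions while ignoring any other use of `:` inside an index expression.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "local-only pattern: treat every failure as fatal".
final class LocalPatternSketch {

    /** True when no part of the comma-separated pattern names a remote cluster (assumed "cluster:index" form). */
    static boolean isLocalOnly(String indexPattern) {
        for (String part : indexPattern.split(",")) {
            if (part.indexOf(':') >= 0) {
                return false;
            }
        }
        return true;
    }

    /** For local-only patterns, any failure is surfaced regardless of its exception type. */
    static Optional<Exception> fatalFailure(String indexPattern, List<Exception> failures) {
        if (isLocalOnly(indexPattern) && failures.isEmpty() == false) {
            return Optional.of(failures.get(0));
        }
        // Patterns that include remote clusters are left to separate handling in this sketch.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Exception> failures = List.of(new RuntimeException("field-caps failed"));
        System.out.println(fatalFailure("logs-*,metrics", failures).isPresent());
        System.out.println(fatalFailure("remote1:logs-*", failures).isPresent());
    }
}
```

Compared with filtering on a single exception type, this moves the decision from "which exception was thrown" to "which clusters the pattern targets".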

@idegtiarenko merged commit 3b685cb into elastic:main Mar 25, 2025
17 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|---|---|---|
| | 9.0 | Commit could not be cherrypicked due to conflicts |
| | 8.x | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125096

@idegtiarenko deleted the fix_testFailOnUnavailableShards_resolver branch March 25, 2025 09:48
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Labels

:Analytics/ES|QL, auto-backport, >non-issue, Team:Analytics, v8.19.0, v9.1.0

4 participants