dnhatn
Member

@dnhatn dnhatn commented Jul 8, 2025

Currently, errors from the field-caps phase are not always handled properly, leading to cases where the final response is not correctly marked as partial. For example, FROM ok*,unavailable_index* should return a partial result, because unavailable_index* is skipped after the resolution phase. This change tracks failures that occur during field-caps and reports them in the final response. Since this only affects cases with allow_partial_results=true, I am labeling this as a non-issue and will backport the change to 9.1 and 8.19.
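
To illustrate the intended behavior, here is a minimal, self-contained sketch of carrying resolution-phase failures through to the final response. It is not the code in this PR: `PartialResultSketch`, `ResolutionFailure`, `QueryResponse`, and `finish` are hypothetical names, and failing the query when partial results are disallowed is an assumption made for the example.

```java
import java.util.List;

/** Hypothetical sketch; none of these types are Elasticsearch classes. */
final class PartialResultSketch {

    /** A failure recorded while resolving one index pattern (hypothetical type). */
    record ResolutionFailure(String indexPattern, String reason) {}

    /** Simplified response: the rows plus any failures collected during resolution. */
    record QueryResponse(List<String> rows, List<ResolutionFailure> failures) {
        /** The response is partial whenever at least one failure was tracked. */
        boolean isPartial() {
            return failures.isEmpty() == false;
        }
    }

    /**
     * Builds the final response. Assumption: with allowPartialResults=false a
     * resolution failure fails the whole query; with true it is reported instead.
     */
    static QueryResponse finish(List<String> rows, List<ResolutionFailure> resolutionFailures, boolean allowPartialResults) {
        if (resolutionFailures.isEmpty() == false && allowPartialResults == false) {
            ResolutionFailure first = resolutionFailures.get(0);
            throw new IllegalStateException("failed to resolve [" + first.indexPattern() + "]: " + first.reason());
        }
        return new QueryResponse(List.copyOf(rows), List.copyOf(resolutionFailures));
    }

    public static void main(String[] args) {
        // FROM ok*,unavailable_index*: one pattern resolves, the other fails during resolution.
        List<ResolutionFailure> failures = List.of(new ResolutionFailure("unavailable_index*", "no shard available"));
        QueryResponse response = finish(List.of("row-from-ok-1"), failures, true);
        System.out.println("partial=" + response.isPartial() + ", failures=" + response.failures());
    }
}
```

The point of the sketch is only that failures seen before execution are kept alongside the result rather than dropped, so the partial flag can be derived from them.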

@dnhatn dnhatn force-pushed the resolution-errors branch 3 times, most recently from a74566a to 8fede66 Compare July 9, 2025 05:39
@dnhatn dnhatn force-pushed the resolution-errors branch from 8fede66 to d7f257d Compare July 9, 2025 16:14
@dnhatn dnhatn added the auto-backport Automatically create backport pull requests when merged label Jul 9, 2025
@dnhatn dnhatn marked this pull request as ready for review July 9, 2025 17:47
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 9, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

}
}

public void testResolutionFailures() throws Exception {
Contributor

Also check out #129957: I've added some tests for things that weren't handled properly.

@dnhatn dnhatn requested a review from smalyshev July 10, 2025 17:42
@dnhatn dnhatn requested a review from idegtiarenko July 10, 2025 17:42
@dnhatn
Member Author

dnhatn commented Jul 10, 2025

@smalyshev @idegtiarenko Thanks for the review. I think it's ready again.

) {
Set<String> clustersWithResolvedIndices = new HashSet<>();
// determine missing clusters
final Set<String> clustersWithNoMatchingIndices = new HashSet<>(executionInfo.clusterAliases());
Contributor

I still think we should use something like executionInfo.getClusterStates(RUNNING) here. That should automatically exclude unavailable clusters if we call updateExecutionInfoWithUnavailableClusters before this. I think this would also let us eliminate the unavailableClusters parameter.

Member Author

Updated in 77aacf2

Member Author

I had to revert this refactoring because the tests in EsqlCCSUtilsTests assume these conditions together. Unfortunately, I don't have the bandwidth to work on this now. Could you work on this after this PR? Thanks!
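
For whoever picks up the follow-up, a rough sketch of the status-based approach suggested in this thread. It assumes unavailable clusters are marked as skipped first, so that only clusters still running are considered when looking for clusters with no matching indices. `ClusterStatusSketch`, `Status`, and both method names are hypothetical stand-ins, not the `EsqlCCSUtils` or `executionInfo` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; the real cluster state lives in the execution info. */
final class ClusterStatusSketch {

    enum Status { RUNNING, SKIPPED }

    /** Step 1: mark every known unavailable cluster as SKIPPED. */
    static void markUnavailable(Map<String, Status> clusters, Set<String> unavailable) {
        for (String alias : unavailable) {
            clusters.computeIfPresent(alias, (key, old) -> Status.SKIPPED);
        }
    }

    /**
     * Step 2: among clusters that are still RUNNING, find those that contributed
     * no resolved index. Unavailable clusters are excluded automatically because
     * step 1 already changed their status, so no separate parameter is needed.
     */
    static Set<String> runningWithNoMatchingIndices(Map<String, Status> clusters, Set<String> clustersWithResolvedIndices) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Status> entry : clusters.entrySet()) {
            if (entry.getValue() == Status.RUNNING && clustersWithResolvedIndices.contains(entry.getKey()) == false) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Status> clusters = new LinkedHashMap<>();
        clusters.put("local", Status.RUNNING);
        clusters.put("remote1", Status.RUNNING);
        clusters.put("remote2", Status.RUNNING);

        markUnavailable(clusters, Set.of("remote2"));
        System.out.println(runningWithNoMatchingIndices(clusters, Set.of("local")));
    }
}
```

The ordering matters in this sketch: calling the second step before the first would report the unavailable cluster as having no matching indices.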

// when queries use a remote cluster wildcard, e.g., `*:my-logs*`.
Exception nonIndexNotFound = failures.stream()
.map(FieldCapabilitiesFailure::getException)
.filter(ex -> ExceptionsHelper.unwrap(ex, IndexNotFoundException.class) == null)
Contributor

Question: are all situations where we can't use an index for some reason (security, closed, hidden, or anything else) covered by IndexNotFoundException? If not, we could still get unhelpful failures here.

Member Author

We should report all errors, including IndexNotFoundException. This special handling exists because index_options are not available in ES|QL; otherwise, we shouldn't have it.
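
As a standalone illustration of the filter discussed in this thread, the sketch below picks the first failure whose cause chain does not contain an index-not-found error. The nested `IndexNotFoundException` and the `unwrap` helper are simplified local stand-ins written for this example, not the Elasticsearch classes, and `firstNonIndexNotFound` is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of filtering failures by their cause chain. */
final class FailureFilterSketch {

    /** Local stand-in for an index-not-found error; not the Elasticsearch class. */
    static final class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    /**
     * Walks the cause chain and returns the first throwable of the given type,
     * or null if there is none. The depth limit guards against cyclic causes.
     */
    static Throwable unwrap(Throwable throwable, Class<? extends Throwable> type) {
        int depth = 0;
        for (Throwable current = throwable; current != null && depth < 10; current = current.getCause(), depth++) {
            if (type.isInstance(current)) {
                return current;
            }
        }
        return null;
    }

    /** Returns the first failure that is not caused by a missing index, if any. */
    static Optional<Exception> firstNonIndexNotFound(List<Exception> failures) {
        return failures.stream()
            .filter(ex -> unwrap(ex, IndexNotFoundException.class) == null)
            .findFirst();
    }

    public static void main(String[] args) {
        List<Exception> failures = List.of(
            new RuntimeException("wrapped", new IndexNotFoundException("my-logs")),
            new IllegalStateException("shard unavailable")
        );
        System.out.println(firstNonIndexNotFound(failures).map(Exception::getMessage).orElse("none"));
    }
}
```

Note that a wrapped index-not-found error is still filtered out, because the check inspects the whole cause chain rather than only the top-level exception type.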

EsqlCCSUtils.updateExecutionInfoWithClustersWithNoMatchingIndices(executionInfo, result.indices, null);
var unavailableClusters = EsqlCCSUtils.determineUnavailableRemoteClusters(result.indices.failures());
EsqlCCSUtils.updateExecutionInfoWithUnavailableClusters(executionInfo, unavailableClusters);
EsqlCCSUtils.updateExecutionInfoWithClustersWithNoMatchingIndices(
Contributor

I feel like it may be better to move updateExecutionInfoWithUnavailableClusters into updateExecutionInfoWithUnavailableClusters and have updateExecutionInfoWithClustersWithNoMatchingIndices use cluster statuses (since unavailable clusters would get marked as skipped). What do you think?

Member Author

@dnhatn dnhatn Jul 14, 2025

Yeah, it's a good idea, but I don’t have bandwidth for it right now and would prefer to focus on getting this in for 8.19/9.1 as soon as possible. Would you be able to work on it later?

Contributor

Sure, I can do a follow-up on this.

Member Author

👍 Thank you!

@dnhatn dnhatn requested a review from smalyshev July 14, 2025 03:54
@dnhatn
Member Author

dnhatn commented Jul 14, 2025

@smalyshev @idegtiarenko Thank you for the reviews!

@dnhatn dnhatn merged commit a699655 into elastic:main Jul 14, 2025
33 checks passed
@dnhatn dnhatn deleted the resolution-errors branch July 14, 2025 22:16
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.1 Commit could not be cherrypicked due to conflicts
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 130840

jdconrad pushed a commit to JVerwolf/elasticsearch that referenced this pull request Jul 14, 2025
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 15, 2025
(cherry picked from commit a699655)
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 15, 2025
(cherry picked from commit a699655)
@dnhatn
Member Author

dnhatn commented Jul 15, 2025

💚 All backports created successfully

Status Branch Result
9.1
8.19

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jul 15, 2025
(cherry picked from commit a699655)
elasticsearchmachine pushed a commit that referenced this pull request Jul 15, 2025
(cherry picked from commit a699655)
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.1 v9.1.1 v9.2.0

4 participants