Improve filter handling for ESQL CCS #126807

smalyshev · 2025-04-14T21:58:38Z

This is a partial fix for how ESQL works with CCS and filters. This contains:

Track the clusters that have been excluded by the filters or empty wildcards
Do not error out on filtered-out indices
Check for disconnects and missing indices on both lookups when a filter is involved
Add tests for filter interactions with CCS mechanics

Missing-index handling for index expressions that mix existing and non-existing indices, and some LIMIT 0 scenarios, are still broken; this patch does not intend to fix them.

Implements tests for #118054

quux00

First pass review. I didn't review the tests yet and I need to do a deeper round on the core logic, but these are my initial thoughts.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlExecutionInfo.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

smalyshev · 2025-04-21T17:59:52Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/ColumnInfoImpl.java

        return originalTypes;
    }
+
+    public String toString() {


Not required for the fix strictly speaking, but it improves observability of columns and makes it much easier to inspect them when debugging.

elasticsearchmachine · 2025-04-21T18:02:52Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-04-21T18:02:52Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

astefan · 2025-04-22T10:11:36Z

...qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlRestValidationTestCase.java


    private String getInexistentIndexErrorMessage() {
-        return "\"reason\" : \"Found 1 problem\\nline 1:1: Unknown index ";
+        return "Unknown index ";


Is there any valid reason for this change?

FYI, the line:column information there is relevant for the accuracy of the parsing error messaging system. Those two numbers are linked to the Node itself (which comes from the ANTLR parser) and are used by the Verifier to build a consistent error message, no matter what the error message is or which Node it comes from.

I think this test is too strict: it doesn't allow changing any detail of how the errors are reported, and that led to some failures. That may have been with older versions of the patch, though, so I'll re-check why exactly the change is needed. In general, I think having our tests rely on specific error messages makes some things hard to refactor or fix (e.g. some parts of the code produce "unknown index" and some "no such index", and a lot of tests rely on that, which means we are essentially testing implementation details instead of functionality, down to the exact wording of the error message). In this particular case, beyond checking that there's an unknown index, the test also checks that this is the only problem and that it is reported in a very specific way, which means that if we changed how exactly unknown indices are handled in a particular case (e.g., say, moved the check from runtime to planning time, as we may have to do), this test would likely break. It may not be necessary now for this patch, and I'm going to check that, but I think this is something we may want to consider.
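
To make the trade-off in this thread concrete, here is a minimal sketch (not code from this PR) of three levels of strictness for checking an "unknown index" error. The class name `ErrorMessageChecks` and its method names are hypothetical; only the two string literals in `strictMatch` and `looseMatch` come from the diff above. The middle option, which keeps the line:column position but not the problem count, is an assumption about what a compromise could look like.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the Elasticsearch test suite.
final class ErrorMessageChecks {

    // Assumed compromise: require a line:column position, but not the problem count.
    private static final Pattern POSITIONED =
        Pattern.compile("line \\d+:\\d+: Unknown index ");

    // Strict form (the original literal): pins the JSON key, the problem count,
    // the exact position 1:1, and the wording.
    static boolean strictMatch(String responseBody) {
        return responseBody.contains("\"reason\" : \"Found 1 problem\\nline 1:1: Unknown index ");
    }

    // Middle form: the error must still carry a position, wherever it is reported.
    static boolean positionedMatch(String responseBody) {
        return POSITIONED.matcher(responseBody).find();
    }

    // Loose form (the literal proposed in the diff): only the wording is checked.
    static boolean looseMatch(String responseBody) {
        return responseBody.contains("Unknown index ");
    }
}
```

The strict form stops matching as soon as a second problem is reported or the position changes, while the loose form would also accept a message that has lost its line:column information, which is the concern raised in the review.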

astefan · 2025-04-22T11:13:29Z

...rnalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithFiltersIT.java

+        assertThat(clusterMetatata.getFailedShards(), equalTo(0));
+    }
+
+    protected void assertClusterMetadataNoShards(EsqlExecutionInfo.Cluster clusterMetatata, int shards, long took, String indexExpression) {


This method can be simplified by calling the one above assertClusterMetadataSuccess and passing 0 as the shards parameter. Also, this method doesn't even use the int shards parameter and can be completely removed from the method signature.

Sure, I can clean up those. I left them somewhat verbose to make the debugging easier but now I can condense them. I do want to keep the different methods because that makes it easier to see which case is happening from looking at the test code, but I certainly can make it less copy-pasty.
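
The cleanup discussed here could look roughly like the sketch below: the no-shards helper keeps its own name, so the test still reads clearly, but delegates to the success helper with 0 and drops the unused `int shards` parameter. This is an illustration under assumptions, not the code that was committed: `ClusterMeta` is a hypothetical stand-in for `EsqlExecutionInfo.Cluster`, and the set of fields checked (shard counts, index expression, took time) is assumed.

```java
// Hypothetical sketch; ClusterMeta stands in for EsqlExecutionInfo.Cluster.
final class ClusterAsserts {

    record ClusterMeta(String indexExpression, int totalShards, int successfulShards,
                       int skippedShards, int failedShards, long tookMillis) {}

    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError("cluster metadata mismatch: " + what);
        }
    }

    // Shared checks for a cluster that completed without failures (assumed field set).
    static void assertClusterMetadataSuccess(ClusterMeta cluster, int shards, long overallTook, String indexExpression) {
        check(cluster.indexExpression().equals(indexExpression), "index expression");
        check(cluster.totalShards() == shards, "total shards");
        check(cluster.successfulShards() == shards, "successful shards");
        check(cluster.skippedShards() == 0, "skipped shards");
        check(cluster.failedShards() == 0, "failed shards");
        check(cluster.tookMillis() >= 0 && cluster.tookMillis() <= overallTook, "took time");
    }

    // Kept as a separate, named case for readability, but without the unused
    // shards parameter: a no-shards cluster is a success with zero shards.
    static void assertClusterMetadataNoShards(ClusterMeta cluster, long overallTook, String indexExpression) {
        assertClusterMetadataSuccess(cluster, 0, overallTook, indexExpression);
    }
}
```

Keeping both method names addresses both points of view in the thread: the call sites still say which case is being asserted, while the checks themselves live in one place.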

...rnalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithFiltersIT.java

astefan · 2025-04-22T11:20:11Z

...rnalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithFiltersIT.java

+    }
+
+    protected EsqlQueryResponse runQuery(String query, Boolean ccsMetadataInResponse, QueryBuilder filter) {
+        EsqlQueryRequest request = EsqlQueryRequest.syncEsqlQueryRequest();


Can this also be executed with async search?

Sure, I didn't even look into async side yet, had my hands full with the sync case. I'll look into that now.

astefan · 2025-04-22T12:09:57Z

...rnalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithFiltersIT.java

+        try (EsqlQueryResponse resp = runQuery("from logs-*,c*:logs-*", randomBoolean(), filter)) {
+            List<List<Object>> values = getValuesList(resp);
+            assertThat(values, hasSize(docsTest1));
+            // FIXME: this is currently inconsistent with the non-wildcard case, since empty wildcard is not an error,


I am not sure I understand the issue here and the expected behavior.

I will describe a test I performed in a non-CCS scenario where, from my point of view, things look ok. Data and index names are the ones we use in CSV unit and IT tests.

from airports*,emp* metadata _index | keep *name*, _index

This returns columns as first_name | last_name | name | _index and data from indices

airports_mp employees_incompatible employees airports_web airports_not_indexed_nor_doc_values airports_not_indexed airports_no_doc_values airports

I am adding a filter to the request to filter out all the airports* indices:

"query":"from airports*,emp* metadata _index | keep *name*, _index", "filter": { "bool": { "filter": [ { "exists": { "field": "emp_no" } } ] } }

and I get back columns as first_name | last_name | _index and data from indices

employees_incompatible employees

If I understand the FIXME right, the issue is that the remote columns are not added to the response, and this is consistent with the existing non-CCS behavior. And those columns shouldn't be added to the response, since the filter completely eliminates those indices (and, thus, their columns).

A simple, reproducible scenario would be appreciated.
It would be worth having this scenario reproduced (and compared) in both CCS and non-CCS scenarios.

Thank you 🙏

The problem here is that the columns are sometimes added and sometimes aren't, depending on obscure factors like the presence of wildcards (even though the underlying index is the same), other indices, the structure of the query, etc. Basically, if anything is not resolved on the first lookup, the second lookup will add the fields - but whether or not the second lookup happens may be completely unrelated to the index in question (e.g. the second lookup can happen because of something related to another index), and I think this would look completely random to the user. IMO there should be a consistent system for when the fields are there, and it should not depend on the implementation details of the two lookups, as it does now.
I think it doesn't really depend on CCS; it is just easier to set up and reproduce in CCS scenarios. This patch does not intend to fix it, just to make the more serious failures (like filtered queries resulting in unknown index errors) go away.

@smalyshev I think it is important to see those very specific scenarios of inconsistency, since this is new functionality that hasn't been tried much by our users.

To give a bit more context, the way field names are resolved using field_caps is functionality that hasn't changed for at least 5 years across various projects (which use more or less the same code).

ES|QL is the first one where this field_caps filtering is happening, and the possible inconsistency of the columns in the response is a likely side-effect/bug/known issue of exactly this change. On one side, ES|QL gains (in most cases) a performance improvement by having the filter applied to field_caps, but there may be inconsistencies we have not foreseen, hence my earlier request for a specific use case without CCS where this might be an inconsistency. If we know the specific use cases, we could either improve the code or update our documentation with this information.

I'll look into whether I can reproduce the inconsistency without CCS.

quux00

Left a few minor requests/questions. LGTM

quux00 · 2025-04-22T19:28:36Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

            && executionInfo.getClusterStates(EsqlExecutionInfo.Cluster.Status.RUNNING).findAny().isEmpty()) {
            // for a CCS, if all clusters have been marked as SKIPPED, nothing to search so send a sentinel Exception
            // to let the LogicalPlanActionListener decide how to proceed
+            LOGGER.debug("No more clusters to search, ending analysis stage");


Is this logging useful? If we see that in the logs, how does it help? Is it useful for debugging?

I think it is as it shows what is happening in the debug log - and yes, this is mainly for debugging.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

dnhatn

LGTM. Thanks @smalyshev

astefan

LGTM

elasticsearchmachine · 2025-04-23T15:10:51Z

💔 Backport failed

Status	Branch	Result
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126807

smalyshev · 2025-04-23T15:14:06Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions ?

Please refer to the Backport tool documentation

* Test for CCS with filters
* Partial fix for CCS/filters problems

(cherry picked from commit 12451b6)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/ColumnInfoImpl.java

elasticsearchmachine added the v9.1.0 label Apr 14, 2025

Test for CCS with filters

9f05ded

smalyshev force-pushed the fix-filter-ccs branch from ef16bf4 to 9f05ded Compare April 15, 2025 00:51

elasticsearchmachine and others added 2 commits April 15, 2025 00:58

[CI] Auto commit changes from spotless

6f348b9

Partial fix for CCS/filters problems

7313daf

smalyshev force-pushed the fix-filter-ccs branch from aa80117 to 7313daf Compare April 17, 2025 23:10

smalyshev changed the title ~~Test for CCS with filters~~ Improve filter handling for ESQL CCS Apr 17, 2025

smalyshev and others added 2 commits April 17, 2025 18:19

fix tests

fb7fb6a

Merge branch 'main' into fix-filter-ccs

8610845

smalyshev force-pushed the fix-filter-ccs branch 2 times, most recently from 26a7147 to 7fea91d Compare April 18, 2025 19:30

More tests

a51bf2a

smalyshev force-pushed the fix-filter-ccs branch from acb85a4 to a51bf2a Compare April 18, 2025 19:37

quux00 reviewed Apr 18, 2025

smalyshev and others added 4 commits April 18, 2025 15:19

We can not eliminate some filtered runtime calls since there could be…

d58d6e6

… missing indices So we have to drop FILTERED status for now

Merge branch 'main' into fix-filter-ccs

9f432eb

Merge branch 'main' into fix-filter-ccs

c730f21

Improve comments

d2df0e1

smalyshev requested review from astefan and dnhatn April 21, 2025 16:26

smalyshev added :Analytics/ES|QL AKA ESQL :Search Foundations/CCS v8.19.0 auto-backport Automatically create backport pull requests when merged >bug >non-issue labels Apr 21, 2025

smalyshev commented Apr 21, 2025

smalyshev marked this pull request as ready for review April 21, 2025 18:02

elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Apr 21, 2025

smalyshev and others added 3 commits April 21, 2025 12:09

Add column tests

492daae

More tests for unavailable

9cc3fad

Merge branch 'main' into fix-filter-ccs

14e70bb

astefan reviewed Apr 22, 2025

smalyshev and others added 4 commits April 22, 2025 09:24

Merge branch 'main' into fix-filter-ccs

f9d78f2

Declutter asserts

216bd11

Remove this change, may not be needed

6cbfe19

add async

9098695

quux00 approved these changes Apr 22, 2025

dnhatn approved these changes Apr 22, 2025

smalyshev and others added 2 commits April 22, 2025 17:16

Improve comment

e66972a

Merge branch 'main' into fix-filter-ccs

7b71cf7

astefan approved these changes Apr 23, 2025

Merge branch 'main' into fix-filter-ccs

4cbf9c3

smalyshev enabled auto-merge (squash) April 23, 2025 14:04

smalyshev merged commit 12451b6 into elastic:main Apr 23, 2025
16 of 17 checks passed

elasticsearchmachine added the backport pending label Apr 23, 2025

smalyshev mentioned this pull request Apr 23, 2025

[8.x] Improve filter handling for ESQL CCS (#126807) #127261

Merged

smalyshev removed the backport pending label Apr 24, 2025

smalyshev deleted the fix-filter-ccs branch April 24, 2025 18:00


5 participants
