POC - Semantic search CCS support with ccs_minimize_roundtrips=false #137247

Mikep86 · 2025-10-28T12:34:25Z

This is a POC for adding semantic search CCS support with ccs_minimize_roundtrips=false. This is achieved through three high-level changes:

- The addition of GetInferenceFieldsAction, which allows us to get inference fields (and their associated inference results for a given query) for a remote cluster
- The ability to register async actions for remote clusters in QueryRewriteContext
- Updating the semantic and intercepted queries to get remote cluster inference results during local cluster coordinator rewrite when ccs_minimize_roundtrips=false

These changes add semantic search CCS support when using:

- ES|QL
- Query DSL and ccs_minimize_roundtrips=false
- (With some follow-up changes) Simplified linear/rrf retrievers

Note that as a POC, the code is a bit rough. Some things are over-complicated, unhandled edge cases still exist, and the few tests that exist are hacked together. Everything will get cleaned up as this is split into multiple PRs for a production implementation.

Mikep86 · 2025-10-28T12:34:50Z

@elasticmachine update branch

Mikep86 · 2025-10-28T12:39:28Z

...nternalClusterTest/java/org/elasticsearch/xpack/esql/action/SemanticTextMultiClustersIT.java

This test is a total hack job; it exists only to show that this approach also addresses semantic search CCS in ES|QL.

TBH the test doesn't look that bad to me, a little cleanup and I think it's fine

Mikep86 · 2025-10-28T12:42:13Z

.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java

It's probably time to factor out the methods for getting inference results into a common place outside of SemanticQueryBuilder

Mikep86 · 2025-10-28T12:43:42Z

...lusterTest/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverCrossClusterSearchIT.java

This test can be disregarded for now. I added it to help characterize retriever behavior with CCS.

Mikep86 · 2025-10-28T15:04:44Z

...rc/main/java/org/elasticsearch/xpack/inference/queries/InterceptedInferenceQueryBuilder.java


-        // If we are handling a CCS request, always retain the intercepted query logic so that we can get inference results generated on
-        // the local cluster from the inference results map when rewriting on remote cluster data nodes. This can be necessary when:
-        // - A query specifies an inference ID override
-        // - Only non-inference fields are queried on the remote cluster
-        if (inferenceIds.isEmpty() && this.ccsRequest == false) {
-            // Not querying a semantic text field
+        boolean ccsRequest = this.ccsRequest || resolvedIndices.getRemoteClusterIndices().isEmpty() == false;
+        Boolean ccsMinimizeRoundTrips = queryRewriteContext.isCcsMinimizeRoundTrips();
+        if (inferenceIds.isEmpty() && (ccsRequest == false || Boolean.TRUE.equals(ccsMinimizeRoundTrips))) {
+            // Not querying a semantic text field locally and either:
+            // - no remote indices are specified
+            // - ccs_minimize_roundtrips: true, so the query will be re-intercepted (if necessary) on the remote cluster
            return originalQuery;
        }


Changes here are causing some regressions at the moment. Since we now want to handle CCS when ccs_minimize_roundtrips: false and we are querying an inference field only on a remote cluster, we have to use the intercepted query logic for an additional rewrite iteration to resolve remote cluster inference fields (if any). Thus, we have to adjust when we rewrite to the original query to allow for this additional rewrite iteration.

This logic isn't quite right yet, but I'm confident that with a little more time we'll work out the edge cases here.
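For readers following the condition in the diff above, here is a minimal sketch that restates it as a standalone predicate. The class and method names are hypothetical and not part of the PR; the nullable `Boolean` mirrors the value read from `queryRewriteContext.isCcsMinimizeRoundTrips()` in the diff. Since the logic is described as not quite right yet, this only illustrates the condition as currently written.

```java
final class InterceptRewriteDecision {
    private InterceptRewriteDecision() {}

    /**
     * Hypothetical restatement of the condition in the diff.
     *
     * @param inferenceIdsEmpty     true when no inference IDs were resolved on the local cluster
     * @param ccsRequest            true when the request targets remote cluster indices
     * @param ccsMinimizeRoundTrips nullable; null is treated like "not true"
     * @return true when the interceptor can hand back the original query
     */
    static boolean rewriteToOriginalQuery(boolean inferenceIdsEmpty, boolean ccsRequest, Boolean ccsMinimizeRoundTrips) {
        return inferenceIdsEmpty && (ccsRequest == false || Boolean.TRUE.equals(ccsMinimizeRoundTrips));
    }

    public static void main(String[] args) {
        // Local-only request that queries no semantic text field: original query is enough.
        System.out.println(rewriteToOriginalQuery(true, false, null));          // true
        // CCS with ccs_minimize_roundtrips=true: the remote cluster re-intercepts if needed.
        System.out.println(rewriteToOriginalQuery(true, true, Boolean.TRUE));   // true
        // CCS with ccs_minimize_roundtrips=false: keep the intercepted query for another rewrite.
        System.out.println(rewriteToOriginalQuery(true, true, Boolean.FALSE));  // false
        // A semantic text field is queried locally: always keep the intercepted query.
        System.out.println(rewriteToOriginalQuery(false, false, null));         // false
    }
}
```

The third case is the one this change adds: an inference field that exists only on a remote cluster still needs one more rewrite iteration on the local coordinator.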

Mikep86 · 2025-10-28T15:05:37Z

...ClusterTest/java/org/elasticsearch/search/ccs/KnnVectorQueryBuilderCrossClusterSearchIT.java

+    @AwaitsFix(bugUrl = "https://fake.url")
    public void testKnnQuery() throws Exception {


This test is temporarily disabled due to regressions introduced in https://github.com/elastic/elasticsearch/pull/137247/files#r2469934806

kderusso

Very high level review as I see that there is a lot of planned cleanup. Overall the approach seems to make a lot of sense. I'm curious: you'd mentioned there were some alternate approaches that traded some efficiency for simplicity. Could you help the team understand those tradeoffs a bit more, in case we want to do future optimizations?

kderusso · 2025-10-28T19:37:56Z

server/src/main/java/org/elasticsearch/index/query/QueryRewriteContext.java

     */
    public void executeAsyncActions(ActionListener<Void> listener) {
-        if (asyncActions.isEmpty()) {
+        if (asyncActions.isEmpty() && remoteAsyncActions.isEmpty()) {


I would break out remote async actions into their own method.

The problem with breaking remote async actions out into their own method is that it adds considerable complexity to callers of QueryRewriteContext that want to ensure that all (i.e. local and remote) async actions are executed. IMO that should be the default; executing only local async actions could lead to strange edge cases.

Having a different method for executing remote async actions would have the following side effects:

- Callers would need to remember to check both hasAsyncActions and hasRemoteAsyncActions
- Callers would need to construct a GroupedActionListener that encapsulates calls to executeAsyncActions and executeRemoteAsyncActions to ensure that both are complete before asynchronously moving to the next rewrite iteration. See how executeAsyncActions is called in rewriteAndFetch for a concrete example of the additional complexity pushed to the caller.
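To make the second side effect concrete, here is a rough sketch of the join logic each caller would have to carry if execution were split into two methods. It uses plain Java stand-ins: `Listener`, `SplitContext` and `executeAll` are hypothetical names, not Elasticsearch classes, and the hand-rolled countdown plays the role that a GroupedActionListener would play in the real code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class SplitExecutionSketch {
    private SplitExecutionSketch() {}

    /** Minimal stand-in for a Void listener; not the Elasticsearch ActionListener. */
    interface Listener {
        void onResponse();

        void onFailure(Exception e);
    }

    /** Hypothetical context that exposes local and remote execution as two separate calls. */
    interface SplitContext {
        boolean hasAsyncActions();

        boolean hasRemoteAsyncActions();

        void executeAsyncActions(Listener listener);

        void executeRemoteAsyncActions(Listener listener);
    }

    /** The join every caller would need before moving to the next rewrite iteration. */
    static void executeAll(SplitContext context, Listener next) {
        boolean local = context.hasAsyncActions();
        boolean remote = context.hasRemoteAsyncActions();
        int pending = (local ? 1 : 0) + (remote ? 1 : 0);
        if (pending == 0) {
            next.onResponse();
            return;
        }
        AtomicInteger remaining = new AtomicInteger(pending);
        AtomicBoolean failed = new AtomicBoolean(false);
        Listener grouped = new Listener() {
            @Override
            public void onResponse() {
                // Failures never decrement, so reaching zero means every execution succeeded.
                if (remaining.decrementAndGet() == 0) {
                    next.onResponse();
                }
            }

            @Override
            public void onFailure(Exception e) {
                // Report only the first failure to the caller.
                if (failed.compareAndSet(false, true)) {
                    next.onFailure(e);
                }
            }
        };
        if (local) {
            context.executeAsyncActions(grouped);
        }
        if (remote) {
            context.executeRemoteAsyncActions(grouped);
        }
    }

    /** Demo helper: a context whose executions complete immediately on the calling thread. */
    static SplitContext immediate(boolean hasLocal, boolean hasRemote) {
        return new SplitContext() {
            public boolean hasAsyncActions() { return hasLocal; }
            public boolean hasRemoteAsyncActions() { return hasRemote; }
            public void executeAsyncActions(Listener l) { l.onResponse(); }
            public void executeRemoteAsyncActions(Listener l) { l.onResponse(); }
        };
    }

    public static void main(String[] args) {
        executeAll(immediate(true, true), new Listener() {
            public void onResponse() { System.out.println("next rewrite iteration"); }
            public void onFailure(Exception e) { System.out.println("failed: " + e); }
        });
    }
}
```

With a single executeAsyncActions that covers both kinds of action, all of `executeAll` collapses into one call on the context, which is the argument for keeping them together.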

kderusso · 2025-10-28T19:40:06Z

x-pack/plugin/esql/build.gradle

  testImplementation('org.webjars.npm:fontsource__roboto-mono:4.5.7')

  internalClusterTestImplementation project(":modules:mapper-extras")
+  internalClusterTestImplementation project(xpackModule('inference'))


I found out the hard way that with this dependency ES|QL enters a weird jar hell state and the tests won't pass. You're going to want to make sure not to have this inference dependency in your non-draft PR. I solved this in my PR by refactoring the classes I needed into core.

kderusso · 2025-10-28T19:44:32Z

...re/src/main/java/org/elasticsearch/xpack/core/inference/action/GetInferenceFieldsAction.java

+        super(NAME);
+    }
+
+    public static class Request extends ActionRequest {


Remember to add BWC serialization tests for these (and the associated responses) & make sure the json docs are added too

kderusso · 2025-10-28T19:45:39Z

...nternalClusterTest/java/org/elasticsearch/xpack/esql/action/SemanticTextMultiClustersIT.java

TBH the test doesn't look that bad to me, a little cleanup and I think it's fine

kderusso · 2025-10-28T19:50:59Z

.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java

dimitris-athanasiou

Nice!

++ on approach! Makes sense!

dimitris-athanasiou · 2025-11-03T11:08:37Z

server/src/main/java/org/elasticsearch/index/query/QueryRewriteContext.java

     */
    public boolean hasAsyncActions() {
-        return asyncActions.isEmpty() == false;
+        return asyncActions.isEmpty() == false || remoteAsyncActions.isEmpty() == false;


There is some interplay here with remote and non-remote actions. In registerRemoteAsyncAction we're adding the remote ones in the non-remote list too. Thus, the check on asyncActions should be enough? Otherwise, should we not be mixing remote/non-remote at all?

No, remote and local async actions are stored in separate collections. This is necessary because the remote async actions are mapped by cluster alias.

dimitris-athanasiou · 2025-11-03T11:10:10Z

server/src/main/java/org/elasticsearch/index/query/QueryRewriteContext.java

-            CountDown countDown = new CountDown(asyncActions.size());
+            int actionCount = asyncActions.size();
+            for (var remoteAsyncActionList : remoteAsyncActions.values()) {
+                actionCount += remoteAsyncActionList.size();


Same here. Are we double-counting remote actions?

No, see https://github.com/elastic/elasticsearch/pull/137247/files#r2518344021
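The two replies above rest on how the actions are stored. As a minimal sketch, assuming a plain list for local actions and a map keyed by cluster alias for remote ones: the field and method names follow the diff hunks, but `Runnable` is only a stand-in for the real action type, and the class itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AsyncActionRegistrySketch {
    // Local actions live in a flat list.
    private final List<Runnable> asyncActions = new ArrayList<>();
    // Remote actions are grouped by cluster alias and are never added to the local list.
    private final Map<String, List<Runnable>> remoteAsyncActions = new HashMap<>();

    void registerAsyncAction(Runnable action) {
        asyncActions.add(action);
    }

    void registerRemoteAsyncAction(String clusterAlias, Runnable action) {
        remoteAsyncActions.computeIfAbsent(clusterAlias, alias -> new ArrayList<>()).add(action);
    }

    /** Both collections must be checked because they are disjoint. */
    boolean hasAsyncActions() {
        return asyncActions.isEmpty() == false || remoteAsyncActions.isEmpty() == false;
    }

    /** Total used to size the countdown: each action is counted exactly once. */
    int actionCount() {
        int actionCount = asyncActions.size();
        for (List<Runnable> remoteAsyncActionList : remoteAsyncActions.values()) {
            actionCount += remoteAsyncActionList.size();
        }
        return actionCount;
    }

    public static void main(String[] args) {
        AsyncActionRegistrySketch registry = new AsyncActionRegistrySketch();
        registry.registerAsyncAction(() -> {});
        registry.registerRemoteAsyncAction("cluster_a", () -> {});
        registry.registerRemoteAsyncAction("cluster_a", () -> {});
        registry.registerRemoteAsyncAction("cluster_b", () -> {});
        System.out.println(registry.actionCount()); // prints 4
    }
}
```

Because a remote registration touches only the map, summing the list size and the per-alias list sizes does not double-count.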

dimitris-athanasiou · 2025-11-03T11:18:39Z

...rc/main/java/org/elasticsearch/xpack/inference/queries/InterceptedInferenceQueryBuilder.java

    protected final T originalQuery;
    protected final Map<FullyQualifiedInferenceId, InferenceResults> inferenceResultsMap;
-    protected final SetOnce<Map<FullyQualifiedInferenceId, InferenceResults>> inferenceResultsMapSupplier;
+    protected final SetOnce<Map<FullyQualifiedInferenceId, InferenceResults>> localInferenceResultsMapSupplier;


As these are not serialized and we don't have BWC issues, I think it might be nice to group them together in a single class that manages them.

smalyshev · 2025-11-03T17:39:02Z

.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java

+                    listener.onResponse(null);
+                }, e -> {
+                    Exception failure = e;
+                    if (e.getCause() instanceof ActionNotFoundTransportException actionNotFoundTransportException


Couldn't we check the remote's transport version to know if the new action is supported before we send the request?

We don't get that information until the semantic query is serialized for transport to the remote cluster, which happens way later than query rewriting.

We can check the minimum transport version in the cluster; in fact, that's how we should determine if the query is eligible for the new path to handle CCS.

@jimczi Can we get a remote cluster's min transport version from local cluster state?

For ESQL at least, we know min transport version once initial index lookup is done, it's kept in PreAnalysisResult. It's still being worked on to make all the parts respect it properly, e.g. see #137431
Not sure when it comes into action on DSL side.

I'll take a look if it's possible to get remote cluster min transport version from something like cluster service

See ClusterState#getMinTransportVersion

Isn't that only for the current cluster's min transport version?

Ah sorry, you meant the remote cluster and I read local for some reason.

~~To close the loop on this, I don't see a way to get the remote cluster transport version during query rewrite~~

Never mind, we can use RemoteClusterClient#getConnection to get the remote cluster transport version here: https://github.com/Mikep86/elasticsearch/blob/9ee1813b95d3834194277a1603c424fef64cd7fc/server/src/main/java/org/elasticsearch/client/internal/RemoteClusterClient.java#L48-L53
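As a rough illustration of where this thread lands (check the remote's version up front instead of reacting to an ActionNotFoundTransportException afterwards), here is a self-contained sketch. It does not call any Elasticsearch API: the lookup function stands in for whatever is read via RemoteClusterClient#getConnection, and the class name, method names and version numbers are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class RemoteActionGateSketch {
    private RemoteActionGateSketch() {}

    /** Made-up version id standing in for the transport version that introduced the action. */
    static final int ACTION_ADDED_VERSION = 9_200_000;

    /** Returns the aliases whose reported version predates the action, in input order. */
    static List<String> unsupportedClusters(List<String> clusterAliases, ToIntFunction<String> remoteVersionLookup) {
        List<String> unsupported = new ArrayList<>();
        for (String alias : clusterAliases) {
            if (remoteVersionLookup.applyAsInt(alias) < ACTION_ADDED_VERSION) {
                unsupported.add(alias);
            }
        }
        return unsupported;
    }

    /** Fails fast, before any remote request is sent, if a target cluster is too old. */
    static void ensureSupported(List<String> clusterAliases, ToIntFunction<String> remoteVersionLookup) {
        List<String> unsupported = unsupportedClusters(clusterAliases, remoteVersionLookup);
        if (unsupported.isEmpty() == false) {
            throw new IllegalArgumentException(
                "remote clusters " + unsupported + " do not support this query when ccs_minimize_roundtrips=false"
            );
        }
    }

    public static void main(String[] args) {
        // Hypothetical versions: one cluster older than the action, one newer.
        ToIntFunction<String> lookup = alias -> alias.equals("cluster_old") ? 9_100_000 : 9_300_000;
        System.out.println(unsupportedClusters(List.of("cluster_new", "cluster_old"), lookup)); // prints [cluster_old]
    }
}
```

Gating on the version keeps the failure on the local coordinator and avoids a wasted round trip; the exception type here is only a placeholder for however the real code reports a bad request.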

BASE=60406a6315bb9b1fc847e614175899a9161b2e82 HEAD=5aab46d5a38808333f5f4a432ca3057a015f9162 Branch=main

Mikep86 added 30 commits October 22, 2025 12:30

Tweaked CCS tests for debugging

8d77992

Added a method to compute service to build a query rewrite context

c3e28a5

Add remote coordinator rewrite

a1a1423

Added linear retriever cross-cluster search test

cc28bb0

Added get inference fields API

ac38af0

Use the correct inject annotation

0a15496

Update query rewrite context to execute remote cluster actions

5bb1f6e

Use direct executor service

6fde53e

Add remote cluster action type

d246828

Disable coordinator rewrite

3f7a3fd

Fix test plugins

c2d5763

Add fields and query to request

b8df821

Add resolve wildcards to request

b8976c2

Build the inference fields map

ad5b8f2

Get inference results

f4add01

Don't hard-code useDefaultFields

91204c9

Added code to semantic query builder to get remote inference results

ee36f8d

Added remote inference results map supplier

81bf2bf

Update semantic query to remote ccs_minimize_roundtrips=false restriction

6a5c7bb

Fix logic errors

2b60947

Update semantic query builder CCS test

81f2638

Updated intercepted queries to handle ccs_minimize_roundtrips=false

988e344

Updated intercepted queries to detect when no inference fields are queried when ccs_minimize_roundtrips=false

6b7e784

Get remote inference results during local cluster coordinator rewrite only when ccs_minimize_roundtrips=false

e48bd2e

Pre-allocate hashmap size

503286a

Update match query builder CCS integration tests

230a96c

Fix logic error

9f0fcb1

Remove debug code

44a0f08

Revert changes to ClusterComputeHandler

b4d9b70

Spotless

54b14cc

Mikep86 added the :SearchOrg/Relevance Label for the Search (solution/org) Relevance team label Oct 28, 2025

Merge branch 'main' into semantic-search_ccs-esql-discovery

e6eec87

elasticsearchmachine added the v9.3.0 label Oct 28, 2025

Mikep86 commented Oct 28, 2025

Mikep86 added 6 commits October 28, 2025 08:46

Revert changes to Clusters

d5284b1

Adjusted interception logic when ccs_minimize_roundtrips: true

5b2d0df

Disable broken unit test

e1ef7ea

Disable broken integration tests

8a50bd3

Disable broken integration test

49daf0f

Fix class cast exceptions

51a7a3c

Mikep86 commented Oct 28, 2025

Mikep86 requested review from a team, dimitris-athanasiou, ioanatia, jimczi and kderusso October 28, 2025 16:36

kderusso reviewed Oct 28, 2025

dimitris-athanasiou reviewed Nov 3, 2025

Return 400 error when attempting to run a CCS query on an outdated cluster

5aab46d

smalyshev reviewed Nov 3, 2025

jimczi removed their request for review November 3, 2025 17:40

Mikep86 mentioned this pull request Nov 6, 2025

Add internal action for getting inference fields and inference results for those fields #137680

Merged

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 6, 2025

Mirror upstream elastic#137247 as single snapshot commit for AI review

0b74a84

BASE=60406a6315bb9b1fc847e614175899a9161b2e82 HEAD=5aab46d5a38808333f5f4a432ca3057a015f9162 Branch=main

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 7, 2025

Mirror upstream elastic#137247 as single snapshot commit for AI review

31758a8

BASE=60406a6315bb9b1fc847e614175899a9161b2e82 HEAD=5aab46d5a38808333f5f4a432ca3057a015f9162 Branch=main

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 7, 2025

Mirror upstream elastic#137247 as single snapshot commit for AI review

11e1014

BASE=60406a6315bb9b1fc847e614175899a9161b2e82 HEAD=5aab46d5a38808333f5f4a432ca3057a015f9162 Branch=main

Mikep86 mentioned this pull request Nov 14, 2025

Allow QueryRewriteContext to perform async actions on remote clusters #138124

Merged
