Skip to content

Conversation

@Mikep86
Copy link
Contributor

@Mikep86 Mikep86 commented Sep 23, 2025

Enables CCS when ccs_minimize_roundtrips=true for the semantic query and for intercepted match, knn, and sparse_vector queries that target semantic_text fields.

Best effort detection for unsupported ccs_minimize_roundtrips=false cases has been implemented as well. However, one case in particular is impossible to detect right now:

  • Intercepted query usage (i.e. match, knn, or sparse_vector)
  • The only semantic_text field targeted is on a remote cluster

In this case, the query is not intercepted on the local cluster and the original query is sent to the remote cluster data nodes, resulting in an error about the intercepted query not supporting semantic_text fields. Tests have been added to validate this behavior.

@Mikep86 Mikep86 added >enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team labels Sep 23, 2025
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments from a first pass before looking into the integration test.

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass on this and my feedback is very minor, often nitpicky 😄

Overall it looks really good - nice changes that build on top of the previous PRs and thorough testing.

As we discussed, there will be additional followup work to test this "in practice" once it's in snapshot.

I don't have blocking feedback so I'll approve.


// TODO: Check for supported CCS mode here (once we support CCS)
if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) {
boolean ccsRequest = this.ccsRequest || resolvedIndices.getRemoteClusterIndices().isEmpty() == false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ccsRequest set to false but at least one remote index specified a valid state to be in? I feel like this should be a 400 error that happens early during validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because ccsRequest is set to false upon initial query construction in most code paths. The important thing is that once ccsRequest is true, it remains set to that, hence the logical OR here. Transitioning from false to true (when resolvedIndices.getRemoteClusterIndices().isEmpty() == false) during rewrite is fine.

+ " query does not support cross-cluster search when querying a ["
+ SemanticTextFieldMapper.CONTENT_TYPE
+ "] field"
+ "] field when [ccs_minimize_roundtrips] is false"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be friendlier to change the wording of this, to something more along the lines of Set ccs_minimize_roundtrips to true to query semantic_text fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be misleading in its own right. There are code paths that trigger CCS that don't allow setting ccs_minimize_roundtrips. The big one is ES|QL, which always performs CCS as if ccs_minimize_roundtrips=false. Returning an error with a suggestion that doesn't apply to all scenarios would probably cause more confusion.

Of course, that also means that the current error message could be confusing in such scenarios as well. I just couldn't think of a better way to state the problem both generically (to cover ES|QL) and specifically (when the user can change ccs_minimize_roundtrips). Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either breaking this up, and/or (better yet) consolidating some of the code as there's a lot of boilerplate here would be helpful. Really at the end of the day, a semantic, match and sparse vector search against a sparse embedding model should return the same results, ditto for semantic, match and knn over a text embedding model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly reminder to update rest compatibility so this test isn't executed.

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.is;

public abstract class AbstractSemanticCrossClusterSearchTestCase extends AbstractMultiClustersTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just about to ask about yaml tests - but since we have these I don't think we need other integration tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. These integration tests cover every case I think we need for CCS right now.

@@ -1 +1 @@
initial_elasticsearch_8_18_8,8840010
transform_check_for_dangling_tasks,8840011
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that CI pushed this changes - probably merging main here will make these disappear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still figuring out the new "simplified" transport versions. There are some rough edges :)

io.netty.common:
- outbound_network
- manage_threads
- inbound_network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't see these files changed often - just for me to understand - why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need these permissions so that the two clusters in the CCS integration tests can communicate. We didn't have CCS integration tests in the inference plugin before this, hence the change.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I think it's fine to merge this and clean up tests as a follow up.

@Mikep86 Mikep86 merged commit 7613137 into elastic:main Sep 25, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants