-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Enable Semantic Search CCS When ccs_minimize_roundtrips=true #135309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dimitris-athanasiou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments from a first pass before looking into the integration test.
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/inference/queries/InterceptedInferenceMatchQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../elasticsearch/xpack/inference/queries/AbstractInterceptedInferenceQueryBuilderTestCase.java
Outdated
Show resolved
Hide resolved
kderusso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass on this and my feedback is very minor, often nitpicky 😄
Overall it looks really good - nice changes that build on top of the previous PRs and thorough testing.
As we discussed, there will be additional followup work to test this "in practice" once it's in snapshot.
I don't have blocking feedback so I'll approve.
...rence/src/test/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderTests.java
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/inference/queries/InterceptedInferenceQueryBuilder.java
Show resolved
Hide resolved
...ava/org/elasticsearch/xpack/inference/queries/InterceptedInferenceKnnVectorQueryBuilder.java
Outdated
Show resolved
Hide resolved
|
|
||
| // TODO: Check for supported CCS mode here (once we support CCS) | ||
| if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) { | ||
| boolean ccsRequest = this.ccsRequest || resolvedIndices.getRemoteClusterIndices().isEmpty() == false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ccsRequest set to false but at least one remote index specified a valid state to be in? I feel like this should be a 400 error that happens early during validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because ccsRequest is set to false upon initial query construction in most code paths. The important thing is that once ccsRequest is true, it remains set to that, hence the logical OR here. Transitioning from false to true (when resolvedIndices.getRemoteClusterIndices().isEmpty() == false) during rewrite is fine.
| + " query does not support cross-cluster search when querying a [" | ||
| + SemanticTextFieldMapper.CONTENT_TYPE | ||
| + "] field" | ||
| + "] field when [ccs_minimize_roundtrips] is false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be friendlier to change the wording of this, to something more along the lines of Set ccs_minimize_roundtrips to true to query semantic_text fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be misleading in its own right. There are code paths that trigger CCS that don't allow setting ccs_minimize_roundtrips. The big one is ES|QL, which always performs CCS as if ccs_minimize_roundtrips=false. Returning an error with a suggestion that doesn't apply to all scenarios would probably cause more confusion.
Of course, that also means that the current error message could be confusing in such scenarios as well. I just couldn't think of a better way to state the problem both generically (to cover ES|QL) and specifically (when the user can change ccs_minimize_roundtrips). Thoughts?
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../elasticsearch/xpack/inference/queries/AbstractInterceptedInferenceQueryBuilderTestCase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either breaking this up, and/or (better yet) consolidating some of the code as there's a lot of boilerplate here would be helpful. Really at the end of the day, a semantic, match and sparse vector search against a sparse embedding model should return the same results, ditto for semantic, match and knn over a text embedding model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly reminder to update rest compatibility so this test isn't executed.
| import static org.hamcrest.Matchers.equalTo; | ||
| import static org.hamcrest.Matchers.is; | ||
|
|
||
| public abstract class AbstractSemanticCrossClusterSearchTestCase extends AbstractMultiClustersTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just about to ask about yaml tests - but since we have these I don't think we need other integration tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. These integration tests cover every case I think we need for CCS right now.
| @@ -1 +1 @@ | |||
| initial_elasticsearch_8_18_8,8840010 | |||
| transform_check_for_dangling_tasks,8840011 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird that CI pushed this changes - probably merging main here will make these disappear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still figuring out the new "simplified" transport versions. There are some rough edges :)
| io.netty.common: | ||
| - outbound_network | ||
| - manage_threads | ||
| - inbound_network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't see these files changed often - just for me to understand - why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need these permissions so that the two clusters in the CCS integration tests can communicate. We didn't have CCS integration tests in the inference plugin before this, hence the change.
dimitris-athanasiou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I think it's fine to merge this and clean up tests as a follow up.
… inference results map
Enables CCS when
ccs_minimize_roundtrips=truefor thesemanticquery and for interceptedmatch,knn, andsparse_vectorqueries that targetsemantic_textfields.Best effort detection for unsupported
ccs_minimize_roundtrips=falsecases has been implemented as well. However, one case in particular is impossible to detect right now:match,knn, orsparse_vector)semantic_textfield targeted is on a remote clusterIn this case, the query is not intercepted on the local cluster and the original query is sent to the remote cluster data nodes, resulting in an error about the intercepted query not supporting
semantic_textfields. Tests have been added to validate this behavior.