Apply field type checking in text_expansion queries #116335

henriquepaes1 · 2024-11-06T15:26:28Z

This PR closes #116046 by adding field type verification to text_expansion queries. As @kderusso detailed, field type checking currently happens only when a text_expansion query is rewritten as a weighted_tokens query due to token pruning. The code is refactored so that field type checking occurs regardless of token pruning and text_expansion queries run only against allowed field types (sparse_vector and rank_features).

henriquepaes1 · 2024-11-06T15:40:41Z

Hey guys! I'm pretty stuck with the field type checking. I was reading through other queries' code (such as WeightedTokenQuery that @kderusso mentioned) and noticed that field type checking happens through the query context.
I'm trying a similar approach with TextExpansionQuery, but I noticed that at the class's entry point (the doRewrite method) the context is received as a QueryRewriteContext, which has limited capabilities. So far, I have tried the following:

Query the field type using rewrite context: unsuccessful because most mapping information is null in the RewriteContext
Get the SearchContext with the convertToSearchExecutionContext method: the implementation returns null

I'm thinking about going up the call stack and making this verification in a place with a more meaningful context. I don't know if that's the right approach, since it will probably lead to code outside the TextExpansionQuery class, and from what I studied, boolean queries are flexible and this kind of verification would definitely break everything. Can someone enlighten me?
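
To make the two failed attempts above concrete, here is a minimal, self-contained sketch of the "only check where mappings are visible" idea. It is not Elasticsearch code: `RewriteContext`, `Outcome`, and `checkFieldType` are hypothetical stand-ins invented for illustration, and how an unmapped field should be handled is left open. Only the allowed field types and the error message wording are taken from this thread.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Elasticsearch classes.
final class DeferredTypeCheckSketch {

    /** Stand-in for a rewrite-phase context whose mapping lookup may be unavailable. */
    static final class RewriteContext {
        private final Map<String, String> fieldTypes; // null models "no mapping information here"

        RewriteContext(Map<String, String> fieldTypes) {
            this.fieldTypes = fieldTypes;
        }

        boolean hasMappings() {
            return fieldTypes != null;
        }

        String fieldType(String field) {
            return fieldTypes.get(field); // null if the field is unmapped
        }
    }

    enum Outcome { DEFERRED, UNMAPPED, VALID }

    /** Validates the field type only when this context can actually see mappings. */
    static Outcome checkFieldType(RewriteContext context, String field) {
        if (!context.hasMappings()) {
            return Outcome.DEFERRED; // try again at a later rewrite step
        }
        String type = context.fieldType(field);
        if (type == null) {
            return Outcome.UNMAPPED; // handling of unmapped fields is left open in this sketch
        }
        if (!type.equals("sparse_vector") && !type.equals("rank_features")) {
            throw new IllegalArgumentException("[" + type + "] is not an appropriate field type for this query");
        }
        return Outcome.VALID;
    }

    public static void main(String[] args) {
        RewriteContext withoutMappings = new RewriteContext(null);
        RewriteContext withMappings = new RewriteContext(Collections.singletonMap("source_text", "keyword"));

        System.out.println(checkFieldType(withoutMappings, "source_text"));
        try {
            checkFieldType(withMappings, "source_text");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a check written this way is a no-op until some context with mapping information reaches it, which is why where the check lives in the rewrite chain matters.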

kderusso · 2024-11-06T20:56:39Z

In general, I think the approach here should be to rewrite the query to a WeightedTokensQueryBuilder as often as possible during the rewrite phase. Right now the criteria is if pruning is configured, but are there any additional checks we can do? I'm sorry that I don't have the time to dig in today to look into more concrete suggestions, but that's where I'd start.

henriquepaes1 · 2024-11-07T16:21:33Z

> Right now the criteria is if pruning is configured, but are there any additional checks we can do?

I read through your article and it gave me some ideas. You mention that "inference results that would be sent as input into a text expansion search". Since we're trying to rewrite most of the work as a WeightedTokens query, would it be enough to verify that the text expansion results produced weighted tokens?

kderusso · 2024-11-07T20:26:30Z

> I read through your article and it gave me some ideas.
> You mention that "inference results that would be sent as input into a text expansion search". Since we're trying to rewrite most of the work as a WeightedTokens query, would it be enough to verify that the text expansion results produced weighted tokens?

So you're thinking of making sure that a weightedTokensSupplier exists? That's a good instinct, but it turns out that that supplier is populated for each call, with the inference results, here.

I had started down the path of seeing if we could omit the boolean query, but that didn't seem to work (#116047). I haven't looked into it further to see what the best solution would be.

You could validate and experiment with solutions though, and add some tests to validate whether they work.

First, you could add a test case in text_expansion_search.yml that does the following:

---
"Test text-expansion that displays error for invalid queried field type":
  - do:
      catch: /\[keyword\] is not an appropriate field type for this query/
      search:
        index: index-with-rank-features
        body:
          query:
            text_expansion:
              source_text:
                model_id: text_expansion_model
                model_text: "octopus comforter smells"

This would show what happens without the pruning configuration. (There is more information about our YAML REST tests here.)

You can then see if other tests break with the following command line call:

./gradlew :x-pack:plugin:ml:check

(This is a more specific version of the ./gradlew :check task we ask contributors to run before submitting PRs.) If specific tests fail, they'll be output with specific commands to reproduce them, so that you can rerun just the one failure and not wait for the entire test suite to run.

Hope that helps!

henriquepaes1 · 2024-11-07T23:35:17Z

Thanks for the material! It was really helpful to run the test suite.

> So you're thinking of making sure that a weightedTokensSupplier exists? That's a good instinct, but it turns out that that supplier is populated for each call, with the inference results, here.

Actually, since I noticed that the supplier is populated for each call regardless of the query type, I'm checking whether there are any weightedTokens inside the TextExpansionResults class during this call. So it would be something like:

if(tokenPruning != null || textExpansionResultsHasWeightedTokens) {
    ...
}

How does that sound?
I ran the test suite and didn't notice any related breaks.
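
Spelled out as a standalone sketch, the proposed criterion could look like the following. This is illustrative only and does not mirror the actual TextExpansionQueryBuilder code: `WeightedToken`, `PruningConfig`, and `shouldRewriteToWeightedTokens` are hypothetical names, and treating a null or empty token list as "no weighted tokens" is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Elasticsearch classes.
final class RewriteCriterionSketch {

    /** Stand-in for a single (token, weight) pair produced by inference. */
    static final class WeightedToken {
        final String token;
        final float weight;

        WeightedToken(String token, float weight) {
            this.token = token;
            this.weight = weight;
        }
    }

    /** Empty placeholder for a token pruning configuration. */
    static final class PruningConfig {
    }

    /**
     * Rewrite when pruning is configured (the existing criterion described in this thread)
     * or when the inference results carry at least one weighted token (the proposed addition).
     */
    static boolean shouldRewriteToWeightedTokens(PruningConfig tokenPruning, List<WeightedToken> weightedTokens) {
        boolean hasWeightedTokens = weightedTokens != null && !weightedTokens.isEmpty();
        return tokenPruning != null || hasWeightedTokens;
    }

    public static void main(String[] args) {
        List<WeightedToken> tokens = Collections.singletonList(new WeightedToken("octopus", 1.5f));
        List<WeightedToken> noTokens = Collections.emptyList();

        System.out.println(shouldRewriteToWeightedTokens(null, tokens));
        System.out.println(shouldRewriteToWeightedTokens(null, noTokens));
        System.out.println(shouldRewriteToWeightedTokens(new PruningConfig(), noTokens));
    }
}
```

Under this criterion the only case that does not take the weighted-tokens path is "no pruning configured and no tokens returned", so the field type check attached to that path would apply to essentially every query that produced inference results.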

elasticsearchmachine · 2024-11-18T09:22:46Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2024-11-18T09:22:46Z

Pinging @elastic/search-eng (Team:SearchOrg)

elasticsearchmachine · 2024-11-18T09:22:47Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

spotting where type check should happen

aea3034

elasticsearchmachine added the v9.0.0 and external-contributor labels Nov 6, 2024

henriquepaes1 added 2 commits November 7, 2024 11:31

using weighted tokens as criteria for rewriting text expansion query

9df2a86

modify test to ensure that query is rewritten to a supported type

7c0a3e7

henriquepaes1 marked this pull request as ready for review November 13, 2024 19:19

elasticsearchmachine added the needs:triage label Nov 13, 2024

add a test without pruning to yaml

7d3e655

gareth-ellis added the :ml and :SearchOrg/Relevance labels Nov 18, 2024

gbanasiak removed the needs:triage label Nov 22, 2024

Merge branch 'main' into 116046-vector-type-check-infer-query

0a3f58b

elasticsearchmachine added v9.1.0 and removed v9.0.0 labels Jan 30, 2025

henriquepaes1 closed this Jun 17, 2025
