Skip to content

Conversation

@henriquepaes1
Copy link
Contributor

This PR closes #116046 by adding a type verification in a text_expansion query. As @kderusso detailed, field type checking happens only when a text_expansion query are rewritten as a weighted_tokens query due to token pruning. The code is refactored such that field type checking occurs regardless of token pruning and text_expansion queries occur only against allowed types (sparse_vector and rank features).

@elasticsearchmachine elasticsearchmachine added v9.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 6, 2024
@henriquepaes1
Copy link
Contributor Author

henriquepaes1 commented Nov 6, 2024

Hey guys! I'm pretty stuck with the field type checking. I was reading through other queries' code (such as WeightedTokenQuery that @kderusso mentioned) and noticed that field type checking happens through the query context.
I'm trying a similar approach with TextExpansionQuery but I noticed that in the class entry point (doRewrite method), the context is received as a QueryRewriteContext, which has limited capabilities. Until this moment, I have tried the following:

  1. Query the field type using rewrite context: unsuccessful because most mapping information is null in the RewriteContext
  2. Get the SearchContext with the convertToSearchExecutionContext method: the implementation returns null

I'm thinking about going up in the call stack and making this verification in a place with a more meaningful context. I don't know if that's the right approach since it will probably lead to code outside the TextExpansionQuery class, and from what I studied, boolean queries are flexible and this kind of verification would definitely break everything. Can someone enlighten me?

@kderusso
Copy link
Member

kderusso commented Nov 6, 2024

In general, I think the approach here should be to rewrite the query to a WeightedTokensQueryBuilder as often as possible during the rewrite phase. Right now the criteria is if pruning is configured, but are there any additional checks we can do? I'm sorry that I don't have the time to dig in today to look into more concrete suggestions, but that's where I'd start.

@henriquepaes1
Copy link
Contributor Author

henriquepaes1 commented Nov 7, 2024

Right now the criteria is if pruning is configured, but are there any additional checks we can do?

I read through your article and it gave me some ideas. You mention that "inference results that would be sent as input into a text expansion search". Since we're trying to rewrite most of the work as a WeightedTokens query, would it be enough to verify that the text expansion results produced weighted tokens?

@kderusso
Copy link
Member

kderusso commented Nov 7, 2024

I read through your article and it gave me some ideas.
You mention that "inference results that would be sent as input into a text expansion search". Since we're trying to rewrite most of the work as a WeightedTokens query, would it be enough to verify that the text expansion results produced weighted tokens?

So you're thinking of making sure that a weightedTokensSupplier exists? That's a good instinct, but it turns out that that supplier is populated for each call, with the inference results, here.

I had started down the path of seeing if we could omit the boolean query, but that didn't seem to work: #116047 I haven't looked into it further to see what the best solution would be.

You could validate and experiment with solutions though, and add some tests to validate whether they work.

First, you could add a test case in text_expansion_search.yml that does the following:

---
"Test text-expansion that displays error for invalid queried field type":
  - do:
      catch: /\[keyword\] is not an appropriate field type for this query/
      search:
        index: index-with-rank-features
        body:
          query:
            text_expansion:
              source_text:
                model_id: text_expansion_model
                model_text: "octopus comforter smells"

This would show this behavior of what happens without the pruning configurations. (There is more information about our YAML REST tests here).

You can then see if other tests break with the following command line call:

./gradlew :x-pack:plugin:ml:check

(This is a more specific version of the ./gradlew :check function we ask contributors to run before submitting PRs). If specific tests fail, they'll be output with specific commands to reproduce them, so that you can test just the one failure and not wait for the entire test suite to fail.

Hope that helps!

@henriquepaes1
Copy link
Contributor Author

Thanks for the material! It was really helpful to run the test suite.

So you're thinking of making sure that a weightedTokensSupplier exists? That's a good instinct, but it turns out that that supplier is populated for each call, with the inference results, here.

Actually, since I noticed that the supplier is populated for each call regardless of the query type, I'm checking if there are any weightedTokens inside of the TextExpansionResults class during this call. So it would be something like

if(tokenPruning != null || textExpansionResultsHasWeightedTokens) {
    ...
}

How does that sound?
I ran the test suite and didn't notice any related breaks.

@henriquepaes1 henriquepaes1 marked this pull request as ready for review November 13, 2024 19:19
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Nov 13, 2024
@gareth-ellis gareth-ellis added :ml Machine learning :SearchOrg/Relevance Label for the Search (solution/org) Relevance team labels Nov 18, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@gbanasiak gbanasiak removed the needs:triage Requires assignment of a team area label label Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

text_expansion queries return results for specified text fields

5 participants