@Mikep86
Contributor

@Mikep86 Mikep86 commented Jun 10, 2025

Adds a simplified syntax for the linear retriever:

GET my-index/_search
{
  "retriever": {
    "linear": {
      "fields": ["field_1", "field_2^2"],
      "query": "my awesome query",
      "normalizer": "minmax"
    }
  }
}

fields is optional. If it is not provided, we query the fields defined by the index.query.default_field index setting (which is * by default).
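To make the fields handling concrete, here is a minimal Java sketch. It assumes that a caret suffix (as in field_2^2) is read as a per-field weight, following the usual Elasticsearch field^boost convention, and that an omitted fields list falls back to whatever index.query.default_field resolves to. FieldSpecParser and FieldSpec are hypothetical names used only for illustration, not classes from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not from the PR): parses "field^weight" strings such as
 * "field_2^2" and falls back to default field patterns when no fields are given.
 */
final class FieldSpecParser {

    /** A field name paired with its weight (1.0 when there is no caret suffix). */
    record FieldSpec(String name, float weight) {}

    private FieldSpecParser() {}

    static List<FieldSpec> parse(List<String> fields, List<String> defaultFields) {
        // Assumption: a null or empty "fields" list means "use index.query.default_field".
        List<String> source = (fields == null || fields.isEmpty()) ? defaultFields : fields;
        List<FieldSpec> specs = new ArrayList<>();
        for (String raw : source) {
            int caret = raw.lastIndexOf('^');
            if (caret < 0) {
                specs.add(new FieldSpec(raw, 1.0f));
            } else {
                String name = raw.substring(0, caret);
                // Throws NumberFormatException for malformed weights such as "field^abc".
                float weight = Float.parseFloat(raw.substring(caret + 1));
                if (weight < 0) {
                    throw new IllegalArgumentException("negative weight for field [" + name + "]");
                }
                specs.add(new FieldSpec(name, weight));
            }
        }
        return specs;
    }
}
```

For the request in the description, parse(List.of("field_1", "field_2^2"), List.of("*")) would yield field_1 with weight 1.0 and field_2 with weight 2.0, while parse(null, List.of("*")) falls back to the single "*" pattern. Records require Java 16 or later.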

This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via match) and semantic_text fields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final score. This is achieved by creating a retriever tree that looks like:

linear
   multi_match on lexical fields
   linear
     match on semantic_text field A
     match on semantic_text field B
     match on semantic_text field C
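The grouping step can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions: the Node, Linear, MultiMatch and Match types are hypothetical stand-ins rather than Elasticsearch retriever classes, and field types come from a caller-supplied map. The sketch treats every non-semantic_text field as lexical; the logic described above also has to decide which fields actually support lexical search via match, which is skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two-branch retriever tree; not Elasticsearch classes. */
final class SimplifiedTreeSketch {

    interface Node {}
    record Linear(List<Node> children) implements Node {}
    record MultiMatch(List<String> fields) implements Node {}
    record Match(String field) implements Node {}

    private SimplifiedTreeSketch() {}

    /** Splits fields into lexical and semantic_text groups, one branch per group. */
    static Node build(List<String> fields, Map<String, String> fieldTypes) {
        List<String> lexical = new ArrayList<>();
        List<Node> semantic = new ArrayList<>();
        for (String field : fields) {
            if ("semantic_text".equals(fieldTypes.get(field))) {
                semantic.add(new Match(field));   // one match per semantic_text field
            } else {
                lexical.add(field);               // simplification: everything else is lexical
            }
        }
        List<Node> top = new ArrayList<>();
        if (!lexical.isEmpty()) top.add(new MultiMatch(lexical));  // single multi_match branch
        if (!semantic.isEmpty()) top.add(new Linear(semantic));    // nested linear branch
        return new Linear(top);
    }

    /** Renders the tree with two-space indentation per level. */
    static String render(Node node, String indent) {
        StringBuilder sb = new StringBuilder();
        if (node instanceof Linear linear) {
            sb.append(indent).append("linear\n");
            for (Node child : linear.children()) {
                sb.append(render(child, indent + "  "));
            }
        } else if (node instanceof MultiMatch mm) {
            sb.append(indent).append("multi_match on ").append(mm.fields()).append('\n');
        } else if (node instanceof Match m) {
            sb.append(indent).append("match on ").append(m.field()).append('\n');
        }
        return sb.toString();
    }
}
```

With two text fields (title, body) and two semantic_text fields (content_a, content_b), render(build(...), "") prints a top-level linear node with a multi_match on [title, body] and a nested linear node holding one match per semantic_text field, mirroring the layout above.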

The end result is a score that ranges from 0 to 2, with up to 1 coming from the lexical matches and up to 1 coming from the semantic matches.
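Why the combined score lands in the 0 to 2 range can be shown with a small Java sketch, assuming each top-level branch's scores are min-max normalized into [0, 1] and the two branches are summed with weight 1 each. LinearScoreSketch is a hypothetical name, and mapping a branch whose scores are all identical to 1.0 is an assumption made for this sketch, not necessarily what the minmax normalizer does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: min-max normalize two branches, then sum them with weight 1 each. */
final class LinearScoreSketch {

    private LinearScoreSketch() {}

    /** Maps each score into [0, 1]; if all scores are equal, this sketch maps them to 1.0 (assumption). */
    static Map<String, Double> minMax(Map<String, Double> scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double norm = (max == min) ? 1.0 : (e.getValue() - min) / (max - min);
            out.put(e.getKey(), norm);
        }
        return out;
    }

    /** Each branch contributes at most 1, so the sum stays within [0, 2]; missing docs contribute 0. */
    static Map<String, Double> combine(Map<String, Double> lexical, Map<String, Double> semantic) {
        Map<String, Double> l = minMax(lexical);
        Map<String, Double> s = minMax(semantic);
        Set<String> ids = new HashSet<>(l.keySet());
        ids.addAll(s.keySet());
        Map<String, Double> combined = new HashMap<>();
        for (String id : ids) {
            combined.put(id, l.getOrDefault(id, 0.0) + s.getOrDefault(id, 0.0));
        }
        return combined;
    }
}
```

For example, lexical scores {1: 2.0, 2: 4.0, 3: 6.0} normalize to 0.0, 0.5 and 1.0; semantic scores {2: 0.9, 3: 0.1} normalize to 1.0 and 0.0, so documents 1, 2 and 3 end up with 0.0, 1.5 and 1.0 respectively.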

Common logic for generating the retriever tree is in SimplifiedInnerRetrieverUtils, which will also be used by the simplified rrf retriever (see #128633 for a preview of that).

@Mikep86 Mikep86 requested review from jimczi, kderusso and pmpailis June 10, 2025 13:42
@Mikep86 Mikep86 added >enhancement auto-backport Automatically create backport pull requests when merged :SearchOrg/Relevance Label for the Search (solution/org) Relevance team :Search Relevance/Search Catch all for Search Relevance v8.19.0 v9.1.0 labels Jun 10, 2025
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch Team:Search - Relevance The Search organization Search Relevance team labels Jun 10, 2025
@elasticsearchmachine
Collaborator

Hi @Mikep86, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

RetrieverBuilder rewritten = this;

ResolvedIndices resolvedIndices = ctx.getResolvedIndices();
if (resolvedIndices != null && query != null) {
Contributor

We can still add a global pre-filter on the top-level retriever, right? Should we also account for it when generating the expansions, or is this not meant to be supported?

Contributor Author

That should work as-is, since pre-filter rewriting happens after this rewrite logic. I can add a test to confirm.
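The ordering argument (expand first, then push pre-filters down) can be illustrated with a small Java sketch. Node and PreFilterSketch are hypothetical stand-ins, filters are plain strings, and propagate is not an Elasticsearch method; the point is only that once the expanded tree exists, a filter attached to the top-level retriever can be copied down to every inner retriever.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pushing top-level pre-filters down an already-expanded tree. */
final class PreFilterSketch {

    /** Stand-in for a retriever: a name, its children, and the pre-filters it applies. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final List<String> preFilters = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(List.of(kids));
        }
    }

    private PreFilterSketch() {}

    /**
     * Copies each node's pre-filters into its children, recursively, so every leaf
     * ends up with the filters set on the root. Running it twice would duplicate
     * filters, so a real implementation would need to guard against that.
     */
    static void propagate(Node node) {
        for (Node child : node.children) {
            child.preFilters.addAll(node.preFilters);
            propagate(child);
        }
    }
}
```

In this sketch, a filter added to the root linear node before calling propagate ends up on the multi_match leaf and on each semantic match leaf under the nested linear node.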

Contributor Author

See b891166 and fc57b61.

A couple of things to follow up on here:

  • There is a whole class of bugs in the retriever framework right now caused by incomplete copies during rewrite. We should adopt a copy-constructor approach (similar to how queries handle copy generation) to address all of these issues. This is outside the scope of this PR, though, so I did the quick fix for now.
  • I would have loved to add a unit test that has more access to the retriever structure to verify that the filters are being propagated as pre-filters, but that is difficult to do right now with the current rewrite logic. We rewrite all the way to a RankDocsRetrieverBuilder, with no stopping cue for when all the pre-search rewrite activities (such as pre-filter propagation) are complete. It would help a lot with testability if we could refactor the rewrite process to add such a stopping cue. In the meantime, I added a YAML test to cover this, but that doesn't give us visibility into whether the filters are applied as actual pre-filters.
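The copy-constructor idea from the first bullet can be sketched in Java as follows. CopySketchRetriever and its fields (query, preFilters, minScore, retrieverName) are hypothetical and chosen only for illustration. The point is that every rewrite goes through one constructor that copies all state, so adding a new field means updating a single place instead of every rewrite path.

```java
import java.util.List;

/** Hypothetical retriever-like class illustrating the copy-constructor pattern. */
final class CopySketchRetriever {
    final String query;
    final List<String> preFilters;
    final Float minScore;        // nullable, illustrative only
    final String retrieverName;  // nullable, illustrative only

    CopySketchRetriever(String query, List<String> preFilters, Float minScore, String retrieverName) {
        this.query = query;
        this.preFilters = List.copyOf(preFilters);  // defensive, immutable copy
        this.minScore = minScore;
        this.retrieverName = retrieverName;
    }

    /** Copy constructor: carries over every field except the one being rewritten. */
    CopySketchRetriever(CopySketchRetriever other, String newQuery) {
        this(newQuery, other.preFilters, other.minScore, other.retrieverName);
    }

    /** Returns this instance when nothing changed, otherwise a complete copy. */
    CopySketchRetriever rewriteQuery(String newQuery) {
        return newQuery.equals(query) ? this : new CopySketchRetriever(this, newQuery);
    }
}
```

Because rewriteQuery always delegates to the copy constructor, the pre-filters, minimum score and name survive the rewrite without each call site having to remember them.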

@pmpailis
Contributor

Just finished a first pass, seems really nice :) Will we also include doc changes & additional examples in retriever_examples to showcase this?

Member

@kderusso kderusso left a comment

LGTM, some minor feedback

@Mikep86
Contributor Author

Mikep86 commented Jun 10, 2025

@pmpailis

Will we also include doc changes & additional examples in retriever_examples to showcase this?

Yes, all that will come in a follow-up docs-focused PR :)

Contributor

@Samiul-TheSoccerFan Samiul-TheSoccerFan left a comment

nice work 👏

Contributor

@jimczi jimczi left a comment

I like the approach, @Mikep86, it’s less invasive than I expected! 😉
I do have some concerns about the growing number of parameters in the linear retriever that aren't compatible with each other, but that's an inherent trade-off with the decision to add this functionality there, so I’m comfortable with it.

- match: { hits.hits.1._id: "2" }
- lte: { hits.hits.1._score: 2.0 }
- match: { hits.hits.2._id: "1" }
- lte: { hits.hits.2._score: 2.0 }
Contributor

Can we also add a test with a filter? We need to make sure that the filter is propagated correctly.

Contributor Author

Yep, working on a unit test to verify pre-filter propagation now

Contributor

@pmpailis pmpailis left a comment

Awesome work @Mikep86 ❤️ My only minor comment is about adding the pre-filter tests, but other than that it looks really nice.

@Mikep86 Mikep86 requested a review from jimczi June 12, 2025 20:56
Contributor

@jimczi jimczi left a comment

LGTM

@@ -0,0 +1,5 @@
pr: 129200
summary: Simplified Linear Retriever
Contributor

Does this need to be updated? Probably something like "Add simplified syntax and hybrid support to linear retriever".

Contributor Author

I think this description is still succinct and accurate; it's good as-is.

@Mikep86 Mikep86 merged commit fc77640 into elastic:main Jun 16, 2025
18 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129200

@Mikep86
Contributor Author

Mikep86 commented Jun 17, 2025

💚 All backports created successfully

Status Branch Result
8.19

Questions? Please refer to the Backport tool documentation.

Mikep86 added a commit to Mikep86/elasticsearch that referenced this pull request Jun 17, 2025
(cherry picked from commit fc77640)

# Conflicts:
#	x-pack/plugin/inference/build.gradle
#	x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java
#	x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
elasticsearchmachine pushed a commit that referenced this pull request Jun 17, 2025
(cherry picked from commit fc77640)

# Conflicts:
#	x-pack/plugin/inference/build.gradle
#	x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java
#	x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
6 participants