@pmpailis pmpailis commented Jan 15, 2025

This PR adds a new linear retriever to facilitate hybrid search. It linearly combines the results of its sub-retrievers and computes the final score of each document as the weighted sum of the scores from each sub-component.

Each sub-component can specify the following elements:

  • retriever -> specifies how we will compute the top documents
  • normalizer -> specifies how we want to normalize the top documents for this retriever (so that we can ensure that all scores fall within the same range)
  • weight -> the weight applied to the normalized score in the final weighted sum computation

Pagination works as it does for the rrf retriever, i.e. we compute the global top rank_window_size docs, and pagination is only available within that window.
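
To illustrate the pagination constraint, here is a minimal Python sketch, not the PR's implementation. The helper `paginate` is hypothetical; its `from_` and `size` arguments mirror the search API's `from` and `size` parameters, and it assumes that a page reaching past the window is simply truncated or empty.

```python
from typing import List


def paginate(ranked_doc_ids: List[str], rank_window_size: int,
             from_: int, size: int) -> List[str]:
    """Return one page of results, restricted to the global rank window."""
    # Only the top rank_window_size documents are ever eligible.
    window = ranked_doc_ids[:rank_window_size]
    # Assumption: a page that reaches past the window is cut off at its edge.
    return window[from_:from_ + size]


# Hypothetical, already-ranked doc ids: doc1 (best) .. doc20.
ranked = [f"doc{i}" for i in range(1, 21)]
first_page = paginate(ranked, rank_window_size=10, from_=0, size=5)
last_page = paginate(ranked, rank_window_size=10, from_=8, size=5)
beyond_window = paginate(ranked, rank_window_size=10, from_=10, size=5)
```

With a window of 10, the page starting at offset 8 contains only two documents, and a page starting at offset 10 is empty, even though 20 documents were ranked.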

So, working through an example, let's say that we perform a hybrid search query where:

  • we want to run a simple string query through a standard retriever, and normalize the scores to a [0, 1] range
  • we want to run a knn search through the knn retriever, without normalizing its scores
  • we want to compute the final score as score = 1.5 * standard + 2.5 * knn

Sample syntax:

GET /retrievers_example/_search
{
    "retriever": {
        "linear": {
            "retrievers": [
                {
                    "retriever": {
                        "standard": {
                            "query": {
                                "simple_query_string": {
                                    "query": "artificial intelligence in medicine",
                                    "fields": [
                                        "text"
                                    ]
                                }
                            }
                        }
                    },
                    "weight": 1.5,
                    "normalizer": "minmax"
                },
                {
                    "retriever": {
                        "knn": {
                            "field": "vector",
                            "query_vector": [
                                0.23,
                                0.67,
                                0.89
                            ],
                            "k": 3,
                            "num_candidates": 5
                        }
                    },
                    "weight": 2.5
                }
            ],
            "rank_window_size": 10
        }
    }
}
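
As a rough illustration of the arithmetic behind this request (not the PR's implementation), the Python sketch below applies min-max normalization to the standard retriever's scores, leaves the knn scores untouched, and sums them with weights 1.5 and 2.5. The input scores are made up, the helper names are hypothetical, and two behaviors are assumptions: a document missing from a sub-retriever contributes 0 for that component, and `minmax` maps all-equal scores to 1.0.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Scores = Dict[str, float]
Normalizer = Optional[Callable[[Scores], Scores]]


def minmax(scores: Scores) -> Scores:
    """Rescale scores linearly so that they fall within [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if math.isclose(lo, hi):
        # Assumption: when every score is equal, map them all to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combine(components: List[Tuple[Scores, float, Normalizer]],
                   rank_window_size: int) -> List[Tuple[str, float]]:
    """Weighted sum of (optionally normalized) scores, best first."""
    totals: Scores = {}
    for scores, weight, normalizer in components:
        normalized = normalizer(scores) if normalizer else scores
        for doc_id, score in normalized.items():
            # Assumption: docs absent from a component add nothing for it.
            totals[doc_id] = totals.get(doc_id, 0.0) + weight * score
    # Sort by descending score; break ties by doc id for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:rank_window_size]


# Made-up raw scores for the two sub-retrievers.
standard_scores = {"doc1": 4.0, "doc2": 2.0, "doc3": 1.0}
knn_scores = {"doc2": 0.9, "doc4": 0.5}

top_docs = linear_combine(
    [(standard_scores, 1.5, minmax), (knn_scores, 2.5, None)],
    rank_window_size=10,
)
```

With these inputs, doc2 appears in both result sets and ranks first with 1.5 * (1/3) + 2.5 * 0.9 = 2.75, ahead of doc1, whose only contribution is 1.5 * 1.0 from the standard retriever.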

@pmpailis pmpailis added the >enhancement, :Search Relevance/Ranking, :Search Relevance/Search, and v8.18.0 labels Jan 16, 2025
@elasticsearchmachine

Hi @pmpailis, I've created a changelog YAML for you.

@pmpailis pmpailis added the auto-backport label Jan 16, 2025

@benwtrent benwtrent left a comment

Looking much better. I have a concern around testing:

Do we have a test that specifically exercises the path when the different retrievers return different doc IDs? (e.g. they match non-overlapping doc sets).

@pmpailis

> Do we have a test that specifically exercises the path when the different retrievers return different doc IDs? (e.g. they match non-overlapping doc sets).

Added a test to account for this in ea1787f
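
For readers unfamiliar with this edge case, the following Python sketch shows the property such a test would check. It is not the test added in the commit above. The helper `weighted_sum` and the scores are hypothetical, and the sketch assumes that a document returned by only one sub-retriever keeps that retriever's weighted score and gets nothing from the other.

```python
import math
from typing import Dict, List, Tuple


def weighted_sum(results: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Combine per-retriever scores into one weighted total per document."""
    totals: Dict[str, float] = {}
    for scores, weight in results:
        for doc_id, score in scores.items():
            totals[doc_id] = totals.get(doc_id, 0.0) + weight * score
    return totals


def test_non_overlapping_doc_sets() -> None:
    # The two hypothetical retrievers share no documents.
    lexical = {"a": 1.0, "b": 0.5}
    vector = {"c": 0.8, "d": 0.2}
    combined = weighted_sum([(lexical, 1.5), (vector, 2.5)])
    # Every document from either retriever must survive the merge...
    assert set(combined) == {"a", "b", "c", "d"}
    # ...and each score comes from exactly one weighted component.
    expected = {"a": 1.5, "b": 0.75, "c": 2.0, "d": 0.5}
    for doc_id, score in expected.items():
        assert math.isclose(combined[doc_id], score)


test_non_overlapping_doc_sets()
```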


@benwtrent benwtrent left a comment

:shipit: :godmode:

@pmpailis pmpailis merged commit 375814d into elastic:main Jan 28, 2025
16 checks passed
@elasticsearchmachine

💔 Backport failed

Branch 8.x: Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120222

Labels

auto-backport, backport pending, >enhancement, :Search Relevance/Ranking, Team:Search Relevance, v8.18.0, v9.0.0

7 participants