@jimczi jimczi commented Dec 12, 2024

This pull request introduces a new retriever called `rescorer`, which leverages the `rescore` functionality of the search request.
The `rescorer` retriever re-scores only the top documents retrieved by its child retriever, offering fine-tuned scoring capabilities.

All rescorers supported in the `rescore` section of a search request are available in this retriever, and the same format is used to define the rescore configuration.

Example:
  - do:
      search:
        index: test
        body:
          retriever:
            rescorer:
              rescore:
                window_size: 10
                query:
                  rescore_query:
                    rank_feature:
                      field: "features.second_stage"
                      linear: { }
                  query_weight: 0
              retriever:
                standard:
                  query:
                    rank_feature:
                      field: "features.first_stage"
                      linear: { }
          size: 2
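
To make the effect of `query_weight: 0` in the example concrete, here is a minimal Python sketch of the score arithmetic inside the rescore window. It assumes the query rescorer's default `total` score mode (weighted sum of both scores) and a default `rescore_query_weight` of 1. The names `rescore_window` and `second_stage` are hypothetical, and the sketch models only the hits inside the window, not Elasticsearch's actual implementation.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (doc_id, first-stage score)


def rescore_window(
    hits: List[Hit],
    second_stage: Callable[[str], float],
    window_size: int = 10,
    query_weight: float = 1.0,
    rescore_query_weight: float = 1.0,
) -> List[Hit]:
    """Re-score the top `window_size` hits and return them re-sorted.

    Assumption: scores are combined as a weighted sum ("total" mode).
    """
    # `hits` is expected to be sorted by first-stage score, descending.
    window = hits[:window_size]
    rescored = [
        (doc_id, query_weight * score + rescore_query_weight * second_stage(doc_id))
        for doc_id, score in window
    ]
    # Python's sort is stable, so equal scores keep their first-stage order.
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored
```

With `query_weight=0`, the first-stage score only decides which documents enter the window; the final order inside it comes entirely from the second-stage score.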

Key Changes

  1. Rescore Phase Adaptation:
    The original rescore phase was modified to support tie-breaking on the `_shard_doc` field. This ensures consistent sorting across all rounds of rescoring.
  2. CompoundRetrieverBuilder Integration:
    The implementation uses the `CompoundRetrieverBuilder`, ensuring the `rescorer` retriever can seamlessly integrate into any position within the retriever tree.
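
The tiebreak in item 1 can be illustrated with a small Python sketch. It assumes hits are ordered by score descending and, when scores are equal, by an ascending integer that stands in for the `_shard_doc` sort value. The `Hit` class and `sort_with_tiebreak` function are hypothetical names for illustration, not Elasticsearch types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    shard_doc: int  # hypothetical stand-in for the _shard_doc sort value


def sort_with_tiebreak(hits: List[Hit]) -> List[Hit]:
    # Primary key: score, descending. Secondary key: shard_doc, ascending.
    # With the secondary key, the result no longer depends on the order
    # in which equally scored hits happened to arrive.
    return sorted(hits, key=lambda h: (-h.score, h.shard_doc))
```

Without the secondary key, two documents with the same rescored value could swap places between rounds; with it, every round produces the same order for the same inputs.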

Commit Structure

  • Commit 1: Adapts the rescore phase to handle `_shard_doc` as a tiebreaker.
  • Commit 2: Implements the `rescorer` retriever.

To facilitate review, I split the changes into two commits. If preferred, I can open separate pull requests for each commit to simplify the review process. However, I opted to include all changes in this PR to provide a complete overview.

Closes #118327

jimczi and others added 28 commits November 21, 2024 20:46
This commit introduces support for using the `_shard_doc` field as a sort tiebreaker during query rescoring.
This change is a prerequisite to add support for rescorers in retriever workflows.
This change adds a new `rescorer` retriever that re-scores only the top documents returned by its child retriever.
@jimczi jimczi added the >feature and :Search Relevance/Ranking (Scoring, rescoring, rank evaluation) labels Dec 12, 2024
@jimczi jimczi requested a review from a team as a code owner December 18, 2024 13:39
@jimczi jimczi merged commit 6f26106 into elastic:main Dec 18, 2024
16 checks passed
@jimczi jimczi deleted the rescorer_retriever branch December 18, 2024 19:47
@benwtrent
Member

Thank you for tackling this, @jimczi! I didn't fully review, but it looks nice!

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 18, 2024
…ore functionality (elastic#118585)

Closes elastic#118327

Co-authored-by: Liam Thompson <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Dec 19, 2024
…s rescore functionality (#119023)

* Add a generic `rescorer` retriever based on the search request's rescore functionality   (#118585)

Closes #118327

Co-authored-by: Liam Thompson <[email protected]>

* replace java21 only method

* fix compilation

---------

Co-authored-by: Liam Thompson <[email protected]>
Labels

backport pending; >feature; :Search Relevance/Ranking (Scoring, rescoring, rank evaluation); Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch); v8.18.0; v9.0.0

Development

Successfully merging this pull request may close these issues.

Request to use rescore with retriever for ES Query DSL