Text similarity reranker chunks and scores snippets #133576

kderusso · 2025-08-26T18:16:31Z

This is a followup to #129369, but instead of using the highlighter to create snippets we chunk and score field content directly.
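As a rough illustration of "chunk and score field content directly", here is a minimal Python sketch. It is not the PR's implementation (which is Java): the function names are hypothetical, the sentence splitter is naive, chunk size is counted in whitespace-separated words, and the scorer is a simple term-overlap stand-in rather than the scoring the PR actually uses.

```python
import re


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentence(text, max_chunk_size):
    """Group whole sentences into chunks of at most max_chunk_size words.

    No sentence overlap. A single sentence longer than max_chunk_size
    becomes its own (oversized) chunk in this sketch.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def score(chunk, query):
    # Stand-in scorer (assumption): number of chunk tokens that are query terms.
    query_terms = set(tokenize(query))
    return sum(1 for token in tokenize(chunk) if token in query_terms)


def top_chunks(text, query, size, max_chunk_size):
    """Chunk the field content, score each chunk, keep the best `size` chunks."""
    chunks = chunk_by_sentence(text, max_chunk_size)
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return ranked[:size]
```

For example, `top_chunks(doc_text, "What is the capital of the USA?", size=3, max_chunk_size=25)` would return up to three chunks, which could then be sent to the reranker in place of the whole field.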

Allows users to configure the chunk sizes we use to generate snippets in the following ways:

Specifying chunking_settings in the request
Specifying a chunk_size in the request, which we use to construct default chunking settings with that size
Specifying neither, in which case we construct default chunking settings
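The three cases can be sketched as a merge of the request's settings over a set of defaults. This is a hypothetical Python illustration, not the PR's code: `resolve_chunking_settings` is an invented name, and the default `max_chunk_size` below is a placeholder, since the real default value is not stated in this description.

```python
# Placeholder defaults for illustration only (assumption): the strategy and
# overlap follow the "sentence with 0" wording in the examples, and the size
# is an arbitrary stand-in, not the value used by Elasticsearch.
PLACEHOLDER_DEFAULTS = {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 0,
}


def resolve_chunking_settings(chunk_rescorer):
    """Return the effective chunking settings for a chunk_rescorer object.

    - full chunking_settings: used as given
    - only max_chunk_size: defaults with that size
    - nothing specified: defaults
    """
    requested = chunk_rescorer.get("chunking_settings") or {}
    resolved = dict(PLACEHOLDER_DEFAULTS)
    resolved.update(requested)
    return resolved
```

Merging over defaults keeps the three cases on a single code path: an empty `chunk_rescorer` object and one that sets only a size differ only in which keys override the defaults.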

Example API calls:

// Specifying chunking settings 
POST my-index/_search
{
  "track_total_hits": true,
  "fields": [
    "text"
  ],
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "What is the capital of the USA?"
            }
          }
        }
      },
      "field": "text",
      "chunk_rescorer": {
        "size": 3,
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 25,
          "sentence_overlap": 0
        }
      },
      "rank_window_size": 10,
      "inference_text": "What is the capital of the USA?"
    }
  }
}

// Only specifying chunk size; the sentence strategy with 0 overlap is inferred
POST my-index/_search
{
  "track_total_hits": true,
  "fields": [
    "text"
  ],
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "What is the capital of the USA?"
            }
          }
        }
      },
      "field": "text",
      "chunk_rescorer": {
        "size": 3,
        "chunking_settings": {
          "max_chunk_size": 25
        }
      },
      "rank_window_size": 10,
      "inference_text": "What is the capital of the USA?"
    }
  }
}

// Defaults
POST my-index/_search
{
  "track_total_hits": true,
  "fields": [
    "text"
  ],
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "What is the capital of the USA?"
            }
          }
        }
      },
      "field": "text",
      "chunk_rescorer": {},
      "rank_window_size": 10,
      "inference_text": "What is the capital of the USA?"
    }
  }
}
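The three request bodies above differ only in the `chunk_rescorer` object. A small Python helper (hypothetical name, not part of any Elasticsearch client) can build them from the same fields used in the examples:

```python
import json


def build_rerank_request(query_text, field, rank_window_size=10, chunk_rescorer=None):
    """Build a _search body using the text_similarity_reranker retriever.

    chunk_rescorer=None omits the object entirely, while chunk_rescorer={}
    sends an empty object, as in the "Defaults" example.
    """
    reranker = {
        "retriever": {"standard": {"query": {"match": {field: query_text}}}},
        "field": field,
        "rank_window_size": rank_window_size,
        "inference_text": query_text,
    }
    if chunk_rescorer is not None:
        reranker["chunk_rescorer"] = chunk_rescorer
    return {
        "track_total_hits": True,
        "fields": [field],
        "retriever": {"text_similarity_reranker": reranker},
    }


if __name__ == "__main__":
    body = build_rerank_request(
        "What is the capital of the USA?",
        "text",
        chunk_rescorer={"size": 3, "chunking_settings": {"max_chunk_size": 25}},
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `POST my-index/_search` in Dev Tools. The helper distinguishes `None` from `{}` on purpose, because an empty `chunk_rescorer` still opts in to chunk rescoring with defaults.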

NOTE:

Specifying an inference ID to pull chunking settings is planned in a followup PR.

elasticsearchmachine · 2025-08-27T18:59:57Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

elasticsearchmachine · 2025-08-27T18:59:57Z

Hi @kderusso, I've created a changelog YAML for you.

jimczi

It's a very elegant solution @kderusso, I like where this is going.
I left some comments about the API: since we are adopting chunking explicitly, I think we need to remove the confusion with snippets.
The API could look like this:

{
    "text_similarity_reranker": {
        "inference_id": "my_rerank_inference_id",
        "field": "semantic_text_field",
        "chunk_scorer": {
            "top": 2,
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 128, 
                "sentence_overlap": 0
            }
        }
    }
}

We can also say that:

{
    "text_similarity_reranker": {
        "inference_id": "",
        "field": "semantic_text_field",
        "chunk_scorer": {
            "top": 2,
            "chunking_settings": {
                "max_chunk_size": 128
            }
        }
    }
}

is valid, with good defaults for all fields, as long as chunk_scorer is provided.
So something like:

{
    "text_similarity_reranker": {
        "inference_id": "",
        "field": "semantic_text_field",
        "chunk_scorer": {
            "top": 2
        }
    }
}

is valid too.

...ck/plugin/core/src/main/java/org/elasticsearch/xpack/core/common/snippets/SnippetScorer.java

...rg/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankRetrieverBuilder.java

server/src/main/java/org/elasticsearch/TransportVersions.java

jimczi

Great work @kderusso, I left some nits about the API.
We should also remove the feature flag!

...on-rest/src/yamlRestTest/java/org/elasticsearch/test/rest/yaml/CcsCommonYamlTestSuiteIT.java

server/src/main/java/org/elasticsearch/TransportVersions.java

...in/core/src/main/java/org/elasticsearch/xpack/core/common/chunks/MemoryIndexChunkScorer.java

...re/src/test/java/org/elasticsearch/xpack/core/common/chunks/MemoryIndexChunkScorerTests.java

...rg/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankRetrieverBuilder.java

jimczi

LGTM

kderusso · 2025-09-09T21:49:32Z

@elasticmachine update branch

kderusso · 2025-09-10T15:29:28Z

@elasticmachine update branch

kderusso · 2025-09-10T18:00:44Z

@elasticmachine run elasticsearch-ci/pr-upgrade-part-2

elastic/elasticsearch#133576 introduced the concept of a `chunk_rescorer` in the `text_similarity_reranker` retriever. This PR adds the chunk rescorer to Dev Tools AutoComplete.

https://github.com/user-attachments/assets/9688a4ce-2cd5-40ad-ba8e-e8e69abed26e

Co-authored-by: kibanamachine

Instead of generating snippets via highlighter, chunk and score chunks in text similarity reranker

79b7e72

elasticsearchmachine added the v9.2.0 label Aug 26, 2025

elasticsearchmachine and others added 3 commits August 26, 2025 18:23

[CI] Auto commit changes from spotless

9f28c08

Add customization based on preferred chunking settings or chunk size for syntactic sugar

49d25a7

Cleanup

0036271

kderusso added the >enhancement and :SearchOrg/Relevance labels Aug 27, 2025

kderusso marked this pull request as ready for review August 27, 2025 18:59

kderusso requested a review from jimczi August 27, 2025 18:59

elasticsearchmachine added the Team:Search - Relevance label Aug 27, 2025

Update docs/changelog/133576.yaml

2df2f9d

kderusso requested a review from a team August 27, 2025 19:00

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

ad404db

jimczi reviewed Aug 27, 2025

kderusso added 3 commits August 27, 2025 16:13

Refactor/Rename SnippetScorer to MemoryIndexChunkScorer

8b7f7f2

PR feedback on MemoryIndexChunkScorer

80f4434

Update API and code to rename snippets to chunks

9872258

kderusso commented Aug 28, 2025

server/src/main/java/org/elasticsearch/TransportVersions.java (outdated)

Missed some snippet renames

8c4ab1e

kderusso requested a review from jimczi August 28, 2025 13:08

Handle case where no matches were found to score chunks

fc706a8

jimczi reviewed Sep 4, 2025

kderusso added 3 commits September 8, 2025 10:14

PR feedback on MemoryIndexChunkScorer, add tests

246dfa2

Rename num_chunks to size

6355282

Merge from main

d03a0f1

jimczi approved these changes Sep 8, 2025

elasticsearchmachine and others added 2 commits September 8, 2025 14:47

[CI] Auto commit changes from spotless

ed13074

Fix error in merge

6a06b84

kderusso added 3 commits September 9, 2025 13:31

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

92a060f

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

a172b6c

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

a6c2364

elasticmachine and others added 4 commits September 9, 2025 17:49

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

b3f95f9

Add feature flag to InferenceUpgradeTestCase

e55ccfe

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

68ea8cb

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

68af14f

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

37eca54

kderusso and others added 13 commits September 10, 2025 14:01

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

b66a58e

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

7a4ccff

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

ca597fa

Yolo see if this fixes the test

ef024cf

Real fix for upgrade IT

c208845

[CI] Auto commit changes from spotless

cc1e913

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

2651fbd

Another ignore

d0813b2

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

d0c2139

Revert "Another ignore"

187106f

This reverts commit d0813b2.

let's try reverting the renamed feature flag. If this is the cause of the test failures then :table-flip:

af95d57

Merge branch 'main' into kderusso/text-similarity-reranking-now-with-chunks

6f8b5fe

Remove ignored test

d0dd688

kderusso merged commit 436ec11 into elastic:main Sep 11, 2025
34 checks passed

This was referenced Sep 25, 2025

Add chunk rescoring in text_similarity_reranker to the specification elastic/elasticsearch-specification#5343

Merged

Add chunk rescorer to Kibana Autocomplete elastic/kibana#237180

Merged

kderusso mentioned this pull request Oct 10, 2025

Add docs for chunk_rescorer in text_similarity_reranker #136428

Merged
