Commit 9f48b4a

kderusso and leemthompo authored
Add docs for chunk_rescorer in text_similarity_reranker (#136428)
* Add docs for chunk_rescorer
* Updates to docs based on preview
* Update docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md
  Co-authored-by: Liam Thompson <[email protected]>
* Update docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md
  Co-authored-by: Liam Thompson <[email protected]>
* Update docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md
  Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>
1 parent 0a7d113 commit 9f48b4a

2 files changed: +32 -4 lines changed


docs/reference/elasticsearch/rest-apis/retrievers/retrievers-examples.md

Lines changed: 13 additions & 4 deletions
@@ -440,7 +440,7 @@ GET /retrievers_example/_search
 "query": "artificial intelligence"
 }
 }
-}
+}
 ```

 This returns the following response based on the final rrf score for each result.
@@ -497,7 +497,7 @@ GET /retrievers_example/_search
 "fields": ["text", "text_semantic"]
 }
 }
-}
+}
 ```

 ::::{note}
@@ -570,7 +570,7 @@ GET /retrievers_example/_search
 "normalizer": "minmax"
 }
 }
-}
+}
 ```

 This returns the following response based on the normalized score for each result:
@@ -1503,6 +1503,7 @@ PUT _inference/rerank/my-rerank-model
 ```

 Let’s start by reranking the results of the `rrf` retriever in our previous example.
+We'll also apply a `chunk_rescorer` to ensure that we only consider the best scoring chunks when sending information to the reranker.

 ```console
 GET retrievers_example/_search
@@ -1541,7 +1542,15 @@ GET retrievers_example/_search
 },
 "field": "text",
 "inference_id": "my-rerank-model",
-"inference_text": "What are the state of the art applications of AI in information retrieval?"
+"inference_text": "What are the state of the art applications of AI in information retrieval?",
+"chunk_rescorer": {
+"size": 1,
+"chunking_settings": {
+"strategy": "sentence",
+"max_chunk_size": 300,
+"sentence_overlap": 0
+}
+},
 }
 },
 "_source": false
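
As extracted, the added `chunk_rescorer` object closes with `},` directly before the closing brace of the reranker, and strict JSON parsers reject a trailing comma in that position. The following is a minimal Python sketch, not part of the commit, that assembles the same fragment as a dictionary and serializes it, which avoids hand-placed commas. The child retriever is elided in the hunk, so a `match_all` `standard` retriever stands in as a hypothetical placeholder, and `is_strict_json` is an illustrative helper name.

```python
import json

# Field values are copied from the hunk above. The child retriever is elided
# there, so this match_all "standard" retriever is only a placeholder.
reranker = {
    "retriever": {"standard": {"query": {"match_all": {}}}},
    "field": "text",
    "inference_id": "my-rerank-model",
    "inference_text": (
        "What are the state of the art applications of AI "
        "in information retrieval?"
    ),
    "chunk_rescorer": {
        "size": 1,
        "chunking_settings": {
            "strategy": "sentence",
            "max_chunk_size": 300,
            "sentence_overlap": 0,
        },
    },
}

body = {"retriever": {"text_similarity_reranker": reranker}, "_source": False}

# json.dumps never emits a trailing comma, so the result is strict JSON.
request_json = json.dumps(body, indent=2)


def is_strict_json(text: str) -> bool:
    """Return True if text parses with Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Calling `is_strict_json` on a hand-written request body before pasting it into the console is a cheap way to catch a stray comma.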

docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md

Lines changed: 19 additions & 0 deletions
@@ -86,6 +86,25 @@ score = ln(score), if score < 0

 Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to the child `retriever`. If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.

+`chunk_rescorer` {applies_to}`stack: beta 9.2`
+: (Optional, `object`)
+
+Chunks and scores documents based on configured chunking settings, and only sends the best scoring chunks to the reranking model as input. This helps improve relevance when reranking long documents that would otherwise be truncated by the reranking model's token limit.
+
+Parameters for `chunk_rescorer`:
+
+`size`
+: (Optional, `int`)
+
+The number of chunks to pass to the reranker. Defaults to `1`.
+
+`chunking_settings`
+: (Optional, `object`)
+
+Settings for chunking text into smaller passages for scoring and reranking. Defaults to the optimal chunking settings for [Elastic Rerank](docs-content:///explore-analyze/machine-learning/nlp/ml-nlp-rerank.md). Refer to the [Inference API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put#operation-inference-put-body-application-json-chunking_settings) for valid values for `chunking_settings`.
+:::{warning}
+If you configure chunks larger than the reranker's token limit, the results may be truncated. This can degrade relevance significantly.
+:::


 ## Example: Elastic Rerank [text-similarity-reranker-retriever-example-elastic-rerank]
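
The behavior described for `chunk_rescorer` in this hunk (chunk the field, score each chunk against the query, forward only the top `size` chunks) can be illustrated with a small Python sketch. It is a conceptual toy and not how Elasticsearch chunks or scores text. It assumes sentences end at `.`, `!`, or `?`, that `max_chunk_size` counts words, and that a plain term-overlap count is good enough as a scorer. The function names `sentence_chunks`, `overlap_score`, and `best_chunks` are hypothetical.

```python
import re


def sentence_chunks(text: str, max_chunk_size: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size words.

    A single sentence longer than max_chunk_size becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and count + n > max_chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def overlap_score(query: str, chunk: str) -> int:
    """Toy scorer: number of distinct query terms that also occur in the chunk."""
    def terms(s: str) -> set[str]:
        return set(re.findall(r"\w+", s.lower()))

    return len(terms(query) & terms(chunk))


def best_chunks(text: str, query: str, size: int = 1,
                max_chunk_size: int = 300) -> list[str]:
    """Return the `size` highest-scoring chunks, best first."""
    chunks = sentence_chunks(text, max_chunk_size)
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:size]
```

With `size=1`, only the single best chunk of each document would reach the reranker, which is why oversized chunks (the case the warning covers) undo the benefit: the reranker would truncate them anyway.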
