Commit b589cce

jimczi and leemthompo committed
Add a generic rescorer retriever based on the search request's rescore functionality (#118585)
This pull request introduces a new retriever called `rescorer`, which leverages the `rescore` functionality of the search request.
The `rescorer` retriever re-scores only the top documents retrieved by its child retriever, offering fine-tuned scoring capabilities.
All rescorers supported in the `rescore` section of a search request are available in this retriever, and the same format is used to define the rescore configuration.

<details>
<summary>Example:</summary>

```yaml
- do:
    search:
      index: test
      body:
        retriever:
          rescorer:
            rescore:
              window_size: 10
              query:
                rescore_query:
                  rank_feature:
                    field: "features.second_stage"
                    linear: { }
                query_weight: 0
            retriever:
              standard:
                query:
                  rank_feature:
                    field: "features.first_stage"
                    linear: { }
        size: 2
```

</details>

Closes #118327

Co-authored-by: Liam Thompson <[email protected]>
1 parent 3d1f8d2 commit b589cce
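
To make the behaviour described in the commit message concrete, here is a minimal Python sketch of a `rescorer` retriever on a single shard. It is not the Elasticsearch implementation: the function name `rescorer_retriever` and the `DOCS` table are illustrative assumptions, the data mirrors the ten documents indexed by the YAML REST test in this commit, and the first-stage and rescore scores are assumed to be combined by a weighted sum (with `query_weight: 0`, as in the example, only the second-stage score remains).

```python
# Simplified, single-shard model of the `rescorer` retriever (illustrative only).
# Document i has first_stage = i and second_stage = 11 - i, like the test data.
DOCS = {
    doc_id: {"first_stage": doc_id, "second_stage": 11 - doc_id}
    for doc_id in range(1, 11)
}


def rescorer_retriever(docs, window_size, size, query_weight=0.0, rescore_query_weight=1.0):
    """Return the top `size` (doc_id, score) pairs after rescoring."""
    # Child retriever: rank every document by its first-stage feature.
    first_pass = sorted(docs, key=lambda d: docs[d]["first_stage"], reverse=True)
    # Only the top `window_size` documents of the child retriever are re-scored.
    window = first_pass[:window_size]
    rescored = [
        (
            d,
            query_weight * docs[d]["first_stage"]
            + rescore_query_weight * docs[d]["second_stage"],
        )
        for d in window
    ]
    # Re-rank the window by the combined score and keep the requested size.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:size]
```

Under these assumptions, `rescorer_retriever(DOCS, window_size=10, size=2)` rescores all ten documents and returns documents 1 and 2, while `window_size=3` restricts rescoring to the three best first-stage documents (10, 9, 8) and returns documents 8 and 9. Both outcomes agree with the expectations asserted in the "Rescorer retriever basic" test.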

24 files changed: +1180 -71 lines changed

docs/changelog/118585.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pr: 118585
+summary: Add a generic `rescorer` retriever based on the search request's rescore
+  functionality
+area: Ranking
+type: feature
+issues:
+ - 118327

docs/reference/search/retriever.asciidoc

Lines changed: 120 additions & 1 deletion
@@ -22,6 +22,9 @@ A <<standard-retriever, retriever>> that replaces the functionality of a traditi
 `knn`::
 A <<knn-retriever, retriever>> that replaces the functionality of a <<search-api-knn, knn search>>.
 
+`rescorer`::
+A <<rescorer-retriever, retriever>> that replaces the functionality of the <<rescore, query rescorer>>.
+
 `rrf`::
 A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciprocal rank fusion (RRF)>>.
 
@@ -371,6 +374,122 @@ GET movies/_search
 ----
 // TEST[skip:uses ELSER]
 
+[[rescorer-retriever]]
+==== Rescorer Retriever
+
+The `rescorer` retriever re-scores only the results produced by its child retriever.
+For the `standard` and `knn` retrievers, the `window_size` parameter specifies the number of documents examined per shard.
+
+For compound retrievers like `rrf`, the `window_size` parameter defines the total number of documents examined globally.
+
+When using the `rescorer`, an error is returned if the following conditions are not met:
+
+* The minimum configured rescore's `window_size` is:
+** Greater than or equal to the `size` of the parent retriever for nested `rescorer` setups.
+** Greater than or equal to the `size` of the search request when used as the primary retriever in the tree.
+
+* And the maximum rescore's `window_size` is:
+** Smaller than or equal to the `size` or `rank_window_size` of the child retriever.
+
+[discrete]
+[[rescorer-retriever-parameters]]
+===== Parameters
+
+`rescore`::
+(Required. <<rescore, A rescorer definition or an array of rescorer definitions>>)
++
+Defines the <<rescore, rescorers>> applied sequentially to the top documents returned by the child retriever.
+
+`retriever`::
+(Required. <<retriever, retriever>>)
++
+Specifies the child retriever responsible for generating the initial set of top documents to be re-ranked.
+
+`filter`::
+(Optional. <<query-dsl, query object or list of query objects>>)
++
+Applies a <<query-dsl-bool-query, boolean query filter>> to the retriever, ensuring that all documents match the filter criteria without affecting their scores.
+
+[discrete]
+[[rescorer-retriever-example]]
+==== Example
+
+The `rescorer` retriever can be placed at any level within the retriever tree.
+The following example demonstrates a `rescorer` applied to the results produced by an `rrf` retriever:
+
+[source,console]
+----
+GET movies/_search
+{
+  "size": 10, <1>
+  "retriever": {
+    "rescorer": { <2>
+      "rescore": {
+        "query": { <3>
+          "window_size": 50, <4>
+          "rescore_query": {
+            "script_score": {
+              "script": {
+                "source": "cosineSimilarity(params.queryVector, 'product-vector_final_stage') + 1.0",
+                "params": {
+                  "queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
+                }
+              }
+            }
+          }
+        }
+      },
+      "retriever": { <5>
+        "rrf": {
+          "rank_window_size": 100, <6>
+          "retrievers": [
+            {
+              "standard": {
+                "query": {
+                  "sparse_vector": {
+                    "field": "plot_embedding",
+                    "inference_id": "my-elser-model",
+                    "query": "films that explore psychological depths"
+                  }
+                }
+              }
+            },
+            {
+              "standard": {
+                "query": {
+                  "multi_match": {
+                    "query": "crime",
+                    "fields": [
+                      "plot",
+                      "title"
+                    ]
+                  }
+                }
+              }
+            },
+            {
+              "knn": {
+                "field": "vector",
+                "query_vector": [10, 22, 77],
+                "k": 10,
+                "num_candidates": 10
+              }
+            }
+          ]
+        }
+      }
+    }
+  }
+}
+----
+// TEST[skip:uses ELSER]
+<1> Specifies the number of top documents to return in the final response.
+<2> A `rescorer` retriever applied as the final step.
+<3> The definition of the `query` rescorer.
+<4> Defines the number of documents to rescore from the child retriever.
+<5> Specifies the child retriever definition.
+<6> Defines the number of documents returned by the `rrf` retriever, which limits the available documents to
+
 [[text-similarity-reranker-retriever]]
 ==== Text Similarity Re-ranker Retriever
 
@@ -772,4 +891,4 @@ When a retriever is specified as part of a search, the following elements are no
 * <<search-after, `search_after`>>
 * <<request-body-search-terminate-after, `terminate_after`>>
 * <<search-sort-param, `sort`>>
-* <<rescore, `rescore`>>
+* <<rescore, `rescore`>> use a <<rescorer-retriever, rescorer retriever>> instead
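
The two `window_size` conditions documented in the `rescorer` section above can be sketched as a small Python check. This is an illustration under stated assumptions, not the validation code added by the commit: the function name `validate_window_sizes` and its parameters are hypothetical, the first error message follows the wording asserted by the YAML REST test in this commit, and the second message is invented for the sketch.

```python
def validate_window_sizes(window_sizes, requested_size, child_size):
    """Check the two documented `window_size` constraints (illustrative only).

    window_sizes   -- `window_size` of each configured rescorer
    requested_size -- `size` of the parent retriever, or of the search request
                      when the rescorer is the primary retriever
    child_size     -- `size` or `rank_window_size` of the child retriever
    """
    smallest, largest = min(window_sizes), max(window_sizes)
    if smallest < requested_size:
        # Wording follows the error asserted in the YAML REST test of this commit.
        raise ValueError(
            f"[rescorer] requires [window_size: {smallest}] "
            f"be greater than or equal to [size: {requested_size}]"
        )
    if largest > child_size:
        # Hypothetical wording: the source does not show this message.
        raise ValueError(
            f"[window_size: {largest}] must be smaller than or equal to "
            f"the child retriever's window [{child_size}]"
        )
    return True
```

With the values from the documentation example (`window_size: 50`, `size: 10`, `rank_window_size: 100`) the check passes, whereas `window_size: 5` with `size: 10` raises the first error.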

rest-api-spec/build.gradle

Lines changed: 1 addition & 0 deletions
@@ -258,4 +258,5 @@ tasks.named("yamlRestTestV7CompatTransform").configure({ task ->
   task.skipTest("search.vectors/41_knn_search_bbq_hnsw/Test knn search", "Scoring has changed in latest versions")
   task.skipTest("search.vectors/42_knn_search_bbq_flat/Test knn search", "Scoring has changed in latest versions")
   task.skipTest("synonyms/90_synonyms_reloading_for_synset/Reload analyzers for specific synonym set", "Can't work until auto-expand replicas is 0-1 for synonyms index")
+  task.skipTest("search/90_search_after/_shard_doc sort", "restriction has been lifted in latest versions")
 })
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
+setup:
+  - requires:
+      cluster_features: [ "search.retriever.rescorer.enabled" ]
+      reason: "Support for rescorer retriever"
+
+  - do:
+      indices.create:
+        index: test
+        body:
+          settings:
+            number_of_shards: 1
+            number_of_replicas: 0
+          mappings:
+            properties:
+              available:
+                type: boolean
+              features:
+                type: rank_features
+
+  - do:
+      bulk:
+        refresh: true
+        index: test
+        body:
+          - '{"index": {"_id": 1 }}'
+          - '{"features": { "first_stage": 1, "second_stage": 10}, "available": true, "group": 1}'
+          - '{"index": {"_id": 2 }}'
+          - '{"features": { "first_stage": 2, "second_stage": 9}, "available": false, "group": 1}'
+          - '{"index": {"_id": 3 }}'
+          - '{"features": { "first_stage": 3, "second_stage": 8}, "available": false, "group": 3}'
+          - '{"index": {"_id": 4 }}'
+          - '{"features": { "first_stage": 4, "second_stage": 7}, "available": true, "group": 1}'
+          - '{"index": {"_id": 5 }}'
+          - '{"features": { "first_stage": 5, "second_stage": 6}, "available": true, "group": 3}'
+          - '{"index": {"_id": 6 }}'
+          - '{"features": { "first_stage": 6, "second_stage": 5}, "available": false, "group": 2}'
+          - '{"index": {"_id": 7 }}'
+          - '{"features": { "first_stage": 7, "second_stage": 4}, "available": true, "group": 3}'
+          - '{"index": {"_id": 8 }}'
+          - '{"features": { "first_stage": 8, "second_stage": 3}, "available": true, "group": 1}'
+          - '{"index": {"_id": 9 }}'
+          - '{"features": { "first_stage": 9, "second_stage": 2}, "available": true, "group": 2}'
+          - '{"index": {"_id": 10 }}'
+          - '{"features": { "first_stage": 10, "second_stage": 1}, "available": false, "group": 1}'
+
+---
+"Rescorer retriever basic":
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              rescore:
+                window_size: 10
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: { }
+                  query_weight: 0
+              retriever:
+                standard:
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: { }
+          size: 2
+
+  - match: { hits.total.value: 10 }
+  - match: { hits.hits.0._id: "1" }
+  - match: { hits.hits.0._score: 10.0 }
+  - match: { hits.hits.1._id: "2" }
+  - match: { hits.hits.1._score: 9.0 }
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              rescore:
+                window_size: 3
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: {}
+                  query_weight: 0
+              retriever:
+                standard:
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: {}
+          size: 2
+
+  - match: {hits.total.value: 10}
+  - match: {hits.hits.0._id: "8"}
+  - match: { hits.hits.0._score: 3.0 }
+  - match: {hits.hits.1._id: "9"}
+  - match: { hits.hits.1._score: 2.0 }
+
+---
+"Rescorer retriever with pre-filters":
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              filter:
+                match:
+                  available: true
+              rescore:
+                window_size: 10
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: { }
+                  query_weight: 0
+              retriever:
+                standard:
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: { }
+          size: 2
+
+  - match: { hits.total.value: 6 }
+  - match: { hits.hits.0._id: "1" }
+  - match: { hits.hits.0._score: 10.0 }
+  - match: { hits.hits.1._id: "4" }
+  - match: { hits.hits.1._score: 7.0 }
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              rescore:
+                window_size: 4
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: { }
+                  query_weight: 0
+              retriever:
+                standard:
+                  filter:
+                    match:
+                      available: true
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: { }
+          size: 2
+
+  - match: { hits.total.value: 6 }
+  - match: { hits.hits.0._id: "5" }
+  - match: { hits.hits.0._score: 6.0 }
+  - match: { hits.hits.1._id: "7" }
+  - match: { hits.hits.1._score: 4.0 }
+
+---
+"Rescorer retriever and collapsing":
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              rescore:
+                window_size: 10
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: { }
+                  query_weight: 0
+              retriever:
+                standard:
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: { }
+          collapse:
+            field: group
+          size: 3
+
+  - match: { hits.total.value: 10 }
+  - match: { hits.hits.0._id: "1" }
+  - match: { hits.hits.0._score: 10.0 }
+  - match: { hits.hits.1._id: "3" }
+  - match: { hits.hits.1._score: 8.0 }
+  - match: { hits.hits.2._id: "6" }
+  - match: { hits.hits.2._score: 5.0 }
+
+---
+"Rescorer retriever and invalid window size":
+  - do:
+      catch: "/\\[rescorer\\] requires \\[window_size: 5\\] be greater than or equal to \\[size: 10\\]/"
+      search:
+        index: test
+        body:
+          retriever:
+            rescorer:
+              rescore:
+                window_size: 5
+                query:
+                  rescore_query:
+                    rank_feature:
+                      field: "features.second_stage"
+                      linear: { }
+                  query_weight: 0
+              retriever:
+                standard:
+                  query:
+                    rank_feature:
+                      field: "features.first_stage"
+                      linear: { }
+          size: 10
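
The expectations of the pre-filter and collapsing tests above can be traced by hand with a small Python model. This is a sketch under assumptions, not a description of how Elasticsearch executes the request: the `search` function and its parameters are hypothetical, the model covers a single shard, a filter is treated as restricting the candidate set before first-stage ranking whether it is declared on the `rescorer` or on the child `standard` retriever, and collapsing is assumed to apply to the rescored ranking.

```python
# Mirror of the ten documents indexed in the test setup.
AVAILABLE = [True, False, False, True, True, False, True, True, True, False]
GROUPS = [1, 1, 3, 1, 3, 2, 3, 1, 2, 1]
DOCS = [
    {
        "id": i + 1,
        "first_stage": i + 1,
        "second_stage": 10 - i,
        "available": AVAILABLE[i],
        "group": GROUPS[i],
    }
    for i in range(10)
]


def search(docs, window_size, size, only_available=False, collapse_field=None):
    """Simplified single-shard model: filter, first stage, rescore, collapse."""
    # Pre-filter: drop documents that do not match, without touching scores.
    candidates = [d for d in docs if d["available"]] if only_available else list(docs)
    # First stage: rank by the first-stage feature and keep the rescore window.
    candidates.sort(key=lambda d: d["first_stage"], reverse=True)
    window = candidates[:window_size]
    # Rescore with query_weight 0: only the second-stage feature counts.
    window.sort(key=lambda d: d["second_stage"], reverse=True)
    if collapse_field is not None:
        # Keep the best-scoring document of each group, preserving rank order.
        seen, collapsed = set(), []
        for d in window:
            if d[collapse_field] not in seen:
                seen.add(d[collapse_field])
                collapsed.append(d)
        window = collapsed
    return [(d["id"], float(d["second_stage"])) for d in window[:size]]
```

Under these assumptions, `search(DOCS, 10, 2, only_available=True)` yields documents 1 and 4, `search(DOCS, 4, 2, only_available=True)` yields documents 5 and 7, and `search(DOCS, 10, 3, collapse_field="group")` yields documents 1, 3 and 6. These are the ids and scores the YAML assertions expect.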
