
Commit 0b16ce8

[8.19] Simplified Linear and RRF Retrievers Docs (#130842)
1 parent ce51a84 commit 0b16ce8


3 files changed, +643 -75 lines changed


docs/reference/rest-api/common-parms.asciidoc

Lines changed: 64 additions & 17 deletions
@@ -1310,8 +1310,26 @@ See <<index-wait-for-active-shards>>.
 end::wait_for_active_shards[]
 
 tag::rrf-retrievers[]
+
+[NOTE]
+====
+Either `query` or `retrievers` must be specified.
+Combining `query` and `retrievers` is not supported.
+====
+
+`query`::
+(Optional, String)
++
+The query to use when using the <<multi-field-query-format, multi-field query format>>.
+
+`fields`::
+(Optional, array of strings)
++
+The fields to query when using the <<multi-field-query-format, multi-field query format>>.
+If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default.
+
 `retrievers`::
-(Required, array of retriever objects)
+(Optional, array of retriever objects)
 +
 A list of child retrievers to specify which sets of returned top documents
 will have the RRF formula applied to them. Each child retriever carries an
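The NOTE above makes `query` and `retrievers` mutually exclusive. Below is a minimal sketch of a request-body builder that enforces that rule. `build_rrf_body` is a hypothetical helper and is not part of any Elasticsearch client. Rejecting `fields` when `query` is absent is an assumption drawn from the parameter descriptions, not documented behavior.

```python
# Hypothetical helper (not part of any Elasticsearch client): builds the body of
# a _search request that uses the `rrf` retriever and enforces the documented
# rule that exactly one of `query` or `retrievers` is given.
from typing import Any, Optional


def build_rrf_body(
    query: Optional[str] = None,
    fields: Optional[list[str]] = None,
    retrievers: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    if (query is None) == (retrievers is None):
        # Neither or both were supplied.
        raise ValueError("specify either `query` or `retrievers`, but not both")
    rrf: dict[str, Any] = {}
    if query is not None:
        rrf["query"] = query
        if fields is not None:
            # When omitted, the server falls back to index.query.default_field.
            rrf["fields"] = fields
    else:
        # Assumption: `fields` is meaningful only with the multi-field query format.
        if fields is not None:
            raise ValueError("`fields` requires `query`")
        rrf["retrievers"] = retrievers
    return {"retriever": {"rrf": rrf}}
```

The returned dict has the same shape as the `GET books/_search` console examples in `retriever.asciidoc`.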
@@ -1337,7 +1355,7 @@ This value determines the size of the individual result sets per
 query. A higher value will improve result relevance at the cost of performance. The final
 ranked result set is pruned down to the search request's <<search-size-param, size>>.
 `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
-Defaults to the `size` parameter.
+Defaults to 10.
 end::compound-retriever-rank-window-size[]
 
 tag::compound-retriever-filter[]
@@ -1349,39 +1367,68 @@ according to each retriever's specifications.
 end::compound-retriever-filter[]
 
 tag::linear-retriever-components[]
+
+[NOTE]
+====
+Either `query` or `retrievers` must be specified.
+Combining `query` and `retrievers` is not supported.
+====
+
+`query`::
+(Optional, String)
++
+The query to use when using the <<multi-field-query-format, multi-field query format>>.
+
+`fields`::
+(Optional, array of strings)
++
+The fields to query when using the <<multi-field-query-format, multi-field query format>>.
+Fields can include boost values using the `^` notation (e.g., `"field^2"`).
+If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default.
+
+`normalizer`::
+(Optional, String)
++
+The normalizer to use when using the <<multi-field-query-format, multi-field query format>>.
+See <<linear-retriever-normalizers, normalizers>> for supported values.
+Required when `query` is specified.
++
+[WARNING]
+====
+Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches.
+See <<multi-field-field-grouping, field grouping>> for more information.
+====
+
 `retrievers`::
-(Required, array of objects)
+(Optional, array of objects)
 +
 A list of the sub-retrievers' configuration, that we will take into account and whose result sets
 we will merge through a weighted sum. Each configuration can have a different weight and normalization depending
 on the specified retriever.
 
-Each entry specifies the following parameters:
+include::common-parms.asciidoc[tag=compound-retriever-rank-window-size]
+
+include::common-parms.asciidoc[tag=compound-retriever-filter]
 
-* `retriever`::
+Each entry in the `retrievers` array specifies the following parameters:
+
+`retriever`::
 (Required, a <<retriever, retriever>> object)
 +
 Specifies the retriever for which we will compute the top documents for. The retriever will produce `rank_window_size`
 results, which will later be merged based on the specified `weight` and `normalizer`.
 
-* `weight`::
+`weight`::
 (Optional, float)
 +
 The weight that each score of this retriever's top docs will be multiplied with. Must be greater or equal to 0. Defaults to 1.0.
 
-* `normalizer`::
+`normalizer`::
 (Optional, String)
 +
-Specifies how we will normalize the retriever's scores, before applying the specified `weight`.
-Available values are: `minmax`, and `none`. Defaults to `none`.
-
-** `none`
-** `minmax` :
-A `MinMaxScoreNormalizer` that normalizes scores based on the following formula
-+
-```
-score = (score - min) / (max - min)
-```
+Specifies how the retriever's score will be normalized before applying the specified `weight`.
+See <<linear-retriever-normalizers, normalizers>> for supported values.
+Defaults to `none`.
 
 See also <<retrievers-examples-linear-retriever, this hybrid search example>> using a linear retriever on how to
 independently configure and apply normalizers to retrievers.
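The `linear` retriever parameters above describe a merge step. Each sub-retriever's scores are normalized, multiplied by that entry's `weight`, and summed per document. The sketch below illustrates this step under stated assumptions. `SubRetrieverResult` and `linear_combine` are hypothetical names, not Elasticsearch APIs. Treating a document that is missing from a sub-retriever's results as receiving no contribution from it is also an assumption.

```python
# Minimal sketch of the linear retriever's merge step: normalize each
# sub-retriever's scores, multiply by its weight, then sum per document.
# SubRetrieverResult and linear_combine are hypothetical names.
from dataclasses import dataclass
from typing import Callable

Scores = dict[str, float]  # document id -> score


def no_normalization(scores: Scores) -> Scores:
    # Mirrors the `none` normalizer: scores pass through unchanged.
    return dict(scores)


@dataclass
class SubRetrieverResult:
    scores: Scores  # top `rank_window_size` documents from one sub-retriever
    weight: float = 1.0  # documented default; must be >= 0
    normalizer: Callable[[Scores], Scores] = no_normalization  # documented default: `none`


def linear_combine(results: list[SubRetrieverResult]) -> Scores:
    combined: Scores = {}
    for result in results:
        if result.weight < 0:
            raise ValueError("weight must be greater than or equal to 0")
        # Normalize first, then apply the weight.
        for doc, score in result.normalizer(result.scores).items():
            # Assumption: a document absent from this sub-retriever's results
            # simply receives no contribution from it.
            combined[doc] = combined.get(doc, 0.0) + result.weight * score
    return combined
```

Consider lexical scores `{"a": 12.0, "b": 6.0}` combined with semantic scores `{"b": 0.75, "c": 0.5}` at weight 2.0. This yields `{"a": 12.0, "b": 7.5, "c": 1.0}`. With `none`, the larger lexical scores dominate the sum, and that is the imbalance a normalizer is meant to reduce.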

docs/reference/search/retriever.asciidoc

Lines changed: 232 additions & 2 deletions
@@ -121,6 +121,28 @@ POST /restaurants/_bulk?refresh
 
 PUT /movies
 
+PUT /books
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text",
+        "copy_to": "title_semantic"
+      },
+      "description": {
+        "type": "text",
+        "copy_to": "description_semantic"
+      },
+      "title_semantic": {
+        "type": "semantic_text"
+      },
+      "description_semantic": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+
 PUT _query_rules/my-ruleset
 {
   "rules": [
@@ -151,6 +173,8 @@ PUT _query_rules/my-ruleset
 DELETE /restaurants
 
 DELETE /movies
+
+DELETE /books
 --------------------------------------------------
 // TEARDOWN
 ////
@@ -282,9 +306,19 @@ A retriever that normalizes and linearly combines the scores of other retrievers
 
 include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]
 
-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
+[[linear-retriever-normalizers]]
+===== Normalizers
 
-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
+The `linear` retriever supports the following normalizers:
+
+* `none`: No normalization
+* `minmax`: Normalizes scores based on the following formula:
++
+....
+score = (score - min) / (max - min)
+....
+
+* `l2_norm`: Normalizes scores using the L2 norm of the score values
 
 [[rrf-retriever]]
 ==== RRF Retriever
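The three normalizers listed in the new `Normalizers` section can be sketched as plain functions over a single sub-retriever's scores. The `minmax` function follows the documented formula. The `l2_norm` function assumes the usual meaning of "using the L2 norm", which is to divide each score by the square root of the sum of squared scores. The fallbacks for the degenerate cases (all scores equal, or all zero) are arbitrary choices made for this sketch, not documented behavior. The function names are hypothetical.

```python
# Minimal sketch of the `none`, `minmax`, and `l2_norm` normalizers applied to one
# sub-retriever's scores (document id -> score). Function names are hypothetical.
import math

Scores = dict[str, float]


def normalize_none(scores: Scores) -> Scores:
    # `none`: scores are left as-is.
    return dict(scores)


def normalize_minmax(scores: Scores) -> Scores:
    # `minmax`: score = (score - min) / (max - min)
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # The formula divides by zero here; mapping to 0.0 is an arbitrary choice.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def normalize_l2(scores: Scores) -> Scores:
    # `l2_norm` (assumed): divide each score by sqrt(sum of squared scores).
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        return dict(scores)  # arbitrary choice for empty or all-zero input
    return {doc: s / norm for doc, s in scores.items()}
```

For the scores `{"a": 3.0, "b": 4.0}`, `minmax` gives `{"a": 0.0, "b": 1.0}` and `l2_norm` gives `{"a": 0.6, "b": 0.8}`. The design difference is that `minmax` always stretches the results to span exactly 0 to 1, while `l2_norm` preserves the ratios between scores.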
@@ -912,6 +946,202 @@ GET movies/_search
 <1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
 <2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.
 
+[discrete]
+[[multi-field-query-format]]
+=== Multi-field query format
+
+The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
+This format automatically generates appropriate inner retrievers based on the field types and query parameters.
+This is a great way to search an index when you know little to nothing about its schema, while also handling normalization across lexical and semantic matches.
+
+[discrete]
+[[multi-field-field-grouping]]
+==== Field grouping
+
+The multi-field query format groups queried fields into two categories:
+
+- **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
+- **Semantic fields**: <<semantic-text, `semantic_text` fields>>.
+
+Each field group is queried separately and the scores/ranks are normalized such that each contributes 50% to the final score/rank.
+This balances the importance of lexical and semantic fields.
+Most indices contain more lexical than semantic fields, and without this grouping the results would often be biased towards lexical field matches.
+
+[WARNING]
+====
+In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`).
+If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
+====
+
+[discrete]
+[[multi-field-field-boosting]]
+==== Linear retriever field boosting
+
+When using the `linear` retriever, fields can be boosted using the `^` notation:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title^3", <1>
+        "description^2", <2>
+        "title_semantic", <3>
+        "description_semantic^2"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+<1> 3x weight
+<2> 2x weight
+<3> 1x weight (default)
+
+Due to how the <<multi-field-field-grouping, field group scores>> are normalized, per-field boosts have no effect on the range of the final score.
+Instead, they affect the importance of the field's score within its group.
+
+For example, if the schema looks like:
+
+[source,console]
+----
+PUT /books
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text",
+        "copy_to": "title_semantic"
+      },
+      "description": {
+        "type": "text",
+        "copy_to": "description_semantic"
+      },
+      "title_semantic": {
+        "type": "semantic_text"
+      },
+      "description_semantic": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+----
+// TEST[skip:index created in test setup]
+
+And we run this query:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title",
+        "description",
+        "title_semantic",
+        "description_semantic"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+The score breakdown would be:
+
+* Lexical fields (50% of score):
+** `title`: 50% of lexical fields group score, 25% of final score
+** `description`: 50% of lexical fields group score, 25% of final score
+* Semantic fields (50% of score):
+** `title_semantic`: 50% of semantic fields group score, 25% of final score
+** `description_semantic`: 50% of semantic fields group score, 25% of final score
+
+If we apply per-field boosts like so:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title^3",
+        "description^2",
+        "title_semantic",
+        "description_semantic^2"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+The score breakdown would change to:
+
+* Lexical fields (50% of score):
+** `title`: 60% of lexical fields group score, 30% of final score
+** `description`: 40% of lexical fields group score, 20% of final score
+* Semantic fields (50% of score):
+** `title_semantic`: 33.3% of semantic fields group score, 16.7% of final score
+** `description_semantic`: 66.7% of semantic fields group score, 33.3% of final score
+
+[discrete]
+[[multi-field-wildcard-field-patterns]]
+==== Wildcard field patterns
+
+Field names support the `*` wildcard character to match multiple fields:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "rrf": {
+      "query": "machine learning",
+      "fields": [
+        "title*", <1>
+        "*_text" <2>
+      ]
+    }
+  }
+}
+----
+// TEST[continued]
+
+<1> Match fields that start with `title`
+<2> Match fields that end with `_text`
+
+Note, however, that wildcard field patterns will only resolve to fields that either:
+
+- Support term queries, such as `keyword` and `text` fields
+- Are `semantic_text` fields
+
+[discrete]
+[[multi-field-limitations]]
+==== Limitations
+
+- **Single index**: Multi-field queries currently work with single index searches only
+- **CCS (Cross Cluster Search)**: Multi-field queries do not support remote cluster searches
+
+[discrete]
+[[multi-field-examples]]
+==== Examples
+
+- <<retrievers-examples-rrf-multi-field-query-format, RRF with the multi-field query format>>
+- <<retrievers-examples-linear-multi-field-query-format, Linear retriever with the multi-field query format>>
+
+
 [discrete]
 [[retriever-common-parameters]]
 === Common usage guidelines
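The boosted score breakdown in the field boosting section follows from simple arithmetic. Each field group receives half of the final score, and within a group each field's share is its boost divided by the sum of the group's boosts. The sketch below reproduces those numbers. It assumes that every field contributes a normalized score of the same magnitude, which the breakdown implicitly does. It also assumes that a lone group receives the whole score. `field_shares` is a hypothetical helper, not an Elasticsearch API.

```python
# Sketch of the per-field share of the final score under field grouping:
# each non-empty group gets an equal share, and a field's share within its
# group is proportional to its `^` boost (default 1).
# Assumptions: equal-magnitude normalized field scores; a lone group gets 100%.
from fractions import Fraction


def parse_field(spec: str) -> tuple[str, Fraction]:
    # "title^3" -> ("title", 3); "title" -> ("title", 1)
    name, sep, boost = spec.partition("^")
    return name, Fraction(boost) if sep else Fraction(1)


def field_shares(lexical: list[str], semantic: list[str]) -> dict[str, Fraction]:
    groups = [group for group in (lexical, semantic) if group]
    shares: dict[str, Fraction] = {}
    for group in groups:
        parsed = [parse_field(spec) for spec in group]
        total = sum(boost for _, boost in parsed)
        for name, boost in parsed:
            shares[name] = Fraction(1, len(groups)) * boost / total
    return shares
```

Call `field_shares(["title^3", "description^2"], ["title_semantic", "description_semantic^2"])` and format each value with `f"{float(share):.1%}"`. This prints 30.0%, 20.0%, 16.7% and 33.3%, matching the boosted breakdown. Without boosts, every field comes out at 25%.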

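The wildcard field patterns section says that `*` matches multiple field names and that only term-queryable or `semantic_text` fields are kept. The sketch below resolves patterns against a field-name-to-type map. It rests on several assumptions. `TERM_QUERY_TYPES` is an illustrative subset (`text`, `keyword`) rather than the full set of types that support term queries. Only `*` is treated as special. Boost suffixes such as `^2` are not handled. `resolve_fields` is a hypothetical helper.

```python
# Sketch of wildcard field resolution: `*` matches any run of characters, and only
# fields whose type supports term queries or is `semantic_text` are kept.
# Assumptions: TERM_QUERY_TYPES is an illustrative subset; boosts are not handled.
import re

TERM_QUERY_TYPES = {"text", "keyword"}


def resolve_fields(patterns: list[str], field_types: dict[str, str]) -> list[str]:
    resolved: list[str] = []
    for pattern in patterns:
        # Escape everything except `*`, which becomes the regex `.*`.
        regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
        for field, field_type in field_types.items():
            eligible = field_type in TERM_QUERY_TYPES or field_type == "semantic_text"
            if eligible and regex.fullmatch(field) and field not in resolved:
                resolved.append(field)
    return resolved
```

Suppose the `books` fields were extended with a hypothetical `title_vector` (`dense_vector`) and a hypothetical `abstract_text` (`text`). Then `["title*", "*_text"]` would resolve to `title`, `title_semantic` and `abstract_text`, while `title_vector` would be skipped because of its type.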