@@ -121,6 +121,28 @@ POST /restaurants/_bulk?refresh

PUT /movies

+ PUT /books
+ {
+   "mappings": {
+     "properties": {
+       "title": {
+         "type": "text",
+         "copy_to": "title_semantic"
+       },
+       "description": {
+         "type": "text",
+         "copy_to": "description_semantic"
+       },
+       "title_semantic": {
+         "type": "semantic_text"
+       },
+       "description_semantic": {
+         "type": "semantic_text"
+       }
+     }
+   }
+ }
+
PUT _query_rules/my-ruleset
{
  "rules": [
@@ -151,6 +173,8 @@ PUT _query_rules/my-ruleset
DELETE /restaurants

DELETE /movies
+
+ DELETE /books
--------------------------------------------------
// TEARDOWN
////
@@ -282,9 +306,19 @@ A retriever that normalizes and linearly combines the scores of other retrievers

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]

- include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
+ [[linear-retriever-normalizers]]
+ ===== Normalizers

- include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
+ The `linear` retriever supports the following normalizers:
+
+ * `none`: No normalization; scores are combined as-is
+ * `minmax`: Normalizes scores based on the following formula:
+ +
+ ....
+ score = (score - min) / (max - min)
+ ....
+
+ * `l2_norm`: Normalizes scores by dividing each score by the L2 norm of all score values (the square root of the sum of their squares)
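The three normalizers can be sketched in Python as below. This is an illustrative re-implementation, not Elasticsearch code: the function name `normalize` is hypothetical, and the handling of the all-equal (`minmax`) and all-zero (`l2_norm`) edge cases is an assumption.

```python
import math


def normalize(scores: list[float], normalizer: str) -> list[float]:
    """Illustrative re-implementation of the three normalizers (not Elasticsearch code)."""
    if normalizer == "none":
        # Scores pass through unchanged.
        return list(scores)
    if normalizer == "minmax":
        lo, hi = min(scores), max(scores)
        if hi == lo:
            # Assumption: map all-equal scores to 1.0 instead of dividing by zero;
            # Elasticsearch's own handling of this edge case may differ.
            return [1.0] * len(scores)
        return [(s - lo) / (hi - lo) for s in scores]
    if normalizer == "l2_norm":
        # Divide each score by the Euclidean (L2) norm of all scores.
        norm = math.sqrt(sum(s * s for s in scores))
        if norm == 0.0:
            return [0.0] * len(scores)  # assumption: all-zero input stays zero
        return [s / norm for s in scores]
    raise ValueError(f"unknown normalizer: {normalizer}")


print(normalize([12.0, 6.0, 3.0], "minmax"))  # [1.0, 0.333..., 0.0]
print(normalize([3.0, 4.0], "l2_norm"))       # [0.6, 0.8]
```

Both `minmax` and `l2_norm` bring scores from differently scaled sources into a comparable range before they are linearly combined, which is what the field grouping described later depends on.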

[[rrf-retriever]]
==== RRF Retriever
@@ -912,6 +946,202 @@ GET movies/_search
<1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
<2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.

+ [discrete]
+ [[multi-field-query-format]]
+ === Multi-field query format
+
+ The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
+ This format automatically generates the appropriate inner retrievers based on the field types and query parameters.
+ This lets you search an index with little or no knowledge of its schema, while still normalizing scores across lexical and semantic matches.
+
+ [discrete]
+ [[multi-field-field-grouping]]
+ ==== Field grouping
+
+ The multi-field query format groups queried fields into two categories:
+
+ - **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
+ - **Semantic fields**: <<semantic-text, `semantic_text` fields>>.
+
+ Each field group is queried separately, and the scores (`linear`) or ranks (`rrf`) are normalized so that each group contributes 50% to the final score or rank.
+ This balances the importance of lexical and semantic fields.
+ Most indices contain more lexical than semantic fields, so without this grouping the results would often be biased towards lexical field matches.
+
+ [WARNING]
+ ====
+ In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`).
+ If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
+ ====
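A minimal sketch of why the 50/50 grouping depends on normalization, assuming each group has already produced one score per document. The document IDs and raw scores are hypothetical, and `combine_groups` is an illustrative helper, not part of Elasticsearch. With no normalizer, the larger-scale lexical scores decide the ranking; with `minmax`, a document that matches reasonably well in both groups can rank first.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one group's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # assumption: guard against all-equal scores
    return {doc: (s - lo) / span for doc, s in scores.items()}


def combine_groups(lexical, semantic, normalizer=None):
    """Give the lexical and semantic groups equal (50%) weight in the final score."""
    if normalizer is not None:
        lexical, semantic = normalizer(lexical), normalizer(semantic)
    docs = set(lexical) | set(semantic)
    return {d: 0.5 * lexical.get(d, 0.0) + 0.5 * semantic.get(d, 0.0) for d in docs}


# Hypothetical raw group scores: BM25-like lexical scores vs. similarity-like semantic scores.
lexical = {"A": 12.0, "B": 2.0, "C": 9.0}
semantic = {"A": 0.3, "B": 0.9, "C": 0.8}

for label, norm in [("none", None), ("minmax", minmax)]:
    final = combine_groups(lexical, semantic, norm)
    top = max(final, key=final.get)
    print(f"{label}: top document is {top}")
# none: top document is A
# minmax: top document is C
```

Without normalization, document `A` wins purely on its large lexical score; after `minmax`, document `C` (0.5 × 0.7 + 0.5 × 0.83) outranks both single-group specialists.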
+
+ [discrete]
+ [[multi-field-field-boosting]]
+ ==== Linear retriever field boosting
+
+ When using the `linear` retriever, fields can be boosted using the `^` notation:
+
+ [source,console]
+ ----
+ GET books/_search
+ {
+   "retriever": {
+     "linear": {
+       "query": "elasticsearch",
+       "fields": [
+         "title^3", <1>
+         "description^2", <2>
+         "title_semantic", <3>
+         "description_semantic^2"
+       ],
+       "normalizer": "minmax"
+     }
+   }
+ }
+ ----
+ // TEST[continued]
+
+ <1> 3x weight
+ <2> 2x weight
+ <3> 1x weight (default)
+
+ Due to how the <<multi-field-field-grouping, field group scores>> are normalized, per-field boosts have no effect on the range of the final score.
+ Instead, they affect the importance of the field's score within its group.
+
+ For example, given the following mapping:
+
+ [source,console]
+ ----
+ PUT /books
+ {
+   "mappings": {
+     "properties": {
+       "title": {
+         "type": "text",
+         "copy_to": "title_semantic"
+       },
+       "description": {
+         "type": "text",
+         "copy_to": "description_semantic"
+       },
+       "title_semantic": {
+         "type": "semantic_text"
+       },
+       "description_semantic": {
+         "type": "semantic_text"
+       }
+     }
+   }
+ }
+ ----
+ // TEST[skip:index created in test setup]
+
+ If we run this query:
+
+ [source,console]
+ ----
+ GET books/_search
+ {
+   "retriever": {
+     "linear": {
+       "query": "elasticsearch",
+       "fields": [
+         "title",
+         "description",
+         "title_semantic",
+         "description_semantic"
+       ],
+       "normalizer": "minmax"
+     }
+   }
+ }
+ ----
+ // TEST[continued]
+
+ The score breakdown would be:
+
+ * Lexical fields (50% of score):
+ ** `title`: 50% of lexical fields group score, 25% of final score
+ ** `description`: 50% of lexical fields group score, 25% of final score
+ * Semantic fields (50% of score):
+ ** `title_semantic`: 50% of semantic fields group score, 25% of final score
+ ** `description_semantic`: 50% of semantic fields group score, 25% of final score
+
+ If we apply per-field boosts like so:
+
+ [source,console]
+ ----
+ GET books/_search
+ {
+   "retriever": {
+     "linear": {
+       "query": "elasticsearch",
+       "fields": [
+         "title^3",
+         "description^2",
+         "title_semantic",
+         "description_semantic^2"
+       ],
+       "normalizer": "minmax"
+     }
+   }
+ }
+ ----
+ // TEST[continued]
+
+ Each field's share of its group is its boost divided by the sum of the boosts in that group (for example, `title` gets 3 / (3 + 2) = 60%), so the score breakdown would change to:
+
+ * Lexical fields (50% of score):
+ ** `title`: 60% of lexical fields group score, 30% of final score
+ ** `description`: 40% of lexical fields group score, 20% of final score
+ * Semantic fields (50% of score):
+ ** `title_semantic`: 33.3% of semantic fields group score, 16.7% of final score
+ ** `description_semantic`: 66.7% of semantic fields group score, 33.3% of final score
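The breakdowns above are plain arithmetic: a field's share of its group is its boost over the sum of boosts in that group, and each group is worth 50% of the final score. The sketch below reproduces those numbers. `parse_field` and `score_breakdown` are hypothetical helpers, and the set of semantic field names is an assumed stand-in for looking up field types in the mapping.

```python
def parse_field(spec: str) -> tuple[str, float]:
    """Split "title^3" into ("title", 3.0); fields without "^" default to a boost of 1.0."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)


def score_breakdown(fields: list[str], semantic_fields: set[str]) -> dict[str, tuple[float, float]]:
    """Return {field: (share of its group, share of the final score)}."""
    parsed = [parse_field(f) for f in fields]
    lexical = [(n, w) for n, w in parsed if n not in semantic_fields]
    semantic = [(n, w) for n, w in parsed if n in semantic_fields]
    groups = [g for g in (lexical, semantic) if g]
    # Each non-empty group gets an equal slice: 50% each when both are present
    # (assumption: a lone group gets 100%).
    group_weight = 1.0 / len(groups)
    shares = {}
    for members in groups:
        total = sum(w for _, w in members)
        for name, w in members:
            in_group = w / total
            shares[name] = (in_group, in_group * group_weight)
    return shares


SEMANTIC = {"title_semantic", "description_semantic"}
boosted = ["title^3", "description^2", "title_semantic", "description_semantic^2"]
for name, (in_group, final) in score_breakdown(boosted, SEMANTIC).items():
    print(f"{name}: {in_group:.1%} of group, {final:.1%} of final")
# title: 60.0% of group, 30.0% of final
# description: 40.0% of group, 20.0% of final
# title_semantic: 33.3% of group, 16.7% of final
# description_semantic: 66.7% of group, 33.3% of final
```

Running the same function with unboosted field names yields 50% of group and 25% of final for every field, matching the first breakdown.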
+
+ [discrete]
+ [[multi-field-wildcard-field-patterns]]
+ ==== Wildcard field patterns
+
+ Field names support the `*` wildcard character to match multiple fields:
+
+ [source,console]
+ ----
+ GET books/_search
+ {
+   "retriever": {
+     "rrf": {
+       "query": "machine learning",
+       "fields": [
+         "title*", <1>
+         "*_text" <2>
+       ]
+     }
+   }
+ }
+ ----
+ // TEST[continued]
+
+ <1> Match fields that start with `title`
+ <2> Match fields that end with `_text`
+
+ Note, however, that wildcard field patterns will only resolve to fields that either:
+
+ - Support term queries, such as `keyword` and `text` fields
+ - Are `semantic_text` fields
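Wildcard resolution can be sketched as pattern matching over field names followed by a type filter. This is not Elasticsearch's implementation: the flattened mapping below is hypothetical (it adds a `title_length` integer field that is not in the `books` index, purely to show the filter), and the set of lexical types is an assumed stand-in for "supports term queries".

```python
import re

# Hypothetical flattened mapping (field name -> type). `title_length` is an extra
# field that is NOT in the books index; it only shows the type filter in action.
BOOKS_FIELDS = {
    "title": "text",
    "description": "text",
    "title_semantic": "semantic_text",
    "description_semantic": "semantic_text",
    "title_length": "integer",
}
# Assumption: stands in for "supports term queries"; the real set of types is broader.
LEXICAL_TYPES = {"text", "keyword"}


def wildcard_to_regex(pattern: str):
    """Compile a field pattern in which only '*' is special."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def resolve(patterns: list[str], fields: dict[str, str]) -> dict[str, str]:
    resolved = {}
    for pattern in patterns:
        regex = wildcard_to_regex(pattern)
        for name, ftype in fields.items():
            eligible = ftype in LEXICAL_TYPES or ftype == "semantic_text"
            if eligible and regex.fullmatch(name):
                resolved[name] = ftype
    return resolved


print(resolve(["title*", "*_text"], BOOKS_FIELDS))
# {'title': 'text', 'title_semantic': 'semantic_text'}
```

The pattern is escaped and only `*` is expanded, because Python's `fnmatch` would also treat `?` and `[...]` as special. Here `title*` also matches `title_length`, but the type filter drops it, and `*_text` matches no field names in this mapping.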
+
+ [discrete]
+ [[multi-field-limitations]]
+ ==== Limitations
+
+ - **Single index**: Multi-field queries currently work with single-index searches only
+ - **Cross-cluster search (CCS)**: Multi-field queries do not support remote cluster searches
+
+ [discrete]
+ [[multi-field-examples]]
+ ==== Examples
+
+ - <<retrievers-examples-rrf-multi-field-query-format, RRF with the multi-field query format>>
+ - <<retrievers-examples-linear-multi-field-query-format, Linear retriever with the multi-field query format>>
+
+

[discrete]
[[retriever-common-parameters]]
=== Common usage guidelines