@@ -121,6 +121,28 @@ POST /restaurants/_bulk?refresh
121121
122122PUT /movies
123123
124+ PUT /books
125+ {
126+ "mappings": {
127+ "properties": {
128+ "title": {
129+ "type": "text",
130+ "copy_to": "title_semantic"
131+ },
132+ "description": {
133+ "type": "text",
134+ "copy_to": "description_semantic"
135+ },
136+ "title_semantic": {
137+ "type": "semantic_text"
138+ },
139+ "description_semantic": {
140+ "type": "semantic_text"
141+ }
142+ }
143+ }
144+ }
145+
124146PUT _query_rules/my-ruleset
125147{
126148 "rules": [
@@ -151,6 +173,8 @@ PUT _query_rules/my-ruleset
151173DELETE /restaurants
152174
153175DELETE /movies
176+
177+ DELETE /books
154178--------------------------------------------------
155179// TEARDOWN
156180////
@@ -282,9 +306,19 @@ A retriever that normalizes and linearly combines the scores of other retrievers
282306
283307include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]
284308
285- include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
309+ [[linear-retriever-normalizers]]
310+ ===== Normalizers
286311
287- include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
312+ The `linear` retriever supports the following normalizers:
313+
314+ * `none`: No normalization
315+ * `minmax`: Normalizes scores based on the following formula:
316+ +
317+ ....
318+ score = (score - min) / (max - min)
319+ ....
320+
321+ * `l2_norm`: Normalizes scores using the L2 norm of the score values
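
The two non-trivial normalizers can be illustrated with a small sketch. It is not the Elasticsearch implementation: the function names are hypothetical, `l2_norm` is assumed to follow the standard definition (each score divided by the square root of the sum of squared scores), and returning zeros for degenerate input is a choice made for this example only.

```python
import math


def minmax(scores):
    """Rescale scores with (score - min) / (max - min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: the formula is undefined when all scores are equal,
        # so this sketch returns zeros in that case.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def l2_norm(scores):
    """Divide each score by the L2 norm (Euclidean length) of all scores."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0:
        # Assumption: avoid division by zero when every score is zero.
        return [0.0 for _ in scores]
    return [s / norm for s in scores]


print(minmax([2.0, 4.0, 6.0]))
print(l2_norm([3.0, 4.0]))
```

With `minmax`, the lowest score in the result set maps to 0 and the highest to 1, which is what makes scores from different retrievers comparable before they are combined.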
288322
289323[[rrf-retriever]]
290324==== RRF Retriever
@@ -912,6 +946,202 @@ GET movies/_search
912946<1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
913947<2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.
914948
949+ [discrete]
950+ [[multi-field-query-format]]
951+ === Multi-field query format
952+
953+ The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
954+ This format automatically generates appropriate inner retrievers based on the field types and query parameters.
955+ This lets you search an index without detailed knowledge of its schema, while normalization across lexical and semantic matches is handled for you.
956+
957+ [discrete]
958+ [[multi-field-field-grouping]]
959+ ==== Field grouping
960+
961+ The multi-field query format groups queried fields into two categories:
962+
963+ - **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
964+ - **Semantic fields**: <<semantic-text, `semantic_text` fields>>.
965+
966+ Each field group is queried separately, and its scores (for `linear`) or ranks (for `rrf`) are normalized so that each group contributes 50% to the final score or rank.
967+ This balances the importance of lexical and semantic fields.
968+ Most indices contain more lexical than semantic fields, and without this grouping the results would often bias towards lexical field matches.
969+
970+ [WARNING]
971+ ====
972+ In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`).
973+ If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
974+ ====
975+
976+ [discrete]
977+ [[multi-field-field-boosting]]
978+ ==== Linear retriever field boosting
979+
980+ When using the `linear` retriever, fields can be boosted using the `^` notation:
981+
982+ [source,console]
983+ ----
984+ GET books/_search
985+ {
986+ "retriever": {
987+ "linear": {
988+ "query": "elasticsearch",
989+ "fields": [
990+ "title^3", <1>
991+ "description^2", <2>
992+ "title_semantic", <3>
993+ "description_semantic^2"
994+ ],
995+ "normalizer": "minmax"
996+ }
997+ }
998+ }
999+ ----
1000+ // TEST[continued]
1001+
1002+ <1> 3x weight
1003+ <2> 2x weight
1004+ <3> 1x weight (default)
1005+
1006+ Due to how the <<multi-field-field-grouping, field group scores>> are normalized, per-field boosts have no effect on the range of the final score.
1007+ Instead, they affect the importance of the field's score within its group.
1008+
1009+ For example, if the schema looks like:
1010+
1011+ [source,console]
1012+ ----
1013+ PUT /books
1014+ {
1015+ "mappings": {
1016+ "properties": {
1017+ "title": {
1018+ "type": "text",
1019+ "copy_to": "title_semantic"
1020+ },
1021+ "description": {
1022+ "type": "text",
1023+ "copy_to": "description_semantic"
1024+ },
1025+ "title_semantic": {
1026+ "type": "semantic_text"
1027+ },
1028+ "description_semantic": {
1029+ "type": "semantic_text"
1030+ }
1031+ }
1032+ }
1033+ }
1034+ ----
1035+ // TEST[skip:index created in test setup]
1036+
1037+ And we run this query:
1038+
1039+ [source,console]
1040+ ----
1041+ GET books/_search
1042+ {
1043+ "retriever": {
1044+ "linear": {
1045+ "query": "elasticsearch",
1046+ "fields": [
1047+ "title",
1048+ "description",
1049+ "title_semantic",
1050+ "description_semantic"
1051+ ],
1052+ "normalizer": "minmax"
1053+ }
1054+ }
1055+ }
1056+ ----
1057+ // TEST[continued]
1058+
1059+ The score breakdown would be:
1060+
1061+ * Lexical fields (50% of score):
1062+ ** `title`: 50% of lexical fields group score, 25% of final score
1063+ ** `description`: 50% of lexical fields group score, 25% of final score
1064+ * Semantic fields (50% of score):
1065+ ** `title_semantic`: 50% of semantic fields group score, 25% of final score
1066+ ** `description_semantic`: 50% of semantic fields group score, 25% of final score
1067+
1068+ If we apply per-field boosts like so:
1069+
1070+ [source,console]
1071+ ----
1072+ GET books/_search
1073+ {
1074+ "retriever": {
1075+ "linear": {
1076+ "query": "elasticsearch",
1077+ "fields": [
1078+ "title^3",
1079+ "description^2",
1080+ "title_semantic",
1081+ "description_semantic^2"
1082+ ],
1083+ "normalizer": "minmax"
1084+ }
1085+ }
1086+ }
1087+ ----
1088+ // TEST[continued]
1089+
1090+ The score breakdown would change to:
1091+
1092+ * Lexical fields (50% of score):
1093+ ** `title`: 60% of lexical fields group score, 30% of final score
1094+ ** `description`: 40% of lexical fields group score, 20% of final score
1095+ * Semantic fields (50% of score):
1096+ ** `title_semantic`: about 33% of semantic fields group score, about 16.7% of final score
1097+ ** `description_semantic`: about 67% of semantic fields group score, about 33.3% of final score
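
The arithmetic behind this breakdown can be checked with a short sketch. It assumes, as described above, that each group contributes 50% of the final score and that a field's share within its group is its boost divided by the sum of boosts in that group. The helper names are hypothetical, and the caller assigns fields to groups by hand rather than reading the mapping.

```python
def parse_field(spec):
    """Split 'title^3' into ('title', 3.0); a missing '^' means weight 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def final_score_shares(lexical, semantic):
    """Return each field's share of the final score, given two field groups."""
    shares = {}
    for group in (lexical, semantic):
        parsed = [parse_field(f) for f in group]
        total = sum(weight for _, weight in parsed)
        for name, weight in parsed:
            # Each group is worth half of the final score.
            shares[name] = 0.5 * weight / total
    return shares


shares = final_score_shares(
    ["title^3", "description^2"],
    ["title_semantic", "description_semantic^2"],
)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Running it on the boosted query reproduces the breakdown above, up to rounding. Passing the unboosted field lists yields 25% for every field.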
1098+
1099+ [discrete]
1100+ [[multi-field-wildcard-field-patterns]]
1101+ ==== Wildcard field patterns
1102+
1103+ Field names support the `*` wildcard character to match multiple fields:
1104+
1105+ [source,console]
1106+ ----
1107+ GET books/_search
1108+ {
1109+ "retriever": {
1110+ "rrf": {
1111+ "query": "machine learning",
1112+ "fields": [
1113+ "title*", <1>
1114+ "*_text" <2>
1115+ ]
1116+ }
1117+ }
1118+ }
1119+ ----
1120+ // TEST[continued]
1121+
1122+ <1> Match fields that start with `title`
1123+ <2> Match fields that end with `_text`
1124+
1125+ Note, however, that wildcard field patterns will only resolve to fields that either:
1126+
1127+ - Support term queries, such as `keyword` and `text` fields
1128+ - Are `semantic_text` fields
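
As a rough illustration of this rule, the sketch below resolves `*` patterns against a flattened mapping and skips field types that are not searchable in this format. It is not how Elasticsearch resolves fields internally: `resolve_fields`, the `SEARCHABLE_TYPES` set (reduced here to three types), and the `title_vector` field are all hypothetical.

```python
import re

# Assumption: simplified set. Other field types that support term queries are omitted.
SEARCHABLE_TYPES = {"keyword", "text", "semantic_text"}


def matches(pattern, name):
    """Return True if name matches pattern, where '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def resolve_fields(patterns, mapping):
    """Return the field names that match any pattern and have a searchable type."""
    resolved = []
    for name, field_type in mapping.items():
        if field_type not in SEARCHABLE_TYPES:
            continue  # for example, dense_vector fields are never resolved
        if any(matches(p, name) for p in patterns):
            resolved.append(name)
    return resolved


mapping = {
    "title": "text",
    "title_semantic": "semantic_text",
    "description": "text",
    "description_semantic": "semantic_text",
    "title_vector": "dense_vector",  # hypothetical field, not part of the books index
}

print(resolve_fields(["title*"], mapping))
print(resolve_fields(["*_semantic"], mapping))
```

In this sketch, `title*` resolves to `title` and `title_semantic` but not to `title_vector`, because the latter's type is outside the searchable set.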
1129+
1130+ [discrete]
1131+ [[multi-field-limitations]]
1132+ ==== Limitations
1133+
1134+ - **Single index**: Multi-field queries currently work with single-index searches only.
1135+ - **CCS (Cross Cluster Search)**: Multi-field queries do not support remote cluster searches.
1136+
1137+ [discrete]
1138+ [[multi-field-examples]]
1139+ ==== Examples
1140+
1141+ - <<retrievers-examples-rrf-multi-field-query-format, RRF with the multi-field query format>>
1142+ - <<retrievers-examples-linear-multi-field-query-format, Linear retriever with the multi-field query format>>
1143+
1144+
9151145[discrete]
9161146[[retriever-common-parameters]]
9171147=== Common usage guidelines