[8.19] Simplified Linear and RRF Retrievers Docs (#130842)

Mikep86 · web-flow · commit 0b16ce85a58c · 2025-07-10T10:48:13.000-04:00
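
Reviewer note: the "score breakdown" lists added to retriever.asciidoc in this change can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not the Elasticsearch implementation. It assumes what the new docs describe: each field group contributes an equal share of the final score, and a field's share within its group is proportional to its `^` boost. The names `parse_field`, `field_shares`, and `GROUPS` are hypothetical helpers invented for this note.

```python
from collections import defaultdict


def parse_field(spec):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)


def field_shares(field_specs, group_of):
    """Return each field's share of the final score.

    Assumptions (taken from the documented breakdown, not from the
    Elasticsearch source): every group gets an equal share of the final
    score, a field's share inside its group is boost / sum of the group's
    boosts, and all boosts are positive.
    """
    boosts = dict(parse_field(spec) for spec in field_specs)

    # Sum the boosts per group so each field can be expressed as a fraction.
    group_totals = defaultdict(float)
    for name, boost in boosts.items():
        group_totals[group_of[name]] += boost

    group_share = 1.0 / len(group_totals)  # 0.5 with two groups
    return {
        name: group_share * boost / group_totals[group_of[name]]
        for name, boost in boosts.items()
    }


# Hypothetical field-to-group mapping matching the `books` index in the docs.
GROUPS = {
    "title": "lexical",
    "description": "lexical",
    "title_semantic": "semantic",
    "description_semantic": "semantic",
}

if __name__ == "__main__":
    boosted = ["title^3", "description^2", "title_semantic", "description_semantic^2"]
    for name, share in field_shares(boosted, GROUPS).items():
        print(f"{name}: {share:.1%} of final score")
```

Calling `field_shares` with the unboosted field list gives every field an equal quarter of the final score, which matches the first breakdown in the docs.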
diff --git a/docs/reference/rest-api/common-parms.asciidoc b/docs/reference/rest-api/common-parms.asciidoc
@@ -1310,8 +1310,26 @@ See <<index-wait-for-active-shards>>.
 end::wait_for_active_shards[]
 
 tag::rrf-retrievers[]
+
+[NOTE]
+====
+Either `query` or `retrievers` must be specified.
+Combining `query` and `retrievers` is not supported.
+====
+
+`query`::
+(Optional, String)
++
+The query text to search for when using the <<multi-field-query-format, multi-field query format>>.
+
+`fields`::
+(Optional, array of strings)
++
+The fields to query when using the <<multi-field-query-format, multi-field query format>>.
+If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default.
+
 `retrievers`::
-(Required, array of retriever objects)
+(Optional, array of retriever objects)
 +
 A list of child retrievers to specify which sets of returned top documents
 will have the RRF formula applied to them. Each child retriever carries an
@@ -1337,7 +1355,7 @@ This value determines the size of the individual result sets per
 query. A higher value will improve result relevance at the cost of performance. The final
 ranked result set is pruned down to the search request's <<search-size-param, size>>.
 `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
-Defaults to the `size` parameter.
+Defaults to 10.
 end::compound-retriever-rank-window-size[]
 
 tag::compound-retriever-filter[]
@@ -1349,39 +1367,68 @@ according to each retriever's specifications.
 end::compound-retriever-filter[]
 
 tag::linear-retriever-components[]
+
+[NOTE]
+====
+Either `query` or `retrievers` must be specified.
+Combining `query` and `retrievers` is not supported.
+====
+
+`query`::
+(Optional, String)
++
+The query text to search for when using the <<multi-field-query-format, multi-field query format>>.
+
+`fields`::
+(Optional, array of strings)
++
+The fields to query when using the <<multi-field-query-format, multi-field query format>>.
+Fields can include boost values using the `^` notation (e.g., `"field^2"`).
+If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default.
+
+`normalizer`::
+(Optional, String)
++
+The normalizer to apply when using the <<multi-field-query-format, multi-field query format>>.
+See <<linear-retriever-normalizers, normalizers>> for supported values.
+Required when `query` is specified.
++
+[WARNING]
+====
+Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches.
+See <<multi-field-field-grouping, field grouping>> for more information.
+====
+
 `retrievers`::
-(Required, array of objects)
+(Optional, array of objects)
 +
 A list of the sub-retrievers' configuration, that we will take into account and whose result sets
 we will merge through a weighted sum. Each configuration can have a different weight and normalization depending
 on the specified retriever.
 
-Each entry specifies the following parameters:
+include::common-parms.asciidoc[tag=compound-retriever-rank-window-size]
+
+include::common-parms.asciidoc[tag=compound-retriever-filter]
 
-* `retriever`::
+Each entry in the `retrievers` array specifies the following parameters:
+
+`retriever`::
 (Required, a <<retriever, retriever>> object)
 +
 Specifies the retriever for which we will compute the top documents for. The retriever will produce `rank_window_size`
 results, which will later be merged based on the specified `weight` and `normalizer`.
 
-* `weight`::
+`weight`::
 (Optional, float)
 +
 The weight that each score of this retriever's top docs will be multiplied with. Must be greater or equal to 0. Defaults to 1.0.
 
-* `normalizer`::
+`normalizer`::
 (Optional, String)
 +
-Specifies how we will normalize the retriever's scores, before applying the specified `weight`.
-Available values are: `minmax`, and `none`. Defaults to `none`.
-
-** `none`
-** `minmax` :
-A `MinMaxScoreNormalizer` that normalizes scores based on the following formula
-+
-```
-score = (score - min) / (max - min)
-```
+Specifies how the retriever's score will be normalized before applying the specified `weight`.
+See <<linear-retriever-normalizers, normalizers>> for supported values.
+Defaults to `none`.
 
 See also <<retrievers-examples-linear-retriever, this hybrid search example>> using a linear retriever on how to
 independently configure and apply normalizers to retrievers.
diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc
@@ -121,6 +121,28 @@ POST /restaurants/_bulk?refresh
 
 PUT /movies
 
+PUT /books
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text",
+        "copy_to": "title_semantic"
+      },
+      "description": {
+        "type": "text",
+        "copy_to": "description_semantic"
+      },
+      "title_semantic": {
+        "type": "semantic_text"
+      },
+      "description_semantic": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+
 PUT _query_rules/my-ruleset
 {
     "rules": [
@@ -151,6 +173,8 @@ PUT _query_rules/my-ruleset
 DELETE /restaurants
 
 DELETE /movies
+
+DELETE /books
 --------------------------------------------------
 // TEARDOWN
 ////
@@ -282,9 +306,19 @@ A retriever that normalizes and linearly combines the scores of other retrievers
 
 include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]
 
-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
+[[linear-retriever-normalizers]]
+===== Normalizers
 
-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
+The `linear` retriever supports the following normalizers:
+
+* `none`: No normalization
+* `minmax`: Normalizes scores based on the following formula:
++
+....
+score = (score - min) / (max - min)
+....
+
+* `l2_norm`: Normalizes scores using the L2 norm of the score values
 
 [[rrf-retriever]]
 ==== RRF Retriever
@@ -912,6 +946,202 @@ GET movies/_search
 <1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
 <2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.
 
+[discrete]
+[[multi-field-query-format]]
+=== Multi-field query format
+
+The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
+This format automatically generates appropriate inner retrievers based on the field types and query parameters.
+This makes it possible to search an index with little or no knowledge of its schema, while still normalizing scores across lexical and semantic matches.
+
+[discrete]
+[[multi-field-field-grouping]]
+==== Field grouping
+
+The multi-field query format groups queried fields into two categories:
+
+- **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
+- **Semantic fields**: <<semantic-text, `semantic_text` fields>>.
+
+Each field group is queried separately and the scores/ranks are normalized such that each contributes 50% to the final score/rank.
+This balances the importance of lexical and semantic fields.
+Most indices contain more lexical than semantic fields, and without this grouping the results would often bias towards lexical field matches.
+
+[WARNING]
+====
+In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`).
+If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
+====
+
+[discrete]
+[[multi-field-field-boosting]]
+==== Linear retriever field boosting
+
+When using the `linear` retriever, fields can be boosted using the `^` notation:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title^3",                <1>
+        "description^2",          <2>
+        "title_semantic",         <3>
+        "description_semantic^2"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+<1> 3x weight
+<2> 2x weight
+<3> 1x weight (default)
+
+Due to how the <<multi-field-field-grouping, field group scores>> are normalized, per-field boosts have no effect on the range of the final score.
+Instead, they affect the importance of the field's score within its group.
+
+For example, if the schema looks like:
+
+[source,console]
+----
+PUT /books
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text",
+        "copy_to": "title_semantic"
+      },
+      "description": {
+        "type": "text",
+        "copy_to": "description_semantic"
+      },
+      "title_semantic": {
+        "type": "semantic_text"
+      },
+      "description_semantic": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+----
+// TEST[skip:index created in test setup]
+
+And we run this query:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title",
+        "description",
+        "title_semantic",
+        "description_semantic"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+The score breakdown would be:
+
+* Lexical fields (50% of score):
+  ** `title`: 50% of lexical fields group score, 25% of final score
+  ** `description`: 50% of lexical fields group score, 25% of final score
+* Semantic fields (50% of score):
+  ** `title_semantic`: 50% of semantic fields group score, 25% of final score
+  ** `description_semantic`: 50% of semantic fields group score, 25% of final score
+
+If we apply per-field boosts like so:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "linear": {
+      "query": "elasticsearch",
+      "fields": [
+        "title^3",
+        "description^2",
+        "title_semantic",
+        "description_semantic^2"
+      ],
+      "normalizer": "minmax"
+    }
+  }
+}
+----
+// TEST[continued]
+
+The score breakdown would change to:
+
+* Lexical fields (50% of score):
+  ** `title`: 60% of lexical fields group score, 30% of final score
+  ** `description`: 40% of lexical fields group score, 20% of final score
+* Semantic fields (50% of score):
+  ** `title_semantic`: 33.3% of semantic fields group score, 16.7% of final score
+  ** `description_semantic`: 66.7% of semantic fields group score, 33.3% of final score
+
+[discrete]
+[[multi-field-wildcard-field-patterns]]
+==== Wildcard field patterns
+
+Field names support the `*` wildcard character to match multiple fields:
+
+[source,console]
+----
+GET books/_search
+{
+  "retriever": {
+    "rrf": {
+      "query": "machine learning",
+      "fields": [
+        "title*",    <1>
+        "*_text"     <2>
+      ]
+    }
+  }
+}
+----
+// TEST[continued]
+
+<1> Match fields that start with `title`
+<2> Match fields that end with `_text`
+
+Note, however, that wildcard field patterns will only resolve to fields that either:
+
+- Support term queries, such as `keyword` and `text` fields
+- Are `semantic_text` fields
+
+[discrete]
+[[multi-field-limitations]]
+==== Limitations
+
+- **Single index**: Multi-field queries currently work with single index searches only
+- **CCS (Cross Cluster Search)**: Multi-field queries do not support remote cluster searches
+
+[discrete]
+[[multi-field-examples]]
+==== Examples
+
+- <<retrievers-examples-rrf-multi-field-query-format, RRF with the multi-field query format>>
+- <<retrievers-examples-linear-multi-field-query-format, Linear retriever with the multi-field query format>>
+
+
 [discrete]
 [[retriever-common-parameters]]
 === Common usage guidelines
diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc b/docs/reference/search/search-your-data/retrievers-examples.asciidoc