Linear retriever top level option for normalizer (elastic#129693)

mridula-s109 · elasticsearchmachine · kderusso · web-flow · commit f84ec74b2170 · 2025-08-21T13:09:59.000+02:00
* Per component normalizer is removed
* Modified LinearRetrieverBuilder to propagate top level normalizer to each and every sub level
* [CI] Auto commit changes from spotless
* Component is modified
* Retriever builder is also modified according to the new changes:
* [CI] Auto commit changes from spotless
* Spotless check done
* Code changes made
* FIX: Cast rewritten builder in LinearRetrieverBuilder
* modified the builder
* Update retrievers.md
* Update retrievers.md
* Update docs/changelog/129693.yaml
* Update docs/changelog/129693.yaml
  Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
* Update retrievers.md
* WIP
* [CI] Auto commit changes from spotless
* WIP
* Resolved errors
* Fixed the retrievers
* Reverted it to main
* reverted
* cleaned up
* cleaned it up
* Modified and cleaned code
* Compilation and styling clean
* Parsing issues resolved
* Unit tests pass but parsing issue equality persists
* Parsing and builder tests are passing
* Add comprehensive normalizer testing and cleanup duplicate files
  - Add extensive YAML REST tests covering 12 normalizer scenarios in 10_linear_retriever.yml
  - Add end-to-end integration test for mixed normalizer inheritance
  - Remove duplicate 10_linear_retriever_normalizers.yml file as requested
  - Cover edge cases: zero scores, large differences, error handling, field+query format
  - Ensure robust testing for production-level quality
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Apply spotless code formatting
* Final changes
* Removed unnecessary changes from doc
* Cleaned up test
* Cleaned up
* Reviewed the code
* Cleaned up comments
* Reverted RetrieverBuilder
* Cleaned up yaml
* Fixed samuel comments
* Worked on Michael comments
* Reverted the retriever example change
* The test was modified to include equality check
* cleaned up resolve normalizer
* optimised the parsing test
* cleaned up duplicates
* Added cluster features
* Modified docs
* worked on all the changes
* Update 10_linear_retriever.yml
* Nitpicks and some other enhancement comments resolved

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Co-authored-by: Claude <noreply@anthropic.com>
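
The fallback order this change introduces (per-retriever normalizer first, then the top-level normalizer, then `none`) can be sketched in isolation. This is a minimal, standalone illustration and not the Elasticsearch code: the class name `NormalizerPrecedenceSketch` is hypothetical, plain strings stand in for `ScoreNormalizer` instances, and the name `l2_norm` is assumed for the L2 normalizer.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch; strings stand in for ScoreNormalizer instances. */
final class NormalizerPrecedenceSketch {

    /** Assumed stand-in for the identity ("none") default. */
    static final String DEFAULT_NORMALIZER = "none";

    /** Per-retriever value wins, then the top-level value, then the default. */
    static String resolve(String componentNormalizer, String topLevelNormalizer) {
        if (componentNormalizer != null) {
            return componentNormalizer;
        }
        if (topLevelNormalizer != null) {
            return topLevelNormalizer;
        }
        return DEFAULT_NORMALIZER;
    }

    /** Applies the same rule to every sub-retriever slot. */
    static String[] resolveAll(String[] componentNormalizers, String topLevelNormalizer) {
        String[] resolved = new String[componentNormalizers.length];
        for (int i = 0; i < componentNormalizers.length; i++) {
            resolved[i] = resolve(componentNormalizers[i], topLevelNormalizer);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Mirrors the mixed-inheritance case: only the second sub-retriever sets its own normalizer.
        String[] resolved = resolveAll(new String[] { null, "l2_norm", null }, "minmax");
        System.out.println(Arrays.toString(resolved)); // prints [minmax, l2_norm, minmax]
    }
}
```

With no top-level normalizer, unset slots fall back to `none`, which matches the documented default.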
diff --git a/docs/changelog/129693.yaml b/docs/changelog/129693.yaml
@@ -0,0 +1,5 @@
+pr: 129693
+summary: Add top level normalizer for linear retriever
+area: Search
+type: enhancement
+issues: []
diff --git a/docs/reference/elasticsearch/rest-apis/retrievers/linear-retriever.md b/docs/reference/elasticsearch/rest-apis/retrievers/linear-retriever.md
@@ -31,9 +31,16 @@ Combining `query` and `retrievers` is not supported.
 `normalizer` {applies_to}`stack: ga 9.1`
 :   (Optional, String)
 
-    The normalizer to use when using the [multi-field query format](../retrievers.md#multi-field-query-format).
+    The top-level normalizer to use when combining results.
     See [normalizers](#linear-retriever-normalizers) for supported values.
     Required when `query` is specified.
+    
+    When used with the [multi-field query format](../retrievers.md#multi-field-query-format) (`query` parameter), normalizes scores per [field grouping](../retrievers.md#multi-field-field-grouping).
+    Otherwise serves as the default normalizer for any sub-retriever that doesn't specify its own normalizer. Per-retriever normalizers always take precedence over the top-level normalizer.
+
+    :::{note}
+    **Top-level normalizer support for sub-retrievers**: The ability to use a top-level normalizer as a default for sub-retrievers was introduced in Elasticsearch 9.2. In earlier versions, only per-retriever normalizers are supported.
+    :::
 
     ::::{warning}
     Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches.
@@ -74,9 +81,10 @@ Each entry in the `retrievers` array specifies the following parameters:
 `normalizer`
 :   (Optional, String)
 
-    Specifies how the retriever’s score will be normalized before applying the specified `weight`.
+    Specifies how the retriever's score will be normalized before applying the specified `weight`.
     See [normalizers](#linear-retriever-normalizers) for supported values.
-    Defaults to `none`.
+    If not specified, uses the top-level `normalizer` or defaults to `none` if no top-level normalizer is set.
+    {applies_to}`stack: ga 9.2`
 
 See also [this hybrid search example](retrievers-examples.md#retrievers-examples-linear-retriever) using a linear retriever on how to independently configure and apply normalizers to retrievers.
 
@@ -94,7 +102,7 @@ The `linear` retriever supports the following normalizers:
 
 ## Example
 
-This example of a hybrid search weights KNN results five times more heavily than BM25 results in the final ranking.
+This example of a hybrid search weights KNN results five times more heavily than BM25 results in the final ranking, with a top-level normalizer applied to all retrievers.
 
 ```console
 GET my_index/_search
@@ -105,23 +113,33 @@ GET my_index/_search
         {
           "retriever": {
             "knn": {
-              ...
+              "field": "title_vector",
+              "query_vector": [0.1, 0.2, 0.3],
+              "k": 10,
+              "num_candidates": 100
             }
           },
           "weight": 5 # KNN query weighted 5x
         },
         {
           "retriever": {
             "standard": {
-              ...
+              "query": {
+                "match": {
+                  "title": "elasticsearch"
+                }
+              }
             }
           },
           "weight": 1.5 # BM25 query weighted 1.5x
         }
-      ]
+      ],
+      "normalizer": "minmax"
     }
   }
 }
 ```
 
+In this example, the `minmax` normalizer is applied to both the kNN retriever and the standard retriever. The top-level normalizer serves as a default that can be overridden by individual sub-retrievers. When using the multi-field query format, the top-level normalizer is applied to all generated inner retrievers.
+
 See also [this hybrid search example](retrievers-examples.md#retrievers-examples-linear-retriever).
diff --git a/x-pack/plugin/rank-rrf/src/internalClusterTest/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverIT.java b/x-pack/plugin/rank-rrf/src/internalClusterTest/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverIT.java
@@ -835,4 +835,37 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
         );
         assertThat(numAsyncCalls.get(), equalTo(4));
     }
+
+    public void testMixedNormalizerInheritance() throws IOException {
+        client().prepareIndex(INDEX).setId("1").setSource("field1", "elasticsearch only", "field2", "no technology here").get();
+        client().prepareIndex(INDEX).setId("2").setSource("field1", "no elasticsearch", "field2", "technology only").get();
+        client().prepareIndex(INDEX).setId("3").setSource("field1", "search term", "field2", "no technology").get();
+        refresh(INDEX);
+
+        LinearRetrieverBuilder linearRetriever = new LinearRetrieverBuilder(
+            List.of(
+                CompoundRetrieverBuilder.RetrieverSource.from(
+                    new StandardRetrieverBuilder(QueryBuilders.matchQuery("field1", "elasticsearch"))
+                ),
+                CompoundRetrieverBuilder.RetrieverSource.from(
+                    new StandardRetrieverBuilder(QueryBuilders.matchQuery("field2", "technology"))
+                ),
+                CompoundRetrieverBuilder.RetrieverSource.from(new StandardRetrieverBuilder(QueryBuilders.matchQuery("field1", "search")))
+            ),
+            null,
+            null,
+            MinMaxScoreNormalizer.INSTANCE,
+            10,
+            new float[] { 1.0f, 1.0f, 1.0f },
+            new ScoreNormalizer[] { null, L2ScoreNormalizer.INSTANCE, null }
+        );
+
+        assertThat(linearRetriever.getNormalizers()[0], equalTo(MinMaxScoreNormalizer.INSTANCE));
+        assertThat(linearRetriever.getNormalizers()[1], equalTo(L2ScoreNormalizer.INSTANCE));
+        assertThat(linearRetriever.getNormalizers()[2], equalTo(MinMaxScoreNormalizer.INSTANCE));
+
+        assertResponse(client().prepareSearch(INDEX).setSource(new SearchSourceBuilder().retriever(linearRetriever)), searchResponse -> {
+            assertThat(searchResponse.getHits().getTotalHits().value(), equalTo(3L));
+        });
+    }
 }
diff --git a/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java b/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java
@@ -22,6 +22,7 @@
 public class RankRRFFeatures implements FeatureSpecification {
 
     public static final NodeFeature LINEAR_RETRIEVER_SUPPORTED = new NodeFeature("linear_retriever_supported");
+    public static final NodeFeature LINEAR_RETRIEVER_TOP_LEVEL_NORMALIZER = new NodeFeature("linear_retriever.top_level_normalizer");
 
     @Override
     public Set<NodeFeature> getFeatures() {
@@ -37,7 +38,8 @@ public Set<NodeFeature> getTestFeatures() {
             LINEAR_RETRIEVER_MINSCORE_FIX,
             LinearRetrieverBuilder.MULTI_FIELDS_QUERY_FORMAT_SUPPORT,
             RRFRetrieverBuilder.MULTI_FIELDS_QUERY_FORMAT_SUPPORT,
-            RRFRetrieverBuilder.WEIGHTED_SUPPORT
+            RRFRetrieverBuilder.WEIGHTED_SUPPORT,
+            LINEAR_RETRIEVER_TOP_LEVEL_NORMALIZER
         );
     }
 }
diff --git a/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java b/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
@@ -43,6 +43,7 @@
 import static org.elasticsearch.action.ValidateActions.addValidationError;
 import static org.elasticsearch.xcontent.ConstructingObjectParser.optionalConstructorArg;
 import static org.elasticsearch.xpack.rank.RankRRFFeatures.LINEAR_RETRIEVER_SUPPORTED;
+import static org.elasticsearch.xpack.rank.linear.LinearRetrieverComponent.DEFAULT_NORMALIZER;
 import static org.elasticsearch.xpack.rank.linear.LinearRetrieverComponent.DEFAULT_WEIGHT;
 
 /**
@@ -92,7 +93,7 @@ public final class LinearRetrieverBuilder extends CompoundRetrieverBuilder<Linea
             for (LinearRetrieverComponent component : retrieverComponents) {
                 innerRetrievers.add(RetrieverSource.from(component.retriever));
                 weights[index] = component.weight;
-                normalizers[index] = component.normalizer;
+                normalizers[index] = resolveNormalizer(component.normalizer, normalizer);
                 index++;
             }
             return new LinearRetrieverBuilder(innerRetrievers, fields, query, normalizer, rankWindowSize, weights, normalizers);
@@ -118,10 +119,20 @@ private static float[] getDefaultWeight(List<RetrieverSource> innerRetrievers) {
     private static ScoreNormalizer[] getDefaultNormalizers(List<RetrieverSource> innerRetrievers) {
         int size = innerRetrievers != null ? innerRetrievers.size() : 0;
         ScoreNormalizer[] normalizers = new ScoreNormalizer[size];
-        Arrays.fill(normalizers, IdentityScoreNormalizer.INSTANCE);
+        Arrays.fill(normalizers, DEFAULT_NORMALIZER);
         return normalizers;
     }
 
+    private static ScoreNormalizer resolveNormalizer(ScoreNormalizer componentNormalizer, ScoreNormalizer topLevelNormalizer) {
+        if (componentNormalizer != null) {
+            return componentNormalizer;
+        }
+        if (topLevelNormalizer != null) {
+            return topLevelNormalizer;
+        }
+        return DEFAULT_NORMALIZER;
+    }
+
     public static LinearRetrieverBuilder fromXContent(XContentParser parser, RetrieverParserContext context) throws IOException {
         if (context.clusterSupportsFeature(LINEAR_RETRIEVER_SUPPORTED) == false) {
             throw new ParsingException(parser.getTokenLocation(), "unknown retriever [" + NAME + "]");
@@ -167,7 +178,10 @@ public LinearRetrieverBuilder(
         this.query = query;
         this.normalizer = normalizer;
         this.weights = weights;
-        this.normalizers = normalizers;
+        this.normalizers = new ScoreNormalizer[normalizers.length];
+        for (int i = 0; i < normalizers.length; i++) {
+            this.normalizers[i] = resolveNormalizer(normalizers[i], normalizer);
+        }
     }
 
     public LinearRetrieverBuilder(
@@ -221,19 +235,7 @@ public ActionRequestValidationException validate(
                 ),
                 validationException
             );
-        } else if (innerRetrievers.isEmpty() == false && normalizer != null) {
-            validationException = addValidationError(
-                String.format(
-                    Locale.ROOT,
-                    "[%s] [%s] cannot be provided when [%s] is specified",
-                    getName(),
-                    NORMALIZER_FIELD.getPreferredName(),
-                    RETRIEVERS_FIELD.getPreferredName()
-                ),
-                validationException
-            );
         }
-
         return validationException;
     }
 
diff --git a/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverComponent.java b/x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverComponent.java
@@ -38,7 +38,7 @@ public LinearRetrieverComponent(RetrieverBuilder retrieverBuilder, Float weight,
         assert retrieverBuilder != null;
         this.retriever = retrieverBuilder;
         this.weight = weight == null ? DEFAULT_WEIGHT : weight;
-        this.normalizer = normalizer == null ? DEFAULT_NORMALIZER : normalizer;
+        this.normalizer = normalizer; // Don't default to identity, allow null for top-level fallback
         if (this.weight < 0) {
             throw new IllegalArgumentException("[weight] must be non-negative");
         }
diff --git a/x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderParsingTests.java b/x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderParsingTests.java
@@ -19,6 +19,7 @@
 import org.elasticsearch.xcontent.NamedXContentRegistry;
 import org.elasticsearch.xcontent.ParseField;
 import org.elasticsearch.xcontent.XContentParser;
+import org.elasticsearch.xcontent.XContentType;
 import org.junit.AfterClass;
 import org.junit.BeforeClass;
 
@@ -27,10 +28,17 @@
 import java.util.List;
 
 import static java.util.Collections.emptyList;
+import static org.hamcrest.Matchers.instanceOf;
 
 public class LinearRetrieverBuilderParsingTests extends AbstractXContentTestCase<LinearRetrieverBuilder> {
     private static List<NamedXContentRegistry.Entry> xContentRegistryEntries;
 
+    private static final ScoreNormalizer[] SCORE_NORMALIZERS = new ScoreNormalizer[] {
+        null,
+        MinMaxScoreNormalizer.INSTANCE,
+        L2ScoreNormalizer.INSTANCE,
+        IdentityScoreNormalizer.INSTANCE };
+
     @BeforeClass
     public static void init() {
         xContentRegistryEntries = new SearchModule(Settings.EMPTY, emptyList()).getNamedXContents();
@@ -108,10 +116,46 @@ protected NamedXContentRegistry xContentRegistry() {
     }
 
     private static ScoreNormalizer randomScoreNormalizer() {
-        if (randomBoolean()) {
-            return MinMaxScoreNormalizer.INSTANCE;
-        } else {
-            return IdentityScoreNormalizer.INSTANCE;
+        return randomFrom(SCORE_NORMALIZERS);
+    }
+
+    public void testTopLevelNormalizer() throws IOException {
+        String json = """
+            {
+              "linear": {
+                "retrievers": [
+                  {
+                    "retriever": {
+                      "test": {
+                        "value": "test1"
+                      }
+                    },
+                    "weight": 1.0,
+                    "normalizer": "none"
+                  },
+                  {
+                    "retriever": {
+                      "test": {
+                        "value": "test2"
+                      }
+                    },
+                    "weight": 1.0,
+                    "normalizer": "none"
+                  }
+                ],
+                "normalizer": "minmax"
+              }
+            }""";
+
+        try (XContentParser parser = createParser(XContentType.JSON.xContent(), json)) {
+            LinearRetrieverBuilder builder = doParseInstance(parser);
+            // Per-retriever normalizers take precedence over the top-level one, so each
+            // explicit 'none' here should override the top-level 'minmax'.
+            ScoreNormalizer[] normalizers = builder.getNormalizers();
+            assertEquals(2, normalizers.length);
+            for (ScoreNormalizer normalizer : normalizers) {
+                assertThat(normalizer, instanceOf(IdentityScoreNormalizer.class));
+            }
         }
     }
 }
diff --git a/x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java b/x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java
diff --git a/x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/10_linear_retriever.yml b/x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/10_linear_retriever.yml
diff --git a/x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml b/x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
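
To make the effect of a shared `minmax` normalizer concrete, the sketch below normalizes each retriever's scores and then applies the weights from the documentation example (5 and 1.5). It is an illustration under stated assumptions, not the plugin's implementation: the class `LinearCombinationSketch` is hypothetical, min-max scaling is taken to be the usual `(score - min) / (max - min)`, the all-scores-equal case returning 0 is an assumption, and every retriever is assumed to score every document.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch of "normalize, then weight, then sum". */
final class LinearCombinationSketch {

    /** Min-max scaling to [0, 1]; returns zeros when all scores are equal (assumed handling). */
    static double[] minMax(double[] scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (scores[i] - min) / range;
        }
        return out;
    }

    /** rawScores[r][d] is retriever r's raw score for document d. */
    static double[] combine(double[][] rawScores, double[] weights) {
        double[] combined = new double[rawScores[0].length];
        for (int r = 0; r < rawScores.length; r++) {
            double[] normalized = minMax(rawScores[r]);
            for (int d = 0; d < normalized.length; d++) {
                combined[d] += weights[r] * normalized[d];
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        double[][] raw = {
            { 1.0, 2.0, 3.0 }, // made-up kNN scores for three documents
            { 2.0, 6.0, 4.0 }  // made-up BM25 scores for the same documents
        };
        double[] combined = combine(raw, new double[] { 5.0, 1.5 });
        System.out.println(Arrays.toString(combined)); // prints [0.0, 4.0, 5.75]
    }
}
```

Because both score lists are scaled to the same range before weighting, the weights alone decide how much each retriever contributes; with `none`, the raw score scales would also influence the ranking.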
