Commit f84ec74

mridula-s109, elasticsearchmachine, kderusso, and claude authored
Linear retriever top level option for normalizer (elastic#129693)
* Per component normalizer is removed
* Modified LinearRetrieverBuilder to propagate top level normalizer to each and every sub level
* [CI] Auto commit changes from spotless
* Component is modified
* Retriever builder is also modified according to the new changes
* [CI] Auto commit changes from spotless
* Spotless check done
* Code changes made
* FIX: Cast rewritten builder in LinearRetrieverBuilder
* modified the builder
* Update retrievers.md
* Update retrievers.md
* Update docs/changelog/129693.yaml
* Update docs/changelog/129693.yaml
  Co-authored-by: Kathleen DeRusso <[email protected]>
* Update retrievers.md
* WIP
* [CI] Auto commit changes from spotless
* WIP
* Resolved errors
* Fixed the retrievers
* Reverted it to main
* reverted
* cleaned up
* cleaned it up
* Modified and cleaned code
* Compilation and styling clean
* Parsing issues resolved
* Unit tests pass but parsing issue equality persists
* Parsing and builder tests are passing
* Add comprehensive normalizer testing and cleanup duplicate files
  - Add extensive YAML REST tests covering 12 normalizer scenarios in 10_linear_retriever.yml
  - Add end-to-end integration test for mixed normalizer inheritance
  - Remove duplicate 10_linear_retriever_normalizers.yml file as requested
  - Cover edge cases: zero scores, large differences, error handling, field+query format
  - Ensure robust testing for production-level quality
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>
* Apply spotless code formatting
* Final changes
* Removed unnecessary changes from doc
* Cleaned up test
* Cleaned up
* Reviewed the code
* Cleaned up comments
* Reverted RetrieverBuilder
* Cleaned up yaml
* Fixed samuel comments
* Worked on Michael comments
* Reverted the retriever example change
* The test was modified to include equality check
* cleaned up resolve normalizer
* optimised the parsing test
* cleaned up duplicates
* Added cluster features
* Modified docs
* worked on all the changes
* Update 10_linear_retriever.yml
* Nitpicks and some other enhancement comments resolved

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Kathleen DeRusso <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent 098eb51 commit f84ec74

10 files changed, with 777 additions and 47 deletions
docs/changelog/129693.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 129693
+summary: Add top level normalizer for linear retriever
+area: Search
+type: enhancement
+issues: []

docs/reference/elasticsearch/rest-apis/retrievers/linear-retriever.md

Lines changed: 25 additions & 7 deletions
@@ -31,9 +31,16 @@ Combining `query` and `retrievers` is not supported.
 `normalizer` {applies_to}`stack: ga 9.1`
 : (Optional, String)

-The normalizer to use when using the [multi-field query format](../retrievers.md#multi-field-query-format).
+The top-level normalizer to use when combining results.
 See [normalizers](#linear-retriever-normalizers) for supported values.
 Required when `query` is specified.
+
+When used with the [multi-field query format](../retrievers.md#multi-field-query-format) (`query` parameter), normalizes scores per [field grouping](../retrievers.md#multi-field-field-grouping).
+Otherwise serves as the default normalizer for any sub-retriever that doesn't specify its own normalizer. Per-retriever normalizers always take precedence over the top-level normalizer.
+
+:::{note}
+**Top-level normalizer support for sub-retrievers**: The ability to use a top-level normalizer as a default for sub-retrievers was introduced in Elasticsearch 9.2+. In earlier versions, only per-retriever normalizers are supported.
+:::

 ::::{warning}
 Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches.
@@ -74,9 +81,10 @@ Each entry in the `retrievers` array specifies the following parameters:
 `normalizer`
 : (Optional, String)

-Specifies how the retrievers score will be normalized before applying the specified `weight`.
+Specifies how the retriever's score will be normalized before applying the specified `weight`.
 See [normalizers](#linear-retriever-normalizers) for supported values.
-Defaults to `none`.
+If not specified, uses the top-level `normalizer` or defaults to `none` if no top-level normalizer is set.
+{applies_to}`stack: ga 9.2`

 See also [this hybrid search example](retrievers-examples.md#retrievers-examples-linear-retriever) using a linear retriever on how to independently configure and apply normalizers to retrievers.

@@ -94,7 +102,7 @@ The `linear` retriever supports the following normalizers:

 ## Example

-This example of a hybrid search weights KNN results five times more heavily than BM25 results in the final ranking.
+This example of a hybrid search weights KNN results five times more heavily than BM25 results in the final ranking, with a top-level normalizer applied to all retrievers.

 ```console
 GET my_index/_search
@@ -105,23 +113,33 @@ GET my_index/_search
         {
           "retriever": {
             "knn": {
-              ...
+              "field": "title_vector",
+              "query_vector": [0.1, 0.2, 0.3],
+              "k": 10,
+              "num_candidates": 100
             }
           },
           "weight": 5 # KNN query weighted 5x
         },
         {
           "retriever": {
             "standard": {
-              ...
+              "query": {
+                "match": {
+                  "title": "elasticsearch"
+                }
+              }
             }
           },
           "weight": 1.5 # BM25 query weighted 1.5x
         }
-      ]
+      ],
+      "normalizer": "minmax"
     }
   }
 }
 ```

+In this example, the `minmax` normalizer is applied to both the kNN retriever and the standard retriever. The top-level normalizer serves as a default that can be overridden by individual sub-retrievers. When using the multi-field query format, the top-level normalizer is applied to all generated inner retrievers.
+
 See also [this hybrid search example](retrievers-examples.md#retrievers-examples-linear-retriever).
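
The documentation change above says each retriever's score is normalized before its `weight` is applied. The following is a minimal sketch of that arithmetic, not the Elasticsearch implementation. It assumes `minmax` rescales scores as `(s - min) / (max - min)`, that the final score is the weighted sum of normalized scores, and that both retrievers return the same documents in the same order. The class and method names (`LinearCombinationSketch`, `minMax`, `combine`) are hypothetical, and mapping an all-equal score list to 0 is a choice made only for this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Elasticsearch code base.
final class LinearCombinationSketch {
    private LinearCombinationSketch() {}

    // Assumed minmax formula: (score - min) / (max - min).
    // When all scores are equal, this sketch returns 0 for every entry.
    static double[] minMax(double[] scores) {
        double min = Arrays.stream(scores).min().orElse(0.0);
        double max = Arrays.stream(scores).max().orElse(0.0);
        double range = max - min;
        double[] normalized = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = range == 0.0 ? 0.0 : (scores[i] - min) / range;
        }
        return normalized;
    }

    // Normalizes each score list with the shared normalizer, then applies the weights.
    // Index i is assumed to refer to the same document in both arrays.
    static double[] combine(double[] knnScores, double[] bm25Scores, double knnWeight, double bm25Weight) {
        if (knnScores.length != bm25Scores.length) {
            throw new IllegalArgumentException("score arrays must have the same length");
        }
        double[] knn = minMax(knnScores);
        double[] bm25 = minMax(bm25Scores);
        double[] combined = new double[knn.length];
        for (int i = 0; i < combined.length; i++) {
            combined[i] = knnWeight * knn[i] + bm25Weight * bm25[i];
        }
        return combined;
    }
}
```

With the weights from the example (`5` and `1.5`), normalizing first keeps the larger raw BM25 scores from dominating the small raw kNN similarities; the weights then decide the relative contribution.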

x-pack/plugin/rank-rrf/src/internalClusterTest/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverIT.java

Lines changed: 33 additions & 0 deletions
@@ -835,4 +835,37 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
         );
         assertThat(numAsyncCalls.get(), equalTo(4));
     }
+
+    public void testMixedNormalizerInheritance() throws IOException {
+        client().prepareIndex(INDEX).setId("1").setSource("field1", "elasticsearch only", "field2", "no technology here").get();
+        client().prepareIndex(INDEX).setId("2").setSource("field1", "no elasticsearch", "field2", "technology only").get();
+        client().prepareIndex(INDEX).setId("3").setSource("field1", "search term", "field2", "no technology").get();
+        refresh(INDEX);
+
+        LinearRetrieverBuilder linearRetriever = new LinearRetrieverBuilder(
+            List.of(
+                CompoundRetrieverBuilder.RetrieverSource.from(
+                    new StandardRetrieverBuilder(QueryBuilders.matchQuery("field1", "elasticsearch"))
+                ),
+                CompoundRetrieverBuilder.RetrieverSource.from(
+                    new StandardRetrieverBuilder(QueryBuilders.matchQuery("field2", "technology"))
+                ),
+                CompoundRetrieverBuilder.RetrieverSource.from(new StandardRetrieverBuilder(QueryBuilders.matchQuery("field1", "search")))
+            ),
+            null,
+            null,
+            MinMaxScoreNormalizer.INSTANCE,
+            10,
+            new float[] { 1.0f, 1.0f, 1.0f },
+            new ScoreNormalizer[] { null, L2ScoreNormalizer.INSTANCE, null }
+        );
+
+        assertThat(linearRetriever.getNormalizers()[0], equalTo(MinMaxScoreNormalizer.INSTANCE));
+        assertThat(linearRetriever.getNormalizers()[1], equalTo(L2ScoreNormalizer.INSTANCE));
+        assertThat(linearRetriever.getNormalizers()[2], equalTo(MinMaxScoreNormalizer.INSTANCE));
+
+        assertResponse(client().prepareSearch(INDEX).setSource(new SearchSourceBuilder().retriever(linearRetriever)), searchResponse -> {
+            assertThat(searchResponse.getHits().getTotalHits().value(), equalTo(3L));
+        });
+    }
 }

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java

Lines changed: 3 additions & 1 deletion
@@ -22,6 +22,7 @@
 public class RankRRFFeatures implements FeatureSpecification {

     public static final NodeFeature LINEAR_RETRIEVER_SUPPORTED = new NodeFeature("linear_retriever_supported");
+    public static final NodeFeature LINEAR_RETRIEVER_TOP_LEVEL_NORMALIZER = new NodeFeature("linear_retriever.top_level_normalizer");

     @Override
     public Set<NodeFeature> getFeatures() {
@@ -37,7 +38,8 @@ public Set<NodeFeature> getTestFeatures() {
             LINEAR_RETRIEVER_MINSCORE_FIX,
             LinearRetrieverBuilder.MULTI_FIELDS_QUERY_FORMAT_SUPPORT,
             RRFRetrieverBuilder.MULTI_FIELDS_QUERY_FORMAT_SUPPORT,
-            RRFRetrieverBuilder.WEIGHTED_SUPPORT
+            RRFRetrieverBuilder.WEIGHTED_SUPPORT,
+            LINEAR_RETRIEVER_TOP_LEVEL_NORMALIZER
         );
     }
 }

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java

Lines changed: 17 additions & 15 deletions
@@ -43,6 +43,7 @@
 import static org.elasticsearch.action.ValidateActions.addValidationError;
 import static org.elasticsearch.xcontent.ConstructingObjectParser.optionalConstructorArg;
 import static org.elasticsearch.xpack.rank.RankRRFFeatures.LINEAR_RETRIEVER_SUPPORTED;
+import static org.elasticsearch.xpack.rank.linear.LinearRetrieverComponent.DEFAULT_NORMALIZER;
 import static org.elasticsearch.xpack.rank.linear.LinearRetrieverComponent.DEFAULT_WEIGHT;

 /**
@@ -92,7 +93,7 @@ public final class LinearRetrieverBuilder extends CompoundRetrieverBuilder<Linea
             for (LinearRetrieverComponent component : retrieverComponents) {
                 innerRetrievers.add(RetrieverSource.from(component.retriever));
                 weights[index] = component.weight;
-                normalizers[index] = component.normalizer;
+                normalizers[index] = resolveNormalizer(component.normalizer, normalizer);
                 index++;
             }
             return new LinearRetrieverBuilder(innerRetrievers, fields, query, normalizer, rankWindowSize, weights, normalizers);
@@ -118,10 +119,20 @@ private static float[] getDefaultWeight(List<RetrieverSource> innerRetrievers) {
     private static ScoreNormalizer[] getDefaultNormalizers(List<RetrieverSource> innerRetrievers) {
         int size = innerRetrievers != null ? innerRetrievers.size() : 0;
         ScoreNormalizer[] normalizers = new ScoreNormalizer[size];
-        Arrays.fill(normalizers, IdentityScoreNormalizer.INSTANCE);
+        Arrays.fill(normalizers, DEFAULT_NORMALIZER);
         return normalizers;
     }

+    private static ScoreNormalizer resolveNormalizer(ScoreNormalizer componentNormalizer, ScoreNormalizer topLevelNormalizer) {
+        if (componentNormalizer != null) {
+            return componentNormalizer;
+        }
+        if (topLevelNormalizer != null) {
+            return topLevelNormalizer;
+        }
+        return DEFAULT_NORMALIZER;
+    }
+
     public static LinearRetrieverBuilder fromXContent(XContentParser parser, RetrieverParserContext context) throws IOException {
         if (context.clusterSupportsFeature(LINEAR_RETRIEVER_SUPPORTED) == false) {
             throw new ParsingException(parser.getTokenLocation(), "unknown retriever [" + NAME + "]");
@@ -167,7 +178,10 @@ public LinearRetrieverBuilder(
         this.query = query;
         this.normalizer = normalizer;
         this.weights = weights;
-        this.normalizers = normalizers;
+        this.normalizers = new ScoreNormalizer[normalizers.length];
+        for (int i = 0; i < normalizers.length; i++) {
+            this.normalizers[i] = resolveNormalizer(normalizers[i], normalizer);
+        }
     }

     public LinearRetrieverBuilder(
@@ -221,19 +235,7 @@ public ActionRequestValidationException validate(
                 ),
                 validationException
             );
-        } else if (innerRetrievers.isEmpty() == false && normalizer != null) {
-            validationException = addValidationError(
-                String.format(
-                    Locale.ROOT,
-                    "[%s] [%s] cannot be provided when [%s] is specified",
-                    getName(),
-                    NORMALIZER_FIELD.getPreferredName(),
-                    RETRIEVERS_FIELD.getPreferredName()
-                ),
-                validationException
-            );
         }
-
         return validationException;
     }

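
The fallback order introduced by `resolveNormalizer` (per-retriever value, then top-level value, then the default) can be exercised in isolation. The sketch below is a standalone illustration and does not use the Elasticsearch classes: the `Normalizer` enum and the `NormalizerResolutionSketch` class are hypothetical stand-ins for the `ScoreNormalizer` instances, and `NONE` stands in for `DEFAULT_NORMALIZER`, which this diff suggests is the identity normalizer.

```java
// Hypothetical illustration only; mirrors the fallback order shown in the diff above.
final class NormalizerResolutionSketch {
    private NormalizerResolutionSketch() {}

    // Stand-ins for the ScoreNormalizer singletons.
    enum Normalizer { NONE, MINMAX, L2 }

    // Assumed default, matching the documented "defaults to `none`".
    static final Normalizer DEFAULT_NORMALIZER = Normalizer.NONE;

    // Per-retriever normalizer wins, then the top-level one, then the default.
    static Normalizer resolve(Normalizer componentNormalizer, Normalizer topLevelNormalizer) {
        if (componentNormalizer != null) {
            return componentNormalizer;
        }
        if (topLevelNormalizer != null) {
            return topLevelNormalizer;
        }
        return DEFAULT_NORMALIZER;
    }

    // Resolves every entry into a fresh array, so the caller's array is left untouched.
    static Normalizer[] resolveAll(Normalizer[] perRetriever, Normalizer topLevelNormalizer) {
        Normalizer[] resolved = new Normalizer[perRetriever.length];
        for (int i = 0; i < perRetriever.length; i++) {
            resolved[i] = resolve(perRetriever[i], topLevelNormalizer);
        }
        return resolved;
    }
}
```

Calling `resolveAll` with `{ null, L2, null }` and a top-level `MINMAX` yields `{ MINMAX, L2, MINMAX }`, the same shape that `testMixedNormalizerInheritance` asserts against the real builder.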

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverComponent.java

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ public LinearRetrieverComponent(RetrieverBuilder retrieverBuilder, Float weight,
         assert retrieverBuilder != null;
         this.retriever = retrieverBuilder;
         this.weight = weight == null ? DEFAULT_WEIGHT : weight;
-        this.normalizer = normalizer == null ? DEFAULT_NORMALIZER : normalizer;
+        this.normalizer = normalizer; // Don't default to identity, allow null for top-level fallback
         if (this.weight < 0) {
             throw new IllegalArgumentException("[weight] must be non-negative");
         }

x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderParsingTests.java

Lines changed: 48 additions & 4 deletions
@@ -19,6 +19,7 @@
 import org.elasticsearch.xcontent.NamedXContentRegistry;
 import org.elasticsearch.xcontent.ParseField;
 import org.elasticsearch.xcontent.XContentParser;
+import org.elasticsearch.xcontent.XContentType;
 import org.junit.AfterClass;
 import org.junit.BeforeClass;

@@ -27,10 +28,17 @@
 import java.util.List;

 import static java.util.Collections.emptyList;
+import static org.hamcrest.Matchers.instanceOf;

 public class LinearRetrieverBuilderParsingTests extends AbstractXContentTestCase<LinearRetrieverBuilder> {
     private static List<NamedXContentRegistry.Entry> xContentRegistryEntries;

+    private static final ScoreNormalizer[] SCORE_NORMALIZERS = new ScoreNormalizer[] {
+        null,
+        MinMaxScoreNormalizer.INSTANCE,
+        L2ScoreNormalizer.INSTANCE,
+        IdentityScoreNormalizer.INSTANCE };
+
     @BeforeClass
     public static void init() {
         xContentRegistryEntries = new SearchModule(Settings.EMPTY, emptyList()).getNamedXContents();
@@ -108,10 +116,46 @@ protected NamedXContentRegistry xContentRegistry() {
     }

     private static ScoreNormalizer randomScoreNormalizer() {
-        if (randomBoolean()) {
-            return MinMaxScoreNormalizer.INSTANCE;
-        } else {
-            return IdentityScoreNormalizer.INSTANCE;
+        return randomFrom(SCORE_NORMALIZERS);
+    }
+
+    public void testTopLevelNormalizer() throws IOException {
+        String json = """
+            {
+              "linear": {
+                "retrievers": [
+                  {
+                    "retriever": {
+                      "test": {
+                        "value": "test1"
+                      }
+                    },
+                    "weight": 1.0,
+                    "normalizer": "none"
+                  },
+                  {
+                    "retriever": {
+                      "test": {
+                        "value": "test2"
+                      }
+                    },
+                    "weight": 1.0,
+                    "normalizer": "none"
+                  }
+                ],
+                "normalizer": "minmax"
+              }
+            }""";
+
+        try (XContentParser parser = createParser(XContentType.JSON.xContent(), json)) {
+            LinearRetrieverBuilder builder = doParseInstance(parser);
+            // Test that the top-level normalizer is properly applied - the individual
+            // Per-retriever 'none' should override top-level 'minmax'
+            ScoreNormalizer[] normalizers = builder.getNormalizers();
+            assertEquals(2, normalizers.length);
+            for (ScoreNormalizer normalizer : normalizers) {
+                assertThat(normalizer, instanceOf(IdentityScoreNormalizer.class));
+            }
         }
     }
 }
