Esql mv rerank by afoucret · Pull Request #7 · afoucret/elasticsearch

afoucret · 2026-01-13T13:30:43Z

Have you signed the contributor license agreement?
Have you followed the contributor guidelines?
If submitting code, have you built your formula locally prior to submission with gradle check?
If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
If submitting code, have you checked that your submission is for an OS and architecture that we support?
If you are submitting this code for a class then read our policy for that.

Note

ES|QL: Rerank operator extended for multi‑value fields with new examples (incl. TOP_SNIPPETS); vector similarity functions and TEXT_EMBEDDING/KNN docs promoted to GA; tutorial and metadata docs updated; changelog entries added.
Benchmarks: New TSDB codec encode/decode benchmarks for multiple data patterns; existing benchmarks refactored to use dynamic getBlockSize().
Compression/Histogram: Upgrade native zstd to 1.5.7; add ExponentialHistogramUtils.removeMergeNoise helper and tests; expose DEFAULT_MAX_HISTOGRAM_BUCKETS.
Allocator: Internal refactor in BalancedShardsAllocator/NodeSorter to carry threshold via sorter.
REST API: Remove project_routing param from search and async_search.submit specs.
Docker/IronBank: Add IronBank Dockerfile, switch path in updatecli; bump UBI base tag to 9.7; update Wolfi/FIPS image digests; hardening manifest updated.
Tests: TSDB synthetic IDs snapshot/restore coverage; snapshot shutdown progress logging by node role; snapshot metrics tweaks; muted tests list updated.
Docs/Release notes: BBQ and ILM docs tweaks; known issues; mark 9.2.4/9.1.10 released.

Written by Cursor Bugbot for commit 1335684.

Examples of queries that are supported now:
* `network.bytes_in * 8`
* `network.eth0.rx + network.eth0.tx`
* `max(network.total_bytes_in) * 8`
* `network.total_bytes_in{cluster!="prod"} / network.total_bytes_in{cluster!="staging"}`

Follow-up from elastic#140135

Makes GetInferenceFieldsAction an indices action dependent on the indices read permission. This allows the action to be executed by users with read access to the indices queried.

Co-authored-by: Elliot Barlas <elliot.barlas@elastic.co>

* Finalize docs for v9.1.10 release
* Update breaking-changes.md
* Fix heading formatting for deprecations in release notes
* Update index.md

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Charlotte Hoblik <116336412+charlotte-hoblik@users.noreply.github.com>

* Finalize docs for v9.2.4 release
* Update breaking changes for version 9.1.10 and 9.2.4
* Update deprecations.md
* Revise release notes for Elasticsearch 9.2.4
  Updated release notes to reflect changes from version 9.1.10 to 9.2.4, including features, enhancements, and fixes.

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Charlotte Hoblik <116336412+charlotte-hoblik@users.noreply.github.com>

Zstd version 1.5.7 improved the decompression speed for small blocks: https://github.com/facebook/zstd/releases/tag/v1.5.7 . We limit binary doc value blocks to a maximum of 1024 values, to reduce the performance impact of decompressing a whole block when only a few values are needed. For small values, this can result in small blocks, which are inefficient to decompress. The Zstd improvement will help mitigate this issue.
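To make the small-block effect concrete, here is a minimal Java sketch, assuming blocks are cut purely on value count (the 1024 cap from the message above) with no byte-size limit. `BlockSizing` and `blockByteSizes` are hypothetical names, and this is not the actual doc values codec; it only shows that the same value cap yields blocks of very different byte sizes depending on value length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the Elasticsearch doc values codec: binary values are
 * packed into blocks holding at most MAX_VALUES_PER_BLOCK values, and the
 * uncompressed byte size of each block is reported.
 */
final class BlockSizing {
    // Per-block value cap quoted in the commit message.
    static final int MAX_VALUES_PER_BLOCK = 1024;

    /** Returns the uncompressed byte size of each block, in write order. */
    static List<Long> blockByteSizes(int[] valueLengths) {
        List<Long> sizes = new ArrayList<>();
        long bytesInBlock = 0;
        int valuesInBlock = 0;
        for (int length : valueLengths) {
            bytesInBlock += length;
            valuesInBlock++;
            if (valuesInBlock == MAX_VALUES_PER_BLOCK) {
                // Block reached the value cap: close it and start a new one.
                sizes.add(bytesInBlock);
                bytesInBlock = 0;
                valuesInBlock = 0;
            }
        }
        if (valuesInBlock > 0) {
            // Trailing partial block.
            sizes.add(bytesInBlock);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] tinyValues = new int[2048];
        Arrays.fill(tinyValues, 8);   // e.g. short keyword values
        int[] bigValues = new int[2048];
        Arrays.fill(bigValues, 512);  // e.g. longer text values
        System.out.println(blockByteSizes(tinyValues)); // [8192, 8192]
        System.out.println(blockByteSizes(bigValues));  // [524288, 524288]
    }
}
```

With 8-byte values every block is only 8 KiB, which is the kind of small block whose decompression cost the Zstd 1.5.7 improvement is expected to reduce.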

During hollowing, segment info files (.si) are replicated into the hollow commit blob, which can trigger GETs to referenced blobs. This patch optimizes the process by reading segment info from memory instead of performing GET requests. One issue is that segment info serialization is not deterministic: the serialization of the segment info map fields depends on their internal order, which can change. To solve that problem, the patch enforces a serialization order for the segment info map fields (diagnostics and attributes). Closes ES-13399
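A minimal Java sketch of why a fixed map serialization order matters. Sorting entries by key with a `TreeMap` is an assumption about how an order could be enforced; the message does not say which order the patch uses, and `DeterministicMapWriter` and the example keys are hypothetical. Two maps with equal contents but different iteration order produce different bytes when written as iterated, and identical bytes when written sorted.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: serialize a String map either in iteration order or sorted by key. */
final class DeterministicMapWriter {

    /** Writes the entry count, then each key/value pair in the order given. */
    private static byte[] encode(int size, Iterable<Map.Entry<String, String>> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(size);
            for (Map.Entry<String, String> entry : entries) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Non-deterministic across maps: output depends on the map's internal order. */
    static byte[] writeInIterationOrder(Map<String, String> map) {
        return encode(map.size(), map.entrySet());
    }

    /** Deterministic: a TreeMap copy always iterates in ascending key order. */
    static byte[] writeSorted(Map<String, String> map) {
        return encode(map.size(), new TreeMap<>(map).entrySet());
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("os", "Linux");
        first.put("source", "flush");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("source", "flush"); // same entries, different insertion order
        second.put("os", "Linux");
        System.out.println(Arrays.equals(writeInIterationOrder(first), writeInIterationOrder(second))); // false
        System.out.println(Arrays.equals(writeSorted(first), writeSorted(second)));                     // true
    }
}
```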

cursor · 2026-01-15T17:49:34Z

server/src/internalClusterTest/java/org/elasticsearch/snapshots/SnapshotShutdownIT.java

+                Level.INFO,
+                "Shard snapshot completion stats since shutdown began*"
+            );
+        snapshotShutdownProgressTrackerToNotRunExpectation.awaitMatched(1000);


Test expectations never registered with mock logger

Medium Severity

The PatternNotSeenEventExpectation objects in testStatefulNodesThatDoNotContainDataDoesNotLogSnapshotShuttingDownProgress and testStatefulCoordinatingOnlyNodeDoesNotLogSnapshotShuttingDownProgress are created as local variables but never added to mockLog via mockLog.addExpectation(...). When mockLog.assertAllExpectationsMatched() is called at the end, it only checks expectations in its internal list, which doesn't include these local expectations. The tests will always pass without actually verifying that log messages were not produced.
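The failure mode described above can be reproduced with a tiny, self-contained model. `MiniMockLog` and `NotSeenExpectation` are hypothetical stand-ins, not the Elasticsearch `MockLog` API; they only mirror the idea that a check over registered expectations says nothing about an expectation that was never added.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a mock logger; not Elasticsearch's MockLog. */
final class MiniMockLog {

    /** Passes only if no logged message contains the pattern. */
    static final class NotSeenExpectation {
        private final String pattern;
        private boolean seen;

        NotSeenExpectation(String pattern) {
            this.pattern = pattern;
        }

        void match(String message) {
            if (message.contains(pattern)) {
                seen = true;
            }
        }

        boolean isSatisfied() {
            return !seen;
        }
    }

    private final List<NotSeenExpectation> expectations = new ArrayList<>();

    void addExpectation(NotSeenExpectation expectation) {
        expectations.add(expectation);
    }

    /** Feeds a message to registered expectations only. */
    void log(String message) {
        for (NotSeenExpectation expectation : expectations) {
            expectation.match(message);
        }
    }

    /** Checks registered expectations only; with none registered this is vacuously true. */
    boolean allExpectationsMatched() {
        return expectations.stream().allMatch(NotSeenExpectation::isSatisfied);
    }

    public static void main(String[] args) {
        String message = "Shard snapshot completion stats since shutdown began";

        // Buggy pattern: expectation created as a local but never registered.
        MiniMockLog buggy = new MiniMockLog();
        NotSeenExpectation unregistered = new NotSeenExpectation(message);
        buggy.log(message);
        System.out.println(buggy.allExpectationsMatched()); // true, even though the message was logged
        System.out.println(unregistered.isSatisfied());     // true, it never saw the message

        // Fixed pattern: register before the code under test logs.
        MiniMockLog fixed = new MiniMockLog();
        fixed.addExpectation(new NotSeenExpectation(message));
        fixed.log(message);
        System.out.println(fixed.allExpectationsMatched()); // false, the unwanted message is caught
    }
}
```

In the real tests, the corresponding fix is to pass each expectation to `mockLog.addExpectation(...)` before the code under test runs, as the comment suggests.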

Additional Locations (1)

server/src/internalClusterTest/java/org/elasticsearch/snapshots/SnapshotShutdownIT.java#L797-L805

…tic#140027) This PR fixes the issue where `INLINE STATS GROUP BY null` was being incorrectly pruned by `PruneLeftJoinOnNullMatchingField`. Fixes elastic#139887

## Problem

For query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is an `Aggregate` (originating from `INLINE STATS`). Since `STATS` supports `GROUP BY null`, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is a `LocalRelation` (the `Aggregate` was optimized into a `LocalRelation` containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the `LocalRelation`, incorrectly producing null values instead of the expected count.

## Solution

The fix ensures that `PruneLeftJoinOnNullMatchingField` only applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`. For `INLINE STATS` joins, the right side can be:

- `Aggregate` (before optimization), or
- `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the pruning optimization for `INLINE STATS` joins, preserving the expected query results when grouping by null.
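The guard in the fix described above can be sketched in Java with a deliberately simplified plan model. `Plan`, `Join`, `Aggregate`, `LocalRelation`, and `EsRelation` here are hypothetical records, not the ES|QL optimizer classes, and the LOOKUP JOIN semantics noted in the comments (a null key matches no lookup row) is an assumption.

```java
/** Hypothetical, heavily simplified model of the pruning guard; not the ES|QL optimizer. */
final class PruneNullJoinSketch {

    // Stand-ins for logical plan nodes; the real classes carry far more state.
    interface Plan {}

    record EsRelation(String index) implements Plan {}

    record Aggregate(Plan child) implements Plan {}

    record LocalRelation(long precomputedCount) implements Plan {}

    /** A LEFT join; joinKeyIsNull models a join key that is the constant null. */
    record Join(Plan left, Plan right, boolean joinKeyIsNull) implements Plan {}

    /**
     * Prune only when the key is null AND the right side reads from an index (the LOOKUP JOIN case,
     * where, by assumption, a null key matches no lookup row). For INLINE STATS the right side is an
     * Aggregate or, once optimized, a LocalRelation holding the null group's results, so it is kept.
     */
    static boolean canPrune(Join join) {
        return join.joinKeyIsNull() && join.right() instanceof EsRelation;
    }

    public static void main(String[] args) {
        Plan employees = new EsRelation("employees");
        Join lookupJoin = new Join(employees, new EsRelation("languages_lookup"), true);
        Join inlineStats = new Join(employees, new Aggregate(employees), true);
        Join inlineStatsOptimized = new Join(employees, new LocalRelation(100L), true);

        System.out.println(canPrune(lookupJoin));           // true
        System.out.println(canPrune(inlineStats));          // false
        System.out.println(canPrune(inlineStatsOptimized)); // false
    }
}
```

Keying the decision on the node type of the right side, rather than on the null key alone, is what lets both the pre-optimization (`Aggregate`) and post-optimization (`LocalRelation`) shapes of `INLINE STATS` survive.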

fang-xing-esql and others added 21 commits January 14, 2026 07:46

[ES|QL] Remove implicit limit appended to each subquery branch (ela…

f207e27

…stic#139058) * remove implicit limit appended to each subquery

ESQL: Added more null aggs tests on RATE (elastic#140585)

5c046f6

Continuation of elastic#139797, adding more tests for timeseries

Make METADATA _tier attribute snapshot only (elastic#140578)

8c69439

[ES|QL] Refactor inference operator architecture for multi-value fiel…

aa0f003

…d support (elastic#139694)

Fix updatecli configuration for updating Iron Bank docker images (ela…

f16f955

…stic#140562) Update IronBank Dockerfile path in updatecli configuration and bump the oblt-updatecli-policies/ironbank/templates version.

Drop CPS project_routing param for search (elastic#140640)

c4fe99d

ES|QL: Add exponential_histogram merge aggregator tests (elastic#140563)

1ac4f67

ESQL: Enable nullify and fail unmapped resolution in tech-preview (el…

e0a15c4

…astic#140528) This PR removes the snapshot protection of FAIL and NULLIFY options for unmapped fields (only LOAD remains protected under snapshot). Follow up to elastic#140463. Related: elastic#138888.

PromQL: only accept children that return a range vector in cross seri…

7f7fdec

…es aggregations (elastic#140594) closes elastic#140586

[TEST] Don't inject boundary tuples for rate calculation when one exi…

0d0a9b2

…sts (elastic#140649)

Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {csv…

77f5fd7

…-spec:string.Url_encode_component tests with table reads} elastic#140621

[ES|QL] Text embedding function GA (elastic#140555)

d20ffd4

Mute org.elasticsearch.xpack.logsdb.RandomizedRollingUpgradeIT testIn…

1bab457

…dexingStandardSource elastic#140658

Mute org.elasticsearch.xpack.logsdb.RandomizedRollingUpgradeIT testIn…

1ef764f

…dexingSyntheticSource elastic#139482

Test snapshot with synthetic id (elastic#140458)

ed365c2

Test verifies that we can still search by id and that all documents are present after restoring the index from a snapshot.

Unmute + add logging for CrossClusterCancellationIT.testCancelSkipUna…

72779d3

…vailable (elastic#140633)

Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackSubqueryIT te…

6cc84ea

…stManyRandomTextFieldsInSubqueryIntermediateResultsWithSortManyFields elastic#140664

Mute org.elasticsearch.datastreams.TSDBSyntheticIdsIT testRecoveredOp…

bf787fa

…erations elastic#140665

Update the way inference test service compute rerank score: use a has…

5d707e6

…h instead of the string length.

Implement rerank using multi-value fields.

d2ab99d

afoucret force-pushed the esql-mv-rerank branch from 1758c43 to d2ab99d Compare January 14, 2026 16:29

Mikep86 and others added 8 commits January 14, 2026 11:35

Update docs/changelog/140672.yaml

e78a783

Mute org.elasticsearch.xpack.esql.parser.SetParserTests testSetUnmapp…

ad29c09

…edFieldsWrongValue elastic#140673

Deduplicate Inference Failures in ShardBulkInferenceActionFilter (ela…

812006d

…stic#140183) We have observed some edge cases where many inference failures can cause OOMs in ShardBulkInferenceActionFilter. This PR addresses this edge case by deduplicating the failures stored in memory.
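A minimal Java sketch of the deduplication idea, assuming failures count as identical when their exception type and message match. `FailureDeduplicator` is hypothetical and not part of `ShardBulkInferenceActionFilter`. Memory then grows with the number of distinct failures rather than with the number of failed items. The real filter still has to report a failure for every affected item; the sketch covers only what is retained in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: retain one entry per distinct failure plus an occurrence count. */
final class FailureDeduplicator {
    // Distinct failure key -> number of items that failed with it, in first-seen order.
    private final Map<String, Integer> countsByFailure = new LinkedHashMap<>();

    /** Records one failed item; identical failures share a single map entry. */
    void record(Exception failure) {
        // Assumption: two failures are "the same" if type and message match.
        String key = failure.getClass().getName() + ": " + failure.getMessage();
        countsByFailure.merge(key, 1, Integer::sum);
    }

    int distinctFailures() {
        return countsByFailure.size();
    }

    /** Unmodifiable snapshot of the counts (iteration order not preserved). */
    Map<String, Integer> counts() {
        return Map.copyOf(countsByFailure);
    }

    public static void main(String[] args) {
        FailureDeduplicator failures = new FailureDeduplicator();
        for (int i = 0; i < 10_000; i++) {
            failures.record(new IllegalStateException("inference endpoint timed out"));
        }
        System.out.println(failures.distinctFailures()); // 1
        System.out.println(failures.counts());           // {java.lang.IllegalStateException: inference endpoint timed out=10000}
    }
}
```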

Add a callout for the 200M limit to max_primary_shard_docs (elastic#1…

f054060

…40625) This is already called out, but only at the very end in a section. This adds it right underneath the documentation for the `max_primary_shard_docs` configuration parameter.

Removing useIlm method from MachineLearningExtension (elastic#140128)

a26808f

afoucret and others added 27 commits January 15, 2026 14:12

Fixing bad field name in tests.

5c16a38

Merge branch 'main' into esql-mv-rerank

eaf4b55

Add more info in testTimeSeriesQuerying (elastic#140716)

9f07289

Fix testReadBlobWithReadTimeouts retries count (elastic#139999)

9e9d277

Small fix on tests.

54e1ab7

Feature/promql add integration tests batch4 (elastic#140560)

6668004

Snapshot shutdown progress tracker test fix (elastic#139447)

76e9a5a

Modifies SnapshotShardsService to stop logging snapshot shutdown progress on search nodes in serverless, since they do not have snapshots. This limits the functionality to indexing nodes only. Relates: ES-13363

Enable extended doc values parameters feature flag for ESQL tests (el…

f4ea30e

…astic#140689) Fixes elastic#140639

Refactor: Use single constant for default exponential histogram bucke…

ffbbb39

…t limit (elastic#140715)

Update the doc.

a5ef68d

ESQL: allow empty results (elastic#139181)

a7d8bdd

Remove all usages of TransportVersionUtils.randomVersionBetween (elas…

628dbe3

…tic#140692)

Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTest…

ac66d2f

…s testPushDownMetadataTierInOrOperator {default} elastic#140750

Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTest…

5cda4dd

…s testPushDownMetadataTierInOperator {default} elastic#140751

Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTest…

c5bcd6c

…s testPushDownMetadataTierInAndNotOperator {default} elastic#140752

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testRemoveNodeDur…

2d91f07

…ingSnapshot elastic#140753

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testRemoveNodeDur…

ed858e9

…ingSnapshotWithOtherRunningShardSnapshots elastic#140755

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testStartRemoveNo…

9b651a9

…deButDoNotComplete elastic#140759

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testAbortSnapshot…

e805b9e

…WhileRemovingNode elastic#140760

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testShutdownWhile…

127d38f

…SuccessInFlight elastic#140761

Mute org.elasticsearch.snapshots.SnapshotShutdownIT testSnapshotShutd…

4c2d69d

…ownProgressTracker elastic#140762

Mute org.elasticsearch.compute.aggregation.AllLastBytesRefByTimestamp…

8075f88

…GroupingAggregatorFunctionTests testSimpleWithCranky elastic#140763

Add known issue for upgrading to 9.2.4 (elastic#140738)

499d368

Backports for elastic#139910 were released in different releases, causing some upgrade paths to be broken. This commit adds a note about a failure that can occur between 9.1.10 and 9.2.4.

Store flattened field data in binary doc values (elastic#140246)

1c23ba4

This PR updates the FlattenedFieldMapper to use binary doc values instead of sorted set doc values

Merge branch 'main' into esql-mv-rerank

1335684

afoucret closed this Jan 15, 2026

cursor bot reviewed Jan 15, 2026

afoucret wants to merge 79 commits into esql-mv-rerank-poc from esql-mv-rerank

20 participants
