ESQL: Aggressive release of shard contexts #129454
Conversation
Keep better track of shard contexts using `RefCounted`, so they can be released more aggressively during operator processing. For example, during TopN, we can potentially release some contexts if they don't pass the limit filter. This is done in preparation for the TopN fetch optimization, which will delay the fetching of additional columns to the data node coordinator, instead of doing it in each individual worker, thereby reducing IO. Since the node coordinator would need to maintain the shard contexts for a potentially longer duration, it is important that we release what we can earlier. An even more advanced optimization is to delay fetching to the main cluster coordinator, but that would be more involved, since we would first need to figure out how to transport the shard contexts between nodes.
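To make the mechanism concrete, here is a minimal sketch of the ref-counting contract this relies on. All names below (`ShardContextRef`, `onRelease`, the `Demo` driver) are hypothetical stand-ins for illustration, not the actual Elasticsearch `RefCounted` API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the RefCounted contract: a shard context stays
// open while at least one holder keeps a reference, and is released as soon
// as the last holder decrements.
final class ShardContextRef {
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds the first ref
    private final Runnable onRelease;

    ShardContextRef(Runnable onRelease) {
        this.onRelease = onRelease;
    }

    void incRef() {
        if (refs.getAndIncrement() <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            onRelease.run(); // last reference gone: close the underlying shard context
        }
    }
}

class Demo {
    public static void main(String[] args) {
        ShardContextRef shard = new ShardContextRef(() -> System.out.println("shard context released"));
        shard.incRef(); // a page referencing this shard is emitted downstream
        shard.decRef(); // TopN drops the page's rows past the limit: the page releases its ref
        shard.decRef(); // the source operator finishes: prints "shard context released"
    }
}
```

The point of the contract is that release happens at the earliest moment no operator can still touch the shard, rather than at the end of the whole query.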
Hi @GalLalouche, I've created a changelog YAML for you.
```java
public SearchContext createSearchContext(ShardSearchRequest request, TimeValue timeout) throws IOException {
    SearchContext searchContext = super.createSearchContext(request, timeout);
    onPutContext.accept(searchContext.readerContext());
    try {
```
This was done after confirming with @dnhatn that `onPutContext` here can be replaced with `onCreateSearchContext`. The try/catch clause was copy-pasted from above.
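For context, the pattern being described would look roughly like the following. This is a sketch only: it assumes the enclosing test-service wrapper class, and `onCreateSearchContext` is the replacement hook named in the comment above, not a confirmed signature:

```java
// Sketch (enclosing class and hook name assumed): notify the hook after
// creating the search context, and close the context if the hook throws
// so that it is not leaked.
public SearchContext createSearchContext(ShardSearchRequest request, TimeValue timeout) throws IOException {
    SearchContext searchContext = super.createSearchContext(request, timeout);
    try {
        onCreateSearchContext.accept(searchContext);
    } catch (Exception e) {
        searchContext.close(); // don't leak the context if the listener fails
        throw e;
    }
    return searchContext;
}
```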
```java
/**
 * A source operator whose output is the given tuple values. This operator produces pages
 * with two Blocks. The returned pages preserve the order of values as given in the initial list.
 */
public abstract class TupleAbstractBlockSourceOperator<T, S> extends AbstractBlockSourceOperator {
```
I've generalized the existing `TupleBlockSourceOperator` to support more than just a tuple of two longs.
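The shape of that generalization might look like the sketch below. The signatures are approximations for illustration, not the actual `AbstractBlockSourceOperator` API:

```java
// Approximate sketch: subclasses decide how each side of the tuple is
// turned into a Block, instead of hard-coding two long blocks.
public abstract class TupleAbstractBlockSourceOperator<T, S> extends AbstractBlockSourceOperator {
    private final List<Tuple<T, S>> values;

    protected TupleAbstractBlockSourceOperator(BlockFactory blockFactory, List<Tuple<T, S>> values) {
        super(blockFactory, values.size());
        this.values = values;
    }

    @Override
    protected Page createPage(int positionOffset, int length) {
        // One block per tuple element; element types are up to the subclass.
        return new Page(
            firstElementBlock(positionOffset, length),
            secondElementBlock(positionOffset, length)
        );
    }

    protected abstract Block firstElementBlock(int positionOffset, int length);

    protected abstract Block secondElementBlock(int positionOffset, int length);
}
```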
nik9000 left a comment:
I left a few comments, but it feels right to me!
Review threads were opened on:
- x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/ShardRefCounted.java
- ...in/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java
- x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java
- x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/DocVector.java (one thread outdated)
If you want me to review more, let me know. Or I can wait until you remove …

Thanks @nik9000! I'm chasing down test failures right now, and will go over your comments in the meantime. I'll re-request a review when I'm done.
Further review threads:
- x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/DocVector.java (outdated)
- x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneOperator.java (outdated)
- .../esql/compute/src/main/java/org/elasticsearch/compute/operator/OrdinalsGroupingOperator.java
- .../plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/topn/TopNOperator.java
This PR adds a late(r) materialization for TopN queries, such that the materialization happens in the "node_reduce" phase instead of during the "data" phase. For example, if the limit is 20 and each data node spawns 10 workers, we would only read the additional columns (i.e., the ones not needed for the TopN filter) for 20 rows instead of 200. To support this, the reducer node maintains a global list of all shard contexts used by its individual data workers (although some of those might be closed if they are no longer needed, thanks to #129454). There is some additional bookkeeping involved, since previously every data node held a local list of shard contexts and used its local indices to access it. To avoid changing too much (this local-index logic is spread throughout much of the code!), a new global index is introduced, which replaces the local index after all the rows are merged together in the reduce phase's TopN. A minimal sketch of that remapping follows.
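The local-to-global index remapping described above could look like this; all names here (`ShardIndexRemapper`, `register`) are hypothetical, chosen only to illustrate the bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each worker numbers its shard contexts locally
// (0..n-1); the reducer concatenates the per-worker lists and returns, for
// each local index, the global index that rows are rewritten to after the
// reduce phase's TopN merges them.
class ShardIndexRemapper {
    private final List<Object> globalContexts = new ArrayList<>();

    int[] register(List<Object> workerContexts) {
        int[] localToGlobal = new int[workerContexts.size()];
        for (int local = 0; local < workerContexts.size(); local++) {
            localToGlobal[local] = globalContexts.size();
            globalContexts.add(workerContexts.get(local));
        }
        return localToGlobal;
    }
}
```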
Summary of main changes:
- `DocVector` now maintains a `RefCounted` instance per shard.
- Operators that handle `DocVector`s (e.g., `LuceneSourceOperator`, `TopNOperator`) can also hold `RefCounted` instances, so they can pass them to `DocVector` and also ensure contexts aren't released if they can still potentially be used later.
- `Driver`'s main loop iteration (`runSingleLoopIteration`) now closes its operators even between different operator processing steps. This is extra aggressive, and was mostly done to improve testability.
- Changes to `TopNOperator`, and a new integration test, `EsqlTopNShardManagementIT`, which uses the pausable plugin framework to check that `TopNOperator` releases shard contexts as early as possible.
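To illustrate the `Driver` change in the list above, here is a hypothetical sketch of closing finished operators between loop iterations; the real driver tracks per-operator state, and every name here is invented:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: once an operator is finished and its output fully
// consumed, release it immediately instead of waiting for the whole
// pipeline to shut down. This is what lets shard-context refs held by
// finished operators be dropped early.
class DriverLoopSketch {
    static void runSingleLoopIteration(List<AutoCloseable> operators) throws Exception {
        // ... move pages between adjacent operators here ...
        Iterator<AutoCloseable> it = operators.iterator();
        while (it.hasNext()) {
            AutoCloseable op = it.next();
            if (isFinished(op)) {
                op.close(); // decrements any ref-counted shard contexts it still holds
                it.remove();
            }
        }
    }

    static boolean isFinished(AutoCloseable op) {
        return false; // placeholder: the real driver knows each operator's state
    }
}
```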