Close data node consumer on listener completion #137698

benchaplin · 2025-11-06T17:05:40Z

Currently, we close the consumer (and thereby decRef all consumed shard results) once all shards in the batched query request are complete. I found that SearchWithRandomDisconnectsIT often causes the batched request to complete before all shards are done, which leaks the QuerySearchResults left sitting in an unclosed consumer (to reproduce this locally, just run the suite with @Repeat(iterations = 100)).

I think we should mirror what is done on the coordinating node side: tie the consumer to the request listener (see AbstractSearchAsyncAction).
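
To make the suspected leak concrete, here is a minimal, self-contained sketch. It does not use Elasticsearch classes: `ResultConsumer`, `Listener`, and this local `releaseBefore` helper are hypothetical stand-ins, and the model only illustrates the hypothesis above (that the request can complete before every shard has reported), not the actual data node code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ConsumerLeakSketch {
    /** Hypothetical stand-in for the ref-counted query phase result consumer. */
    static final class ResultConsumer {
        private final AtomicBoolean closed = new AtomicBoolean();
        void close() { closed.set(true); }
        boolean isClosed() { return closed.get(); }
    }

    /** Hypothetical, simplified listener; not the Elasticsearch ActionListener. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Simplified release-before wrapper: close the consumer, then delegate. */
    static <T> Listener<T> releaseBefore(ResultConsumer consumer, Listener<T> delegate) {
        return new Listener<T>() {
            @Override public void onResponse(T response) { consumer.close(); delegate.onResponse(response); }
            @Override public void onFailure(Exception e) { consumer.close(); delegate.onFailure(e); }
        };
    }

    /** Countdown style: the consumer is closed only when the last shard reports. */
    static boolean closedWithCountdown(int totalShards, int shardsDoneBeforeDisconnect) {
        ResultConsumer consumer = new ResultConsumer();
        int remaining = totalShards;
        for (int i = 0; i < shardsDoneBeforeDisconnect; i++) {
            if (--remaining == 0) {
                consumer.close();
            }
        }
        // Assumed scenario: a disconnect completes the request here, and
        // nothing else is responsible for closing the consumer.
        return consumer.isClosed();
    }

    /** Listener style: any completion of the request closes the consumer. */
    static boolean closedWithListener() {
        ResultConsumer consumer = new ResultConsumer();
        Listener<String> channel = new Listener<String>() {
            @Override public void onResponse(String response) { /* would write the response */ }
            @Override public void onFailure(Exception e) { /* would write the failure */ }
        };
        Listener<String> listener = releaseBefore(consumer, channel);
        listener.onFailure(new RuntimeException("node disconnected"));
        return consumer.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("countdown, 1 of 3 shards done: closed=" + closedWithCountdown(3, 1));
        System.out.println("listener, request failed early: closed=" + closedWithListener());
    }
}
```

In this toy model the countdown variant leaves the consumer open whenever fewer than `totalShards` results arrive, while the listener variant closes it regardless of how many shards reported.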

elasticsearchmachine · 2025-11-06T17:06:07Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

drempapis · 2025-11-07T08:45:13Z

server/src/main/java/org/elasticsearch/action/search/SearchQueryThenFetchAsyncAction.java

            this.task = task;
            this.countDown = new CountDown(queryPhaseResultConsumer.getNumShards());
            this.channel = channel;
+            this.listener = ActionListener.releaseBefore(queryPhaseResultConsumer, new ChannelActionListener<>(channel));


Structurally, this approach looks better: it centralizes the consumer's lifecycle, so every success/failure path passes through the same wrapper.

Both approaches behave the same:

1. Serialize with an open consumer.
2. Close the consumer.
3. Send the response (or failure).
4. Let respondAndRelease free the bytes.

The code in main:

- Uses a local ChannelActionListener and a try (queryPhaseResultConsumer) block.
- On success, the consumer is closed at the end of the try-with-resources (consumer) block (i.e., after serialization finishes, before building/sending the transport response).
- On failure, the consumer is also closed by the try-with-resources and the failure is sent via channelListener.

The PR's code:

- Wraps the channel listener with releaseBefore(consumer, …) so the consumer is always closed before sending success/failure.
- On success, the consumer is closed right before delegating to the channel (via the wrapper). Serialization happens with the consumer open; then the wrapper closes it and writes.
- On failure, listener.onFailure(e) closes the consumer first (via the wrapper) and then writes the failure.
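
The ordering comparison above can be checked with a small, self-contained sketch. Everything here is a simplified assumption for illustration: `Releasable`, `Listener`, `releaseBefore`, `serialize`, and the two `...Style` methods are hypothetical stand-ins written for this example, not the code in main or in the PR.

```java
import java.util.ArrayList;
import java.util.List;

class ReleaseOrderSketch {
    /** Hypothetical closeable whose close() throws no checked exception. */
    interface Releasable extends AutoCloseable {
        @Override void close();
    }

    /** Hypothetical, simplified listener; not the Elasticsearch ActionListener. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Simplified release-before wrapper: close the resource, then delegate. */
    static <T> Listener<T> releaseBefore(Releasable resource, Listener<T> delegate) {
        return new Listener<T>() {
            @Override public void onResponse(T response) { resource.close(); delegate.onResponse(response); }
            @Override public void onFailure(Exception e) { resource.close(); delegate.onFailure(e); }
        };
    }

    /** Stand-in for the transport channel; records what would be sent. */
    static Listener<String> channel(List<String> events) {
        return new Listener<String>() {
            @Override public void onResponse(String response) { events.add("send response"); }
            @Override public void onFailure(Exception e) { events.add("send failure"); }
        };
    }

    /** Stand-in for serialization, which runs while the consumer is still open. */
    static String serialize(List<String> events, boolean fail) {
        events.add("serialize");
        if (fail) {
            throw new IllegalStateException("serialization failed");
        }
        return "bytes";
    }

    /** Shape described for main: local channel listener plus try-with-resources. */
    static List<String> mainStyle(boolean fail) {
        List<String> events = new ArrayList<>();
        Listener<String> channelListener = channel(events);
        Releasable consumer = () -> events.add("close consumer");
        String serialized;
        try (Releasable c = consumer) {
            serialized = serialize(events, fail);
        } catch (RuntimeException e) {
            // The resource has already been closed when this catch block runs.
            channelListener.onFailure(e);
            return events;
        }
        channelListener.onResponse(serialized);
        return events;
    }

    /** Shape described for the PR: the wrapper closes before every completion. */
    static List<String> prStyle(boolean fail) {
        List<String> events = new ArrayList<>();
        Releasable consumer = () -> events.add("close consumer");
        Listener<String> listener = releaseBefore(consumer, channel(events));
        try {
            listener.onResponse(serialize(events, fail));
        } catch (RuntimeException e) {
            listener.onFailure(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("main style, success: " + mainStyle(false));
        System.out.println("PR style,   success: " + prStyle(false));
        System.out.println("main style, failure: " + mainStyle(true));
        System.out.println("PR style,   failure: " + prStyle(true));
    }
}
```

Under these assumptions both shapes record the same event order on success and on failure, which matches the observation that the change is structural rather than behavioral for these two paths.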

I prefer the updated code, as releasing the consumer uniformly on both success and failure is cleaner; however, I’m not convinced it addresses the underlying issue.

I ran multiple local executions with @Repeat of SearchWithRandomDisconnectsIT#testSearchWithRandomDisconnects and was unable to reproduce the failure. Not sure if something changed or if it’s just my machine.

benchaplin · 2025-11-07

Hm, you're right, I'm struggling to reproduce on main myself now. I could have sworn I continued to see failures after my #136889 fix, but I might be mistaken - perhaps that solved it.

What do you think? I'm thinking to table this change, which might still be a worthy improvement, and just unmute SearchWithRandomDisconnectsIT for now. We can see if it's still failing in CI.

benchaplin · 2025-11-07T18:21:58Z

Closing this for now as I've opened #137763 to simply unmute. This may be a worthy improvement for the future but not a priority. Shout out @drempapis for double checking me here.

Close consumer on listener completion

18dd8e3

benchaplin added the >non-issue, auto-backport, Team:Search Foundations, :Search Foundations/Search, branch:9.2, and branch:9.1 labels on Nov 6, 2025

elasticsearchmachine added v9.2.2 v9.1.8 labels Nov 6, 2025

elasticsearchmachine added v9.3.0 and removed branch:9.2 branch:9.1 labels Nov 6, 2025

[CI] Auto commit changes from spotless

e8fd965

drempapis reviewed Nov 7, 2025

benchaplin mentioned this pull request Nov 7, 2025

[CI] FieldSortIT testSortMixedFieldTypes failing #129445

Closed

benchaplin closed this Nov 7, 2025
