
Implement chunked fetch streaming with circuit breaker integration #139124

Open
drempapis wants to merge 428 commits into elastic:main from drempapis:chunked_fetch_phase


@drempapis
Contributor

@drempapis drempapis commented Dec 5, 2025

In the current implementation, when Elasticsearch executes a search query that returns a large number of documents, the fetch phase retrieves the full document content from each shard and returns it in a single response. This can put significant memory pressure on both the data nodes and the coordinating node.

  • Data Node
    • All SearchHit objects are built and held in memory simultaneously before being serialized and sent to the coordinator. For large result sets (e.g., 1000 or more documents with nested fields), this can consume gigabytes of heap memory.
  • Transport
    • Each shard's fetch results are sent to the coordinator as one large message.
  • Coordinator Node
    • Receives the complete response from each shard at once and accumulates all hits in memory before building the final response. Because every shard's results are held at the same time, memory usage grows with the number of shards, even for a single query.
  • Result
    • OutOfMemoryError (OOM) crashes, especially during concurrent large queries or when document sizes are unpredictable.

This PR implements chunked streaming for the fetch phase to reduce memory pressure when handling large result sets. Instead of accumulating all search hits in memory on the data node before sending them to the coordinator, hits are streamed in configurable chunks (default: 256 KB) as they are produced. Memory usage is bounded by circuit breakers on both the data and coordinator nodes.
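The byte-based chunking idea can be illustrated with a small standalone sketch. It assumes that plain strings stand in for SearchHit objects and uses an invented length-prefixed wire format. The class name `ByteChunker` is hypothetical and is not the PR's actual code. Each hit is serialized as soon as it is added, and a chunk is cut once the buffered bytes exceed the threshold.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of byte-based chunking: each hit is serialized as soon as it is
 * produced, and a chunk is cut once the buffered bytes exceed the threshold.
 * Strings stand in for SearchHit objects; the wire format is invented for illustration.
 */
final class ByteChunker {
    private final int thresholdBytes;                 // the PR's default is 256 KB
    private final List<byte[]> chunks = new ArrayList<>();
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    ByteChunker(int thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Serializes one hit immediately, so the caller can drop the hit object right away. */
    void add(String hit) {
        byte[] payload = hit.getBytes(StandardCharsets.UTF_8);
        writeIntBigEndian(payload.length);            // length prefix keeps hit boundaries
        buffer.write(payload, 0, payload.length);
        if (buffer.size() > thresholdBytes) {
            cutChunk();
        }
    }

    /** Emits the final, possibly smaller, chunk once all hits have been added. */
    void finish() {
        if (buffer.size() > 0) {
            cutChunk();
        }
    }

    List<byte[]> chunks() {
        return chunks;
    }

    private void cutChunk() {
        chunks.add(buffer.toByteArray());             // a real sender would hand this off here
        buffer = new ByteArrayOutputStream();
    }

    private void writeIntBigEndian(int value) {
        buffer.write((value >>> 24) & 0xFF);
        buffer.write((value >>> 16) & 0xFF);
        buffer.write((value >>> 8) & 0xFF);
        buffer.write(value & 0xFF);
    }
}
```

Consider a 10-byte threshold and three 4-character hits, each taking 8 bytes with its prefix. The first chunk is 16 bytes and holds two hits, and `finish()` emits an 8-byte chunk holding the last hit. Because a chunk is cut only after the threshold is exceeded, a chunk can be larger than the threshold by up to one serialized hit.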

How OOM is Prevented on the Data Node

  • Immediate Serialization
    • Each SearchHit is serialized to bytes immediately after it is fetched, and the object is then released. The bytes are enqueued in chunks for sending.
  • Byte-Based Chunking (default 256KB)
    • A chunk is emitted once its serialized bytes exceed the 256KB threshold. This bounds each chunk buffer to roughly the threshold plus one serialized hit, regardless of how many documents are fetched.
  • Circuit Breaker Reservation
    • Before each chunk is enqueued for sending, its size is reserved on the circuit breaker via addEstimateBytesAndMaybeBreak(). If the reservation would push the breaker over its limit, the breaker trips and the operation fails fast with a CircuitBreakingException instead of running into an OOM.
    • Circuit breaker memory accounting is more accurate in this implementation. It tracks the full serialized SearchHit size (including all fields, metadata, and nested structures), whereas the existing implementation only accounts for the _source field bytes.
  • ThrottledTaskRunner Backpressure
    • Limits concurrent in-flight chunks to maxInFlightChunks. When at capacity, new chunks queue internally. This prevents unbounded chunk accumulation when the coordinator is slow.
  • ACK-Based Memory Release
    • Circuit breaker memory is released only when the coordinator ACKs each chunk. This creates natural backpressure: if the coordinator is slow, data node memory stays reserved and the circuit breaker eventually trips.
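The following single-threaded sketch shows how reservation, the in-flight cap and ACK-based release fit together on the data node. It is a simplification built on stated assumptions. `ChunkSender`, `SimpleBreaker` and `BreakerTrippedException` are hypothetical names. The toy breaker only mirrors the shape of Elasticsearch's `addEstimateBytesAndMaybeBreak` / `addWithoutBreaking` methods. A plain queue plus an in-flight counter stands in for `ThrottledTaskRunner`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical single-node sketch of the data-node send path: reserve breaker bytes
 * before a chunk is enqueued, cap in-flight chunks, and release bytes only on ACK.
 */
final class ChunkSender {
    /** Stand-in for CircuitBreakingException. */
    static final class BreakerTrippedException extends RuntimeException {
        BreakerTrippedException(String message) {
            super(message);
        }
    }

    /** Toy breaker whose method names mirror the Elasticsearch CircuitBreaker interface. */
    static final class SimpleBreaker {
        private final long limitBytes;
        private long usedBytes;

        SimpleBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        synchronized void addEstimateBytesAndMaybeBreak(long bytes, String label) {
            if (usedBytes + bytes > limitBytes) {
                throw new BreakerTrippedException("[" + label + "] would use "
                    + (usedBytes + bytes) + " bytes, limit is " + limitBytes);
            }
            usedBytes += bytes;
        }

        synchronized void addWithoutBreaking(long bytes) {
            usedBytes += bytes;                        // negative values release memory
        }

        synchronized long used() {
            return usedBytes;
        }
    }

    private final SimpleBreaker breaker;
    private final int maxInFlightChunks;
    private final Deque<byte[]> queued = new ArrayDeque<>();
    private final List<byte[]> inFlight = new ArrayList<>();   // sent, awaiting ACK

    ChunkSender(SimpleBreaker breaker, int maxInFlightChunks) {
        this.breaker = breaker;
        this.maxInFlightChunks = maxInFlightChunks;
    }

    /** Reserves memory first; if the breaker trips, nothing is enqueued. */
    synchronized void send(byte[] chunk) {
        breaker.addEstimateBytesAndMaybeBreak(chunk.length, "fetch_chunk");
        queued.addLast(chunk);
        dispatch();
    }

    /** Called when the coordinator ACKs a chunk: release its bytes, then send more. */
    synchronized void onAck(byte[] chunk) {
        if (!inFlight.remove(chunk)) {                 // arrays compare by identity
            throw new IllegalArgumentException("ACK for unknown chunk");
        }
        breaker.addWithoutBreaking(-chunk.length);
        dispatch();
    }

    synchronized int inFlightCount() {
        return inFlight.size();
    }

    synchronized int queuedCount() {
        return queued.size();
    }

    private void dispatch() {
        while (inFlight.size() < maxInFlightChunks && !queued.isEmpty()) {
            inFlight.add(queued.pollFirst());          // a real sender would start the transport request here
        }
    }
}
```

Because bytes are reserved before a chunk is queued, chunks waiting behind the in-flight cap still count against the breaker. A coordinator that stops ACKing therefore makes the data node trip its breaker rather than buffer without limit.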

How OOM is Prevented on the Coordinator Node

  • Incremental Chunk Reception
    • Instead of receiving all hits at once, the coordinator receives a sequence of chunks of roughly 256KB each (by default). Memory grows incrementally as chunks arrive.
  • Circuit Breaker Tracking
    • FetchPhaseResponseStream tracks accumulated bytes and reserves memory on the coordinator's circuit breaker, which is shared across all shards. If the breaker trips, new chunks are rejected.
  • ACK Flow Control
    • The coordinator only ACKs a chunk after successfully processing it. If the coordinator is overwhelmed, it stops ACKing, which throttles the data node via backpressure.
  • Cleanup on Failure
    • If any error occurs, closeInternal() releases all circuit breaker bytes and cleans up accumulated hits, preventing memory leaks.
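The coordinator-side accounting can be sketched in the same spirit. This is an illustration only, assuming one stream object per shard and a single breaker shared by all of them. `CoordinatorStream`, `SharedBreaker` and `BreakerTrippedException` are hypothetical names, and the method is named `closeInternal()` only to echo the description above. The sketch is not the PR's FetchPhaseResponseStream.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of coordinator-side accounting, loosely modelled on the PR's
 * description of FetchPhaseResponseStream: one stream per shard, one breaker shared by all.
 */
final class CoordinatorStream {
    /** Stand-in for CircuitBreakingException. */
    static final class BreakerTrippedException extends RuntimeException {
        BreakerTrippedException(String message) {
            super(message);
        }
    }

    /** Toy breaker shared by every shard's stream on the coordinator. */
    static final class SharedBreaker {
        private final long limitBytes;
        private long usedBytes;

        SharedBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        synchronized void reserve(long bytes, String label) {
            if (usedBytes + bytes > limitBytes) {
                throw new BreakerTrippedException("[" + label + "] over limit " + limitBytes);
            }
            usedBytes += bytes;
        }

        synchronized void release(long bytes) {
            usedBytes -= bytes;
        }

        synchronized long used() {
            return usedBytes;
        }
    }

    private final SharedBreaker breaker;
    private final List<byte[]> accumulated = new ArrayList<>();
    private long reservedBytes;
    private boolean closed;

    CoordinatorStream(SharedBreaker breaker) {
        this.breaker = breaker;
    }

    /** Accounts for and stores a chunk; the ACK is sent only after both succeed. */
    synchronized void onChunk(byte[] chunk, Runnable ack) {
        if (closed) {
            throw new IllegalStateException("stream already closed");
        }
        breaker.reserve(chunk.length, "fetch_stream");   // trips -> chunk rejected, no ACK
        reservedBytes += chunk.length;
        accumulated.add(chunk);
        ack.run();
    }

    synchronized int chunkCount() {
        return accumulated.size();
    }

    /** Releases every reserved byte and drops buffered chunks; safe to call twice. */
    synchronized void closeInternal() {
        if (closed) {
            return;
        }
        closed = true;
        breaker.release(reservedBytes);
        reservedBytes = 0;
        accumulated.clear();
    }
}
```

Each stream remembers exactly how many bytes it reserved. Cleanup can therefore return that amount to the shared breaker without affecting other shards' reservations, and making `closeInternal()` idempotent avoids double-releasing on overlapping failure paths.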

Flow Diagram

[Flow diagram image not reproduced]

The implementation follows the pattern of TransportRepositoryVerifyIntegrityCoordinationAction, but streams only between the coordinator and the data nodes.

Contributor

@DaveCTurner DaveCTurner left a comment

I like it :)

@drempapis drempapis added the Team:Search Foundations, :Search Foundations/Search, and >refactoring labels Dec 11, 2025
@drempapis
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

Contributor

@DaveCTurner DaveCTurner left a comment

Couple of thoughts about blocking of threads.

@drempapis
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@drempapis
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@drempapis
Contributor Author

Buildkite benchmark this with pmc-3n-4g please

@drempapis
Contributor Author

Buildkite benchmark this with wikipedia please

@drempapis
Contributor Author

Buildkite benchmark this with geoshape please
Buildkite benchmark this with geoshape-3n-4g please

@drempapis
Contributor Author

Buildkite benchmark this with geoshape please

@drempapis
Contributor Author

Buildkite benchmark this with sql-3n-4g please

@drempapis
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@drempapis
Contributor Author

Buildkite benchmark this with sql-3n-4g please

@drempapis
Contributor Author

Buildkite benchmark this with sql-3n-4g please

@drempapis
Contributor Author

Buildkite benchmark this with noaa-3n-2g please

@drempapis
Contributor Author

Buildkite benchmark this with noaa-3n-2g please

@drempapis
Contributor Author

Buildkite benchmark this with esql please

@elasticmachine
Collaborator

elasticmachine commented Mar 21, 2026

💚 Build Succeeded

This build ran two esql benchmarks to evaluate the performance impact of this PR.

Labels

>refactoring, :Search Foundations/Search, Team:Search Foundations, v9.4.0

6 participants