Description
The current memory handling in the fetch phase and during multi-search accumulation does not fully utilize the reference-counted lifecycle of SearchHit.
As a result, memory used by hits is not consistently tracked by the request circuit breaker, and large fetch results or large multi-search responses can accumulate without proper accounting.
Additionally, the fetch phase still returns its results as a single large payload rather than in chunks, which increases peak memory usage and limits Elasticsearch’s ability to protect itself from large responses.
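To make the accounting gap concrete, here is a minimal sketch of breaker accounting that follows a ref-counted lifecycle. It is an illustration only, not Elasticsearch code: `SimpleBreaker` and `TrackedHit` are hypothetical stand-ins for the request circuit breaker and for `SearchHit`, and the byte sizes are made up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class RefCountedAccountingSketch {

    /** Hypothetical stand-in for a request circuit breaker: tracks reserved bytes against a limit. */
    static final class SimpleBreaker {
        private final long limitBytes;
        private final AtomicLong usedBytes = new AtomicLong();

        SimpleBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        void reserve(long bytes) {
            long updated = usedBytes.addAndGet(bytes);
            if (updated > limitBytes) {
                usedBytes.addAndGet(-bytes); // roll back the reservation before failing
                throw new IllegalStateException("would use " + updated + " bytes, limit is " + limitBytes);
            }
        }

        void release(long bytes) {
            usedBytes.addAndGet(-bytes);
        }

        long used() {
            return usedBytes.get();
        }
    }

    /** Hypothetical stand-in for a ref-counted hit: its bytes stay reserved while any reference is live. */
    static final class TrackedHit {
        private final SimpleBreaker breaker;
        private final long sizeBytes;
        private final AtomicInteger refCount = new AtomicInteger(1);

        TrackedHit(SimpleBreaker breaker, long sizeBytes) {
            breaker.reserve(sizeBytes); // account on creation; throws if the limit would be exceeded
            this.breaker = breaker;
            this.sizeBytes = sizeBytes;
        }

        void incRef() {
            // Only take a new reference while the hit is still alive.
            int previous = refCount.getAndUpdate(c -> c > 0 ? c + 1 : c);
            if (previous <= 0) {
                throw new IllegalStateException("hit already released");
            }
        }

        /** Returns true when the last reference was dropped and the bytes were released. */
        boolean decRef() {
            int remaining = refCount.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more times than acquired");
            }
            if (remaining == 0) {
                breaker.release(sizeBytes);
                return true;
            }
            return false;
        }
    }
}
```

In this sketch the bytes are released only when the last holder calls `decRef()`, so a coordinator that keeps a reference while accumulating multi-search responses keeps the memory accounted for.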
This issue tracks the core changes needed to improve stability and ensure predictable, safe memory behavior in fetch and multi-search operations:
- Adapt the memory accounting in the FetchPhase to follow the lifecycle of the ref-counted SearchHit. (This ensures memory is correctly accounted for on the coordinating node and during multi-search accumulation.)
- Revise the request circuit breaker limit for small heaps to provide safer headroom for large fetch responses.
- Chunk FetchPhase results instead of producing a single large blob, reducing peak memory usage and enabling safer coordination behavior. (Implement chunked fetch streaming with circuit breaker integration #139124)
  - Fetch streaming should process hits in doc-ID order instead of score order to avoid leaf re-entry and enable batched leaf-level optimizations, while preserving result ordering via position metadata. (Streaming fetch - Serialize hits in doc-ID order and let the coordinator reorder #144464)
  - Investigate relaxing Lucene reader thread-affinity to enable asynchronous chunked fetch execution without thread pinning. (Investigate relaxing Lucene's thread-affinity assertions to enable cross-thread reader handoff during streaming fetch #144467)
  - Enable chunked streaming fetch in CCS to reduce memory pressure and improve resilience across clusters. (Enable chunked streaming fetch for Cross-Cluster Search (CCS) #144469)
  - Investigate migrating the four synchronous FetchPhase.execute call sites to the chunked path and removing the non-streaming overloads. (Investigate replacing synchronous execute overloads with the chunked/streaming paths in FetchPhase #144571)
- Delay circuit breaker release in the fetch phase until after response transmission to prevent undercounting. (Delay circuit breaker release until fetch response is sent #139243)
- Track serialized response bytes in the request circuit breaker until network write completion. (Defer circuit breaker release until transport write completes #143136)
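
The doc-ID-order item above can be illustrated with a small sketch: the data node visits hits in ascending doc-ID order, tags each with its position in the original score-ordered list, emits them in fixed-size chunks, and the coordinator uses the position to restore the ranking. This is an assumption-laden illustration rather than the actual implementation: `PositionedHit`, `fetchInDocIdOrder`, and `reorder` are hypothetical names, and the "source" string stands in for the loaded document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class DocIdOrderFetchSketch {

    /** Hypothetical hit carrying its position in the original score-ordered result list. */
    static final class PositionedHit {
        final int position;
        final int docId;
        final String source;

        PositionedHit(int position, int docId, String source) {
            this.position = position;
            this.docId = docId;
            this.source = source;
        }
    }

    /** Data-node side: visit docs in ascending doc-ID order and emit chunks of at most chunkSize hits. */
    static List<List<PositionedHit>> fetchInDocIdOrder(int[] docIdsInScoreOrder, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Sort the score-order positions by the doc ID they point to.
        Integer[] positions = new Integer[docIdsInScoreOrder.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, Comparator.comparingInt(p -> docIdsInScoreOrder[p]));

        List<List<PositionedHit>> chunks = new ArrayList<>();
        List<PositionedHit> current = new ArrayList<>();
        for (int position : positions) {
            int docId = docIdsInScoreOrder[position];
            // Placeholder for loading the document; a real fetch would read stored fields here.
            current.add(new PositionedHit(position, docId, "source-of-" + docId));
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    /** Coordinator side: place every received hit back into its original score-order slot. */
    static PositionedHit[] reorder(List<List<PositionedHit>> chunks, int totalHits) {
        PositionedHit[] ordered = new PositionedHit[totalHits];
        for (List<PositionedHit> chunk : chunks) {
            for (PositionedHit hit : chunk) {
                ordered[hit.position] = hit;
            }
        }
        return ordered;
    }
}
```

Because each hit carries its own position, chunks can arrive and be processed independently; the coordinator does not need to sort, only to fill slots.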