Description
Currently, Elasticsearch executes an aggregation search by scattering the request to the data nodes and gathering their sub-request results on the coordinator. The coordinator then reduces these results on its heap.
However, if the sub-request results are large, this reduce can consume a very large amount of coordinator heap, which may lead to long GC pauses or an OOM.
Can we introduce a new approach for this: a new role, the coordinator-agg node? Instead of returning a complete sub-request result, each data node would continuously return it as a series of blocks, and the coordinator-agg node would receive blocks from multiple data nodes in parallel. The coordinator-agg node could then pre-aggregate these blocks early, continuously, and in parallel.
Once the coordinator-agg node has received the last block from every data node, it can return the final aggregated result to the user. Because it continuously aggregates smaller blocks in parallel, memory usage should be smoother and easier to control than with the current approach.
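To make the idea concrete, here is a minimal sketch of the block-wise reduce, not actual Elasticsearch code. It assumes a terms-style aggregation whose partial results can be merged by summing doc counts per bucket key. All names (`Block`, `stream_blocks`, `StreamingReducer`, `reduce_streams`) are hypothetical and only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Dict, Iterable, Iterator, List

# Hypothetical block format: bucket key -> partial doc count.
Block = Dict[str, int]


def stream_blocks(shard_buckets: Dict[str, int], block_size: int) -> Iterator[Block]:
    """Stand-in for a data node: emit one shard's result as small blocks."""
    items = sorted(shard_buckets.items())
    for start in range(0, len(items), block_size):
        yield dict(items[start:start + block_size])


class StreamingReducer:
    """Stand-in for the coordinator-agg node: merges blocks as they arrive."""

    def __init__(self) -> None:
        self._totals: Counter = Counter()
        self._lock = Lock()
        self.max_block_buckets = 0  # largest single block seen so far

    def accept(self, block: Block) -> None:
        # Merge one block into the running totals; the block can be dropped afterwards.
        with self._lock:
            self._totals.update(block)  # Counter.update adds counts per key
            self.max_block_buckets = max(self.max_block_buckets, len(block))

    def result(self) -> Dict[str, int]:
        with self._lock:
            return dict(self._totals)


def reduce_streams(streams: List[Iterable[Block]], workers: int = 4) -> StreamingReducer:
    """Drain all data-node streams in parallel and return the reducer
    once every stream has delivered its last block."""
    reducer = StreamingReducer()

    def drain(stream: Iterable[Block]) -> None:
        for block in stream:
            reducer.accept(block)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all streams and re-raises any worker exception.
        list(pool.map(drain, streams))
    return reducer


if __name__ == "__main__":
    shards = [{"a": 3, "b": 1, "c": 2}, {"a": 1, "c": 4, "d": 5}]
    reducer = reduce_streams([stream_blocks(s, block_size=2) for s in shards])
    print(reducer.result())
```

In this sketch the coordinator still holds the merged buckets, so the saving comes from never holding every data node's complete result at the same time; only the running totals and the in-flight blocks are on the heap.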
In the end, we may have two kinds of coordinator node:
- the original, common coordinator node, which suits ordinary search queries with high QPS
- the stream aggregation coordinator node, which suits aggregation searches with a very high memory cost; it handles only low QPS because it uses many threads to aggregate blocks continuously and in parallel