Add partial reduce nodes for reducing intermediate aggregation results

Currently, an aggregation search in Elasticsearch scatters the request to the data nodes and then gathers the sub-request results on the coordinator. The coordinator then reduces these results on its heap.

However, if the sub-request results are large in memory, the coordinator can use a very large amount of heap, which may lead to long GC pauses or an OOM.

Could we introduce a new model for this, with a new **coordinator-agg** node role? Instead of returning a complete sub-request result, each data node would continuously return a series of **blocks** of that result, and the coordinator-agg node would receive blocks from multiple data nodes in parallel. The coordinator-agg node could then **pre-aggregate** these blocks early, continuously, and in parallel.

Once the coordinator-agg node has received the last block from every data node, it can return the final aggregated result to the user. By aggregating smaller blocks continuously and in parallel, memory usage should be smoother and easier to control than with the current approach.
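To make the idea concrete, here is a minimal sketch in Python that compares the two styles for a terms-style aggregation (term to doc count). It is a sequential simulation only, not Elasticsearch code: `iter_blocks`, `batch_reduce`, `stream_reduce`, and the `block_size` parameter are hypothetical names made up for illustration, and the round-robin loop stands in for blocks arriving in parallel from several data nodes.

```python
from collections import Counter
from typing import Dict, Iterator, List

# Assumption: a shard result is modelled as term -> doc_count.
ShardResult = Dict[str, int]


def iter_blocks(shard_result: ShardResult, block_size: int) -> Iterator[ShardResult]:
    """Hypothetical data-node side: emit the shard result as small blocks."""
    items = sorted(shard_result.items())
    for start in range(0, len(items), block_size):
        yield dict(items[start:start + block_size])


def batch_reduce(shard_results: List[ShardResult]) -> Dict[str, int]:
    """Current style: hold every complete shard result, then reduce them."""
    buffered = list(shard_results)  # all sub-request results live on the heap at once
    total: Counter = Counter()
    for result in buffered:
        total.update(result)  # Counter.update adds counts per key
    return dict(total)


def stream_reduce(shard_results: List[ShardResult], block_size: int) -> Dict[str, int]:
    """Proposed style: merge each block into one accumulator as it arrives."""
    total: Counter = Counter()
    streams = [iter_blocks(result, block_size) for result in shard_results]
    while streams:
        still_open = []
        for stream in streams:  # round-robin stands in for parallel arrival
            block = next(stream, None)
            if block is None:
                continue  # this data node has sent its last block
            total.update(block)  # pre-aggregate, then the block can be dropped
            still_open.append(stream)
        streams = still_open
    return dict(total)


shards = [
    {"a": 3, "b": 1},
    {"a": 2, "c": 5},
    {"b": 4, "c": 1, "d": 7},
]
print(stream_reduce(shards, block_size=2) == batch_reduce(shards))
```

Both functions produce the same final buckets; the difference is that the streaming variant only holds the running accumulator plus the blocks currently in flight, rather than every complete sub-request result. Note that the accumulator itself still grows with the number of distinct terms, so this sketch only illustrates the buffering saving, under the assumptions stated above.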

In the end, we may have two kinds of coordinator node:
1. The original, general-purpose coordinator node, which is good at common search queries with high QPS.
2. The stream aggregation coordinator node, which is good at specific aggregation searches with a very high memory cost and low QPS, because it spends many threads continuously aggregating blocks in parallel.

Add partial reduce nodes for reducing intermediate aggregation results #56748