
Add partial reduce nodes for reducing intermediate aggregation results #56748

@hackerwin7

Currently, an aggregation search in Elasticsearch scatters the request to the data nodes and gathers the sub-request results on the coordinator. The coordinator then reduces these results in its heap.

However, if the sub-request results are large in memory, the coordinator can use a huge amount of heap, which may cause long GC pauses or an OOM.

Could we introduce a new approach for this: a new role, the coordinator-agg node? A data node would not return one complete sub-request result; instead it would continuously return the result as a series of blocks. The coordinator-agg node would receive blocks from multiple data nodes in parallel and could pre-aggregate them early, continuously, and in parallel.

Once the coordinator-agg node has received the last block from every data node, it can return the final aggregated result to the user. Because it aggregates smaller blocks continuously and in parallel, memory usage should be smoother and easier to control than in the current approach.
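To make the idea concrete, here is a minimal sketch of block-wise reduction for a terms-style aggregation (term to doc count). It is not Elasticsearch code: `shard_blocks`, `interleave`, `streaming_reduce`, and `buffered_reduce` are hypothetical names, the blocks arrive sequentially rather than over the network on multiple threads, and it assumes the aggregation's partial results can be merged by simple addition.

```python
from collections import Counter
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List

# A block is a partial terms-aggregation result: term -> doc_count.
Block = Dict[str, int]


def shard_blocks(term_counts: Dict[str, int], block_size: int) -> Iterator[Block]:
    """Hypothetical data-node side: emit one shard's result as small blocks."""
    items = sorted(term_counts.items())
    for start in range(0, len(items), block_size):
        yield dict(items[start:start + block_size])


def interleave(streams: List[Iterator[Block]]) -> Iterator[Block]:
    """Simulate blocks arriving from several data nodes in round-robin order."""
    for group in zip_longest(*streams):
        for block in group:
            if block is not None:  # shorter streams are padded with None
                yield block


def streaming_reduce(blocks: Iterable[Block]) -> Counter:
    """Proposed style: merge each block into a running total as it arrives."""
    running: Counter = Counter()
    for block in blocks:
        running.update(block)  # Counter.update adds counts per key
    return running


def buffered_reduce(shard_results: List[Dict[str, int]]) -> Counter:
    """Current style (simplified): hold all complete shard results, reduce once."""
    total: Counter = Counter()
    for result in shard_results:
        total.update(result)
    return total


shards = [
    {"a": 3, "b": 1},
    {"a": 2, "c": 5},
    {"b": 4, "c": 1, "d": 7},
]
streams = [shard_blocks(shard, block_size=1) for shard in shards]
result = streaming_reduce(interleave(streams))
print(dict(result))
```

In this sketch the coordinator side holds only the running total plus the block currently being merged, rather than every complete shard result at once. Aggregations whose partial state is not a simple sum would need their own mergeable intermediate representation.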

In the end, we may have two kinds of coordinator node:

  1. The original, common coordinator node, which is good at ordinary search queries with high QPS.
  2. The stream-aggregation coordinator node, which is good at aggregation searches with a very large memory cost and low QPS, because it spends many threads continuously aggregating blocks in parallel.
