Expose intermediary states in aggregation functions #16239
Hello, while looking at DataFusion (what an awesome project!), I wondered whether it is possible to expose intermediate states (i.e., before merge_batch) to support what ClickHouse calls the "-State", "-Merge", and "-MergeState" combinators. For example, uniqState returns a statistical structure (a kind of cardinality sketch) that can be merged later, at query time. With this, it's easy to keep a uniqState per minute and query uniqMerge per hour. Thanks
Replies: 2 comments
👋 -- perhaps https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html#tymethod.state is relevant
Thanks for your reply.
I had the big picture, but the documentation made me realize that I "just" have to write two UDAFs:
- "-State", which uses the original function to build the state and then serializes that state as a ScalarValue;
- "-Merge", which deserializes the state and uses the merge_batch and evaluate functions to build the final result.
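To make the idea concrete, here is a minimal std-only sketch of that split. It does not use the DataFusion crate: the `Accumulator` trait below is a simplified stand-in that only borrows the method names state, merge_batch and evaluate (the real trait works on Arrow arrays and ScalarValue), and `ExactUniq`, `uniq_state` and `uniq_merge` are hypothetical names. An exact set replaces the sketch structure, and a plain byte vector replaces the serialized ScalarValue.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for the accumulator methods discussed above.
/// Assumption: this is NOT the DataFusion trait, only an illustration.
trait Accumulator {
    fn update_batch(&mut self, values: &[u64]);
    fn state(&self) -> Vec<u8>;
    fn merge_batch(&mut self, states: &[Vec<u8>]);
    fn evaluate(&self) -> u64;
}

/// Hypothetical exact distinct-count accumulator (a real one would use a sketch).
#[derive(Default)]
struct ExactUniq {
    seen: BTreeSet<u64>,
}

impl Accumulator for ExactUniq {
    fn update_batch(&mut self, values: &[u64]) {
        self.seen.extend(values.iter().copied());
    }

    /// Serialize the intermediate state as little-endian u64 values.
    fn state(&self) -> Vec<u8> {
        self.seen.iter().flat_map(|v| v.to_le_bytes()).collect()
    }

    /// Deserialize each state and fold it into this accumulator.
    fn merge_batch(&mut self, states: &[Vec<u8>]) {
        for state in states {
            for chunk in state.chunks_exact(8) {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                self.seen.insert(u64::from_le_bytes(buf));
            }
        }
    }

    fn evaluate(&self) -> u64 {
        self.seen.len() as u64
    }
}

/// "-State": run the original aggregate, but return its serialized state.
fn uniq_state(values: &[u64]) -> Vec<u8> {
    let mut acc = ExactUniq::default();
    acc.update_batch(values);
    acc.state()
}

/// "-Merge": deserialize stored states, merge them, and produce the final result.
fn uniq_merge(states: &[Vec<u8>]) -> u64 {
    let mut acc = ExactUniq::default();
    acc.merge_batch(states);
    acc.evaluate()
}

fn main() {
    // One state per minute, merged when querying by hour.
    let minute_1 = uniq_state(&[1, 2, 3]);
    let minute_2 = uniq_state(&[3, 4]);
    let hourly = uniq_merge(&[minute_1, minute_2]);
    println!("distinct values in the hour: {}", hourly);
}
```

The value 3 appears in both minutes but is counted once after the merge, which is the property that makes per-minute states reusable for per-hour queries.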