Implement `t-digest` algorithm for online histogram calculation

This would be used to calculate a dataset histogram during ingestion. The `t-digest` algorithm is very popular (especially for map-reduce operations in Apache Spark).

Computing Extremely Accurate Quantiles Using t-Digests (Dunning & Ertl 2019):
https://arxiv.org/abs/1902.04023

A lightweight explainer:
https://www.gresearch.com/news/approximate-percentiles-with-t-digests

There are two existing implementations:

- `tdigest`: https://github.com/tdunning/t-digest (note: this repository is Ted Dunning's reference implementation in Java, not a Python package)
- `tdigest-rs` (🦀): https://github.com/G-Research/tdigest-rs

E.g. each distributed worker builds a digest for its part of the data. At the end, the per-worker digests would be merged into a single digest, from which the final histogram is computed.
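
A minimal sketch of that per-worker-then-merge flow, to make the proposal concrete. It is a simplified merging digest in pure Python, not the API of either library above: `TinyDigest`, `update`, `merge`, and `quantile` are hypothetical names, the default `delta=100` is an arbitrary choice, and the scale function is assumed to be `k1` from the Dunning & Ertl paper.

```python
import math


class TinyDigest:
    """Simplified merging t-digest: a sorted list of (mean, weight) centroids."""

    def __init__(self, delta=100):
        self.delta = delta     # compression parameter; larger keeps more centroids
        self.centroids = []    # (mean, weight) tuples sorted by mean
        self.total = 0.0       # total weight, i.e. number of values seen

    def _k(self, q):
        # Scale function k1: steep near q=0 and q=1, so tail centroids stay small.
        q = min(1.0, max(0.0, q))
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def _compress(self, points):
        # Greedily merge neighbouring points while a centroid spans <= 1 unit of k.
        if not points:
            return
        points = sorted(points)
        total = sum(w for _, w in points)
        merged = []
        done = 0.0                      # weight already placed in finished centroids
        k_left = self._k(0.0)
        cur_mean, cur_w = points[0]
        for mean, w in points[1:]:
            q_right = (done + cur_w + w) / total
            if self._k(q_right) - k_left <= 1.0:
                cur_w += w
                cur_mean += (mean - cur_mean) * w / cur_w   # running weighted mean
            else:
                merged.append((cur_mean, cur_w))
                done += cur_w
                k_left = self._k(done / total)
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids, self.total = merged, total

    def update(self, values):
        """Fold raw values into the digest as weight-1 points."""
        self._compress(self.centroids + [(float(v), 1.0) for v in values])

    def merge(self, other):
        """Combine another digest into this one (the reduce step)."""
        self._compress(self.centroids + other.centroids)

    def quantile(self, q):
        """Estimate the q-quantile by interpolating between centroid means."""
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total
        seen = 0.0
        prev_mean = prev_center = None
        for mean, w in self.centroids:
            center = seen + w / 2       # cumulative weight at the centroid's middle
            if target <= center:
                if prev_mean is None:
                    return mean
                frac = (target - prev_center) / (center - prev_center)
                return prev_mean + frac * (mean - prev_mean)
            prev_mean, prev_center = mean, center
            seen += w
        return self.centroids[-1][0]


# Four hypothetical workers, each digesting its own slice of the data.
workers = []
for i in range(4):
    d = TinyDigest()
    d.update(range(i, 10_000, 4))
    workers.append(d)

# Reduce step: merge the per-worker digests into one.
merged = TinyDigest()
for d in workers:
    merged.merge(d)

median = merged.quantile(0.5)
```

The merged digest keeps only a small, bounded number of centroids regardless of how many values each worker saw, which is what makes the reduce step cheap. Histogram bin counts would come from differences of the estimated CDF at the bin edges, which this sketch leaves out.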

Implement `t-digest` algorithm for online histogram calculation #603