Add compute functions to ExponentialHistogramState #136749

JonasKunz · 2025-10-17T13:13:21Z

Part of #135625, follow-up of #136075.

Makes ExponentialHistogramState mimic the functionality that TDigestState provides to aggregations, so that we can build HistogramState, a drop-in replacement that combines both. This will allow us to apply, for example, percentile aggregations to exponential histograms in addition to T-Digests, as well as to a mix of both.

We also had to implement centroids(), which returns the mean values of the populated histogram buckets. As far as I can tell, centroids() is only used by the boxplot aggregation to determine the length of the whiskers, and this implementation should be fine for that purpose.
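To make the centroids() idea concrete, here is a minimal sketch on a simplified model. The types `Bucket` and `Centroid` and the methods below are hypothetical, not the Elasticsearch API. The sketch assumes positive buckets only, takes each bucket's point of least relative error (the harmonic mean of its bounds, clamped to the histogram min/max) as its "mean", and assumes boxplot whiskers are the most extreme centroid means within 1.5 × IQR of the quartiles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration; these types are NOT the Elasticsearch API.
public class CentroidSketch {

    /** A populated bucket covering (lower, upper] with `count` values; positive buckets only. */
    record Bucket(double lower, double upper, long count) {}

    /** A centroid in the T-Digest sense: a representative value plus a weight. */
    record Centroid(double mean, long count) {}

    /**
     * Assumption: a bucket's "mean" is its point of least relative error (POLRE),
     * i.e. the harmonic mean of its bounds, clamped to the histogram [min, max].
     */
    static double representative(Bucket b, double min, double max) {
        double polre = 2 * b.lower() * b.upper() / (b.lower() + b.upper());
        return Math.min(max, Math.max(min, polre));
    }

    /** One centroid per populated bucket, in ascending bucket order. */
    static List<Centroid> centroids(List<Bucket> buckets, double min, double max) {
        List<Centroid> result = new ArrayList<>();
        for (Bucket b : buckets) {
            if (b.count() > 0) {
                result.add(new Centroid(representative(b, min, max), b.count()));
            }
        }
        return result;
    }

    /**
     * Assumed Tukey-style whiskers: the most extreme centroid means that still
     * lie within 1.5 * IQR below q1 and above q3.
     */
    static double[] whiskers(List<Centroid> centroids, double q1, double q3) {
        double iqr = q3 - q1;
        double lowLimit = q1 - 1.5 * iqr;
        double highLimit = q3 + 1.5 * iqr;
        double lower = Double.POSITIVE_INFINITY;
        double upper = Double.NEGATIVE_INFINITY;
        for (Centroid c : centroids) {
            if (c.mean() >= lowLimit) {
                lower = Math.min(lower, c.mean());
            }
            if (c.mean() <= highLimit) {
                upper = Math.max(upper, c.mean());
            }
        }
        return new double[] { lower, upper };
    }
}
```

Under this model a whisker always lands on a bucket representative, so its deviation from the true extreme value is bounded by the bucket's relative width, which is consistent with the claim that the boxplot usage should be fine.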

elasticsearchmachine · 2025-10-17T14:22:30Z

Pinging @elastic/es-analytical-engine (Team:Analytics)


kkrik-es · 2025-10-20T12:04:40Z

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java

+    public double cdf(double x) {
+        ExponentialHistogram histogram = histogram();
+        long numValuesLess = ExponentialHistogramQuantile.estimateRank(histogram, x, false);
+        long numValuesLessOrEqual = ExponentialHistogramQuantile.estimateRank(histogram, x, true);


Just noticed: these don't check whether the value is outside min/max. You should do that separately and return 0 for x < min and 1 for x > max.

Correctness-wise, we already handle this here.

So you are suggesting this for performance reasons?
I'm not sure it's worth it, because it increases complexity: we currently clamp the bucket POLRE (point of least relative error) to the histogram min/max. So if the POLRE was clamped, we need to return different values depending on whether the requested rank is inclusive or exclusive of equal values.

I don't think this case is common enough to justify the extra code.

Why does this increase complexity? Isn't this a simple early-return condition that could be added at the top of ExponentialHistogramQuantile.estimateRank? It should be uncontroversial in terms of correctness, and more efficient.

Btw, this is unrelated to this PR, so it's not a blocker.

You are totally right, I had this mixed up in my head: I confused the < min() precondition with <= min(). The latter would increase complexity because it has to account for inclusivity; the former is trivial.
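The early-out discussed in this thread can be sketched on a simplified model: buckets sorted in ascending order, each represented by its POLRE clamped to [min, max]. `Histogram`, `Bucket`, `estimateRank` and `cdf` here are hypothetical stand-ins, not the actual `ExponentialHistogramQuantile` code. The mid-rank `cdf` convention (values equal to x count half) and returning NaN for an empty histogram are assumptions.

```java
import java.util.List;

// Hypothetical, simplified model for illustration; NOT the actual
// ExponentialHistogramQuantile implementation.
public class RankSketch {

    /** A populated bucket covering (lower, upper] with `count` values; positive buckets only. */
    record Bucket(double lower, double upper, long count) {}

    /** Buckets sorted in ascending order, plus the exact min/max of the recorded values. */
    record Histogram(List<Bucket> buckets, double min, double max) {
        long valueCount() {
            long total = 0;
            for (Bucket b : buckets) {
                total += b.count();
            }
            return total;
        }
    }

    /** Estimated number of values < x (inclusive == false) or <= x (inclusive == true). */
    static long estimateRank(Histogram h, double x, boolean inclusive) {
        long total = h.valueCount();
        // Early-outs: exact for both inclusive and exclusive ranks, no bucket lookup needed.
        if (total == 0 || x < h.min()) {
            return 0;
        }
        if (x > h.max()) {
            return total;
        }
        long rank = 0;
        for (Bucket b : h.buckets()) {
            // Assumption: each bucket is represented by its POLRE, clamped to [min, max].
            double polre = 2 * b.lower() * b.upper() / (b.lower() + b.upper());
            double value = Math.min(h.max(), Math.max(h.min(), polre));
            boolean counted = inclusive ? value <= x : value < x;
            if (!counted) {
                break; // representatives are non-decreasing, so later buckets cannot qualify
            }
            rank += b.count();
        }
        return rank;
    }

    /** Assumed mid-rank convention: values equal to x count as half; NaN when empty. */
    static double cdf(Histogram h, double x) {
        long total = h.valueCount();
        if (total == 0) {
            return Double.NaN;
        }
        long less = estimateRank(h, x, false);
        long lessOrEqual = estimateRank(h, x, true);
        return (less + lessOrEqual) / (2.0 * total);
    }
}
```

The checks x < min and x > max give exact answers for both inclusive and exclusive ranks, so cdf returns 0 and 1 there without touching any bucket. A check on x <= min would not be exact in the inclusive case: values equal to min may exist, and the inclusive rank then depends on how many there are, which is the extra complexity mentioned above.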


JonasKunz added 3 commits October 17, 2025 11:00

Stash function implementations

5263162

Add javadoc

43ab80a

Add tests for cdf

858b663

elasticsearchmachine added external-contributor v9.3.0 labels Oct 17, 2025

Merge branch 'main' into exp-histo-state-functions

a8aa297

JonasKunz added :Analytics/Aggregations >non-issue labels Oct 17, 2025

[CI] Auto commit changes from spotless

0d87fd4

JonasKunz marked this pull request as ready for review October 17, 2025 14:22

JonasKunz requested review from felixbarny and kkrik-es October 17, 2025 14:22

elasticsearchmachine added the Team:Analytics label Oct 17, 2025

felixbarny approved these changes Oct 17, 2025

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java (outdated, resolved)

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java (resolved)

JonasKunz added 2 commits October 20, 2025 12:59

Review fixes

f075130

Merge branch 'main' into exp-histo-state-functions

04b7e2f

kkrik-es reviewed Oct 20, 2025

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java (outdated, resolved)

kkrik-es reviewed Oct 20, 2025

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java (resolved)

JonasKunz added 2 commits October 20, 2025 14:23

fix copy pasta

9a45671

Add javadoc about algorithm assumptions

85beed1

kkrik-es approved these changes Oct 21, 2025

JonasKunz added 3 commits October 21, 2025 09:56

Add early-outs for rank estimation

63cdf6e

Fix wrong max handling, add handling for empty histograms

db949c2

Merge branch 'main' into exp-histo-state-functions

beee15e

JonasKunz merged commit 11c77ba into elastic:main Oct 21, 2025
34 checks passed

JonasKunz deleted the exp-histo-state-functions branch October 21, 2025 13:28
