Add compute functions to ExponentialHistogramState #136749

JonasKunz · 2025-10-17T13:13:21Z

Part of #135625, follow-up to #136075.

Makes ExponentialHistogramState mimic the functionality that TDigestState provides to aggregations, so that we can build a drop-in replacement, HistogramState, consisting of both. This will then allow us to apply, for example, the percentile aggregation to exponential histograms in addition to T-Digests, as well as to a mix of both.

We also had to implement centroids(), which returns the mean values of the populated histogram buckets. Based on my research, centroids() is only used in the boxplot aggregation to define the length of the whiskers, where this usage should be fine.
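A minimal sketch of the centroids() idea, not the code from this PR: the nested `Bucket` and `Centroid` records are hypothetical stand-ins for the real types, and using the arithmetic midpoint of a bucket as its "mean" is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CentroidSketch {
    // Hypothetical stand-ins; these are not the Elasticsearch or t-digest types.
    record Bucket(double lower, double upper, long count) {}
    record Centroid(double mean, long count) {}

    // Emits one centroid per populated bucket and skips empty buckets.
    // The midpoint is an assumed representative value for the bucket.
    static List<Centroid> centroids(List<Bucket> bucketsAscending) {
        List<Centroid> result = new ArrayList<>();
        for (Bucket b : bucketsAscending) {
            if (b.count() > 0) {
                result.add(new Centroid((b.lower() + b.upper()) / 2.0, b.count()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
            new Bucket(1.0, 2.0, 3),
            new Bucket(2.0, 4.0, 0), // empty, produces no centroid
            new Bucket(4.0, 8.0, 5)
        );
        System.out.println(centroids(buckets));
    }
}
```

Because each centroid carries the bucket count, a consumer such as a whisker computation can still weight the values, even though the exact positions inside a bucket are lost.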

elasticsearchmachine · 2025-10-17T14:22:30Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

felixbarny · 2025-10-17T15:43:11Z

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java

+     * @return an array of the mean values of the populated histogram buckets with their counts
+     */
+    public Collection<Centroid> centroids() {
+        List<Centroid> centroids = new ArrayList<>();


Would it make sense to pre-allocate the list using centroidCount?
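For illustration, pre-allocation could look roughly like the sketch below. `collectMeans` and its parameters are hypothetical names; only the `ArrayList(int initialCapacity)` constructor is standard Java, and how centroidCount is obtained in the real class is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class PreallocationSketch {
    // Hypothetical helper: copies bucket means into a list sized up front.
    // centroidCount is only a capacity hint; the list still starts empty.
    static List<Double> collectMeans(double[] means, int centroidCount) {
        List<Double> result = new ArrayList<>(centroidCount);
        for (double mean : means) {
            result.add(mean);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectMeans(new double[] {1.5, 6.0}, 2));
    }
}
```

If the hint is too small the list simply grows as usual, so the change affects only the number of internal array copies, not the result.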

felixbarny · 2025-10-17T15:58:34Z

...r/src/main/java/org/elasticsearch/search/aggregations/metrics/ExponentialHistogramState.java

+        // negative buckets are in decreasing order, we want increasing order, therefore reverse
+        Collections.reverse(centroids);


I'm a bit confused by this.
This led me to believe that you start with the lowest values and go up to the highest ones:

elasticsearch/libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java

Lines 42 to 46 in a0f415d

// They store all buckets for the negative range first, with the bucket indices in ascending order,
// followed by all buckets for the positive range, also with their indices in ascending order.
// This means we store the buckets ordered by their boundaries in ascending order (from -INF to +INF).
private final long[] bucketIndices;
private final long[] bucketCounts;

I guess the last sentence in the comment is wrong then? The indices are ascending, but the highest index in the negative range has the lowest value. Did I get that right?
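To make the ordering question concrete, here is a small sketch. It assumes the OpenTelemetry-style convention base = 2^(2^-scale), with a negative bucket at index i spanning magnitudes from base^i to base^(i+1); the thread does not quote the actual index-to-boundary mapping, so treat that and the midpoint "center" as assumptions. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NegativeBucketOrderSketch {
    // Assumed convention: base = 2^(2^-scale).
    static double base(int scale) {
        return Math.pow(2.0, Math.pow(2.0, -scale));
    }

    // Takes negative-range bucket indices in ascending index order and
    // returns their (assumed midpoint) centers in ascending value order.
    static List<Double> negativeCentersAscending(long[] ascendingIndices, int scale) {
        double base = base(scale);
        List<Double> centers = new ArrayList<>(ascendingIndices.length);
        for (long index : ascendingIndices) {
            double lowMagnitude = Math.pow(base, index);
            double highMagnitude = Math.pow(base, index + 1);
            // A larger index means a larger magnitude, so a more negative value.
            centers.add(-(lowMagnitude + highMagnitude) / 2.0);
        }
        // Ascending indices yielded decreasing values; reverse to get increasing values.
        Collections.reverse(centers);
        return centers;
    }

    public static void main(String[] args) {
        System.out.println(negativeCentersAscending(new long[] {0, 1, 2}, 0));
    }
}
```

With scale 0 and indices 0, 1, 2, the centers come out as -1.5, -3.0, -6.0 before the reversal, which is the situation the reviewed `Collections.reverse(centroids)` call addresses.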

JonasKunz added 3 commits October 17, 2025 11:00

Stash function implementations

5263162

Add javadoc

43ab80a

Add tests for cdf

858b663

elasticsearchmachine added the external-contributor (Pull request authored by a developer outside the Elasticsearch team) and v9.3.0 labels Oct 17, 2025

Merge branch 'main' into exp-histo-state-functions

a8aa297

JonasKunz added the :Analytics/Aggregations (Aggregations) and >non-issue labels Oct 17, 2025

[CI] Auto commit changes from spotless

0d87fd4

JonasKunz marked this pull request as ready for review October 17, 2025 14:22

JonasKunz requested review from felixbarny and kkrik-es October 17, 2025 14:22

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Oct 17, 2025

felixbarny approved these changes Oct 17, 2025
