Add sum to exponential histograms #133381
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Resolved review threads:
.../main/java/org/elasticsearch/xpack/exponentialhistogram/ExponentialHistogramFieldMapper.java
...istogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramUtils.java
```diff
      negativeBuckets.advance();
  }
- while (positiveBuckets.hasNext()) {
+ while (negativeBuckets.hasNext() || positiveBuckets.hasNext()) {
```
I found the two separate loops easier to follow. Is the only reason you're using a single loop to track the highest bucket so that you know which sign to use for the infinity? If so, couldn't you store the max negative and positive index?
To reiterate from our Slack chat for others:
Unfortunately the parallel iteration is required, because positive and negative buckets can cancel each other out.
E.g. let's assume that all of the buckets of the histogram below are already at ±Infinity:
```
negative:
  indices: [999, 1000, 1001, 1002, 1003]
  counts:  [  1,    2,    1,    1,    1]
positive:
  indices: [999, 1000, 1001, 1002, 1003]
  counts:  [  2,    1,    1,    1,    1]
```
We have to find the largest bucket where the counts do not cancel each other out; in this case that is index 1000, and the negative range wins.
However, I think I can remove a lot of the mental overhead here by reusing the MergingBucketIterator with some slight adjustments.
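To make the scan concrete, here is a minimal sketch under the assumption of already-aligned bucket indices (illustrative names, not the actual bucket-iterator API):
```java
// Illustrative only: scan aligned buckets from the highest index down and use
// the first index whose counts do not cancel to pick the sign of the infinity.
final class InfinitySignSketch {
    static int infinitySign(long[] negCounts, long[] posCounts) {
        for (int i = negCounts.length - 1; i >= 0; i--) {
            long net = posCounts[i] - negCounts[i];
            if (net != 0) {
                return net > 0 ? 1 : -1; // sign of the winning bucket range
            }
        }
        return 0; // all buckets cancelled out
    }

    public static void main(String[] args) {
        // the example above: bucket indices 999..1003
        long[] negCounts = {1, 2, 1, 1, 1};
        long[] posCounts = {2, 1, 1, 1, 1};
        System.out.println(infinitySign(negCounts, posCounts)); // -1: negative wins at index 1000
    }
}
```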
Thanks for iterating! The new approach of reusing MergingBucketIterator looks good to me and is a lot easier to understand.
```java
assert negativeBuckets.scale() == positiveBuckets.scale();

// for each bucket index, sum up the counts, but account for the positive/negative sign
BucketIterator it = new MergingBucketIterator(negativeBuckets, -1, positiveBuckets, 1, positiveBuckets.scale());
```
Not very excited about the negative counts - what are the semantics? I'd rather we have a utility function that's called on each iterator and internally multiplies the sum by -1 for negative buckets.
In 5f94454 I've replaced the "factors" with the ability to provide a custom operator to do the count merging.
Is that what you were thinking of?
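As a rough illustration of that change (a stripped-down sketch over aligned count arrays; the real MergingBucketIterator operates on bucket iterators and its signature may differ):
```java
import java.util.function.LongBinaryOperator;

// Sketch of the idea from 5f94454: instead of +1/-1 "factors", the caller
// supplies an operator that combines the counts of matching buckets.
final class CountMergeSketch {
    static long[] mergeCounts(long[] negCounts, long[] posCounts, LongBinaryOperator op) {
        long[] merged = new long[negCounts.length];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = op.applyAsLong(negCounts[i], posCounts[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] negCounts = {1, 2, 1, 1, 1};
        long[] posCounts = {2, 1, 1, 1, 1};
        // subtract negative counts from positive ones; no negative "factor" needed
        long[] net = mergeCounts(negCounts, posCounts, (neg, pos) -> pos - neg);
        System.out.println(java.util.Arrays.toString(net)); // [1, -1, 0, 0, 0]
    }
}
```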
```java
    return sum;
}

void setSum(double sum) {
```
This is only exposed for testing? If so, let's add a comment to call it out.
I see, we try to avoid recalculating when merging. Sounds good - I don't know how I feel about not validating the passed value, but validation can be expensive and tricky to do.
I think it should be sufficient for us to do the validations required on ingestion and trust the values to be sane internally.
Also, we don't avoid recalculating while merging just for performance reasons: the calculation we have is only an estimation. Users can instead provide the exact sum on ingestion, which means we preserve exactness when merging, giving exact averages. In OTLP, the sum is provided by default.
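A tiny sketch of that distinction (hypothetical names, not the actual ES classes, assuming each histogram carries its sum and total count):
```java
// Carrying the exact sum through merges keeps the average exact, whereas
// re-estimating the sum from bucket boundaries would only approximate it.
record SumAndCount(double sum, long count) {
    static SumAndCount merge(SumAndCount a, SumAndCount b) {
        // exact sums stay exact (up to floating-point rounding) when added
        return new SumAndCount(a.sum() + b.sum(), a.count() + b.count());
    }

    double average() {
        return count == 0 ? 0.0 : sum / count; // 0.0 for empty histograms
    }
}
```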
Thanks, looks good. We can inline LongBinaryOperator if we ever see it showing up in flamegraphs.
Adds support for handling the sum of all values to the exponential histogram libraries and the corresponding ES field type. The sum is 0.0 for empty histograms.