Add sum to exponential histograms #133381
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Resolved review threads:
.../main/java/org/elasticsearch/xpack/exponentialhistogram/ExponentialHistogramFieldMapper.java
...istogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramUtils.java
```diff
      negativeBuckets.advance();
  }
- while (positiveBuckets.hasNext()) {
+ while (negativeBuckets.hasNext() || positiveBuckets.hasNext()) {
```
I found the two separate loops easier to follow. Is the only reason you're using a single loop to track the highest bucket so that you know which sign to use for the infinity? If so, couldn't you store the max negative and positive index?
To reiterate from our Slack chat for others:
Unfortunately the parallel iteration is required, because positive and negative buckets can cancel each other out.
E.g. let's assume that all of the buckets of the histogram below are already at ±Infinity:
```
negative:
  indices: [999, 1000, 1001, 1002, 1003]
  counts:  [  1,    2,    1,    1,    1]
positive:
  indices: [999, 1000, 1001, 1002, 1003]
  counts:  [  2,    1,    1,    1,    1]
```
We have to find the largest bucket where the counts do not cancel each other out; in this case that is index 1000, and the negative range wins.
However, I think I can remove a lot of the mental overhead here by reusing the MergingBucketIterator with some slight adjustments.
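To make the scan concrete, here is a minimal sketch under the assumption of already-aligned bucket indices (illustrative names, not the actual bucket-iterator API):
```java
// Illustrative only: scan aligned buckets from the highest index down and use
// the first index whose counts do not cancel to pick the sign of the infinity.
final class InfinitySignSketch {
    static int infinitySign(long[] negCounts, long[] posCounts) {
        for (int i = negCounts.length - 1; i >= 0; i--) {
            long net = posCounts[i] - negCounts[i];
            if (net != 0) {
                return net > 0 ? 1 : -1; // sign of the winning bucket range
            }
        }
        return 0; // all buckets cancelled out
    }

    public static void main(String[] args) {
        // the example above: bucket indices 999..1003
        long[] negCounts = {1, 2, 1, 1, 1};
        long[] posCounts = {2, 1, 1, 1, 1};
        System.out.println(infinitySign(negCounts, posCounts)); // -1: negative wins at index 1000
    }
}
```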
Thanks for iterating! The new approach of reusing MergingBucketIterator looks good to me and is a lot easier to understand.
```java
assert negativeBuckets.scale() == positiveBuckets.scale();

// for each bucket index, sum up the counts, but account for the positive/negative sign
BucketIterator it = new MergingBucketIterator(negativeBuckets, -1, positiveBuckets, 1, positiveBuckets.scale());
```
Not very excited about the negative counts - what are the semantics? I'd rather we have a utility function that's called on each iterator and internally multiplies the sum by -1 for negative buckets.
In 5f94454 I've replaced the "factors" with the ability to provide a custom operator to do the count merging.
Is that what you were thinking of?
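As a rough illustration of that change (a stripped-down sketch over aligned count arrays; the real MergingBucketIterator operates on bucket iterators and its signature may differ):
```java
import java.util.function.LongBinaryOperator;

// Sketch of the idea from 5f94454: instead of +1/-1 "factors", the caller
// supplies an operator that combines the counts of matching buckets.
final class CountMergeSketch {
    static long[] mergeCounts(long[] negCounts, long[] posCounts, LongBinaryOperator op) {
        long[] merged = new long[negCounts.length];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = op.applyAsLong(negCounts[i], posCounts[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] negCounts = {1, 2, 1, 1, 1};
        long[] posCounts = {2, 1, 1, 1, 1};
        // subtract negative counts from positive ones; no negative "factor" needed
        long[] net = mergeCounts(negCounts, posCounts, (neg, pos) -> pos - neg);
        System.out.println(java.util.Arrays.toString(net)); // [1, -1, 0, 0, 0]
    }
}
```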
```java
    return sum;
}

void setSum(double sum) {
```
This is only exposed for testing? If so, let's add a comment to call it out.
I see, we try to avoid recalculating when merging. Sounds good - I don't know how I feel about not validating the passed value, but validation can be expensive and tricky to do.
I think it should be sufficient for us to do the validations required on ingestion and trust the values to be sane internally.
Also, we don't avoid recalculating while merging just for performance reasons: the calculation we have is only an estimation. Users can instead provide the exact sum on ingestion, which means we preserve exactness when merging, giving exact averages. In OTLP, the sum is provided by default.
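A tiny sketch of that distinction (hypothetical names, not the actual ES classes, assuming each histogram carries its sum and total count):
```java
// Carrying the exact sum through merges keeps the average exact, whereas
// re-estimating the sum from bucket boundaries would only approximate it.
record SumAndCount(double sum, long count) {
    static SumAndCount merge(SumAndCount a, SumAndCount b) {
        // exact sums stay exact (up to floating-point rounding) when added
        return new SumAndCount(a.sum() + b.sum(), a.count() + b.count());
    }

    double average() {
        return count == 0 ? 0.0 : sum / count; // 0.0 for empty histograms
    }
}
```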
Thanks, looks good. We can inline LongBinaryOperator if we ever see it showing up in flamegraphs.
Adds support for handling the sum of all values to the exponential histogram libraries and the corresponding ES field type. The sum is 0.0 for empty histograms.