-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add compute functions to ExponentialHistogramState #136749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
* @return an array of the mean values of the populated histogram buckets with their counts | ||
*/ | ||
public Collection<Centroid> centroids() { | ||
List<Centroid> centroids = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to pre-allocate the list using centroidCount
?
// negative buckets are in decreasing order, we want increasing order, therefore reverse | ||
Collections.reverse(centroids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this.
This had me believe that you start with the lowest values and go up to the highest ones:
Lines 42 to 46 in a0f415d
// They store all buckets for the negative range first, with the bucket indices in ascending order, | |
// followed by all buckets for the positive range, also with their indices in ascending order. | |
// This means we store the buckets ordered by their boundaries in ascending order (from -INF to +INF). | |
private final long[] bucketIndices; | |
private final long[] bucketCounts; |
I guess the last sentence in the comment is wrong then? The indices are ascending but the highest index for the negative scale has the lowest value. Did I get that right?
Part of #135625 , follow up of #136075.
Makes
ExponentialHistogramState
mimic theTDigestState
provided functionality used by aggregations, so that we can build the drop-in replacementHistogramState
consisting of both. This will then allow us to apply e.g. percentile aggregation on exponential histograms in addition to T-Digests and a mix of both.We also had to implement a
centroids()
functionality, which is implemented by returning the mean values of the populated histogram buckets. Based on my research,centroids()
is only used in the boxplot aggregation in order to define the length of the whiskers, where this usage should be fine.