-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add implementation for exponential histogram merging and percentiles #131220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7711093 to
a46914a
Compare
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Outdated
Show resolved
Hide resolved
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Outdated
Show resolved
Hide resolved
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Some parts could be more readable, but histograms are tricky to implement..
I suggest you wait for Mark to take a look as well, unless he can't complete the review this week. Also, @felixbarny should stamp.
…entional-histos # Conflicts: # libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogram.java
...va/org/elasticsearch/benchmark/exponentialhistogram/ExponentialHistogramGenerationBench.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/benchmark/exponentialhistogram/ExponentialHistogramMergeBench.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Outdated
Show resolved
Hide resolved
...stogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramMerger.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A this point, I've reviewed all of the main code, but have not yet gone through the tests. It's end of day for me, and as I have a lot of comments on the main code, I don't want to hold up submitting this review for another day while I get to the tests.
A couple of general consideration:
Memory Accounting
We need to think about how we're going to track memory for this data structure. In our T-Digest fork, we did this by creating an interface that wraps array-like behavior. This allows us to build an implementation in core elasticsearch that interacts with the circuit breakers for memory tracking within the library, without needing to expose the CB infrastructure. I would like us to do something similar here.
I'm open to adding this in a follow-up, but I don't think we should build the rest of the feature without memory accounting.
Exception Typing
There are many places this library throws IllegalArgumentException. If these bubble up to the REST error handler, they will be returned to the user as 400 errors, but that is probably not correct. Errors such as values being out of range seem to me much more likely to be the result of misuse elsewhere in Elasticsearch, rather than a user sending bad data or queries. Sending back a 400 in such cases is not correct.
This might be worth a larger discussion.
Clarity of Terminology
The concept of "index" (and "index order") in the sparse representation is quite difficult to follow. We have arrays with consecutive indices that store the bucket-index values, which are not consecutive integers as I understand it (because of the sparse representation). I've left several comments to this effect throughout the review. I think adding some definitions of terms to the README would help, and possibly some additional clarity as to when we are referring to the bucket ID number vs its position in the array of buckets. I expect that we are not going to be touching this code frequently, so I want to make it as easy as possible for people to ramp up on to this in the future.
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Show resolved
Hide resolved
| */ | ||
| int getRequiredScaleReductionToReduceBucketCountBy(int desiredCollapsedBucketCount) { | ||
| if (desiredCollapsedBucketCount < 0) { | ||
| throw new IllegalArgumentException("desiredCollapsedBucketCount must be greater than or equal to 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building on what @kkrik-es said above, I am dubious about throwing IllegalArgumentException from a bunch of places in this library. Elasticsearch maps exception classes onto HTTP status codes when replying to requests, and IllegalArgumentException is used for 400 errors, indicating a bad user query. But an error this deep in the library, it seems unlikely the user of elasticsearch can adjust their query to correct for the error.
I think asserts (and enough testing to be confidant they aren't being tripped) are the right choice here. Errors here are non-actionable at run time, and potentially confusing to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo throwing InvalidArgumentException is the Java standard practice when validating inputs. Even if we avoid it in our code, even e.g. java.lang will raise them if you use any core methods wrongly. So to me avoiding IllegalArgumentException because it causes 400 errors feels like a code smell.
I think asserts (and enough testing to be confidant they aren't being tripped) are the right choice here
Tests can never prevent 100% of bugs, there might always be the 0.1% case that slipped through. My problem with assertions is that they are (to my knowledge) not active in production. This means that if an invariant is broken, the code will (a) explode later with e.g. ArrayIndexOutOfBoundsException or (b) return wrong results which in my opinion is much worse because it can go undetected.
For (a) this makes dealing with production issues much harder. Most of the time you have to be able to reproduce it to understand what's happening, as you need to re-trigger it with assertions enabled. For (b) it makes it close to impossible, because it needs to be detected first.
That said, I'm fine with leaving my opinion aside and following the style and decisions of your team.
I already changed a lot of exceptions to assertions for package-private methods in c63ba90, which I think I pushed to late to have it make into your review. In 941223a I now also avoided IllegalArgumentExceptions entirely due to the 400s problem, with one exception:
The quantile-computation will throw one if the input is outside of the [0, 1] range, which should be fine imo.
Other than that there's only IllegalStateExceptions left, which are thrown when there is no other way of continuing the control flow in a meaningful way. I guess those should be fine, as they end up as 5xx errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to me avoiding IllegalArgumentException because it causes 400 errors feels like a code smell.
I don't disagree with that, but changing that behavior in Elasticsearch right now would be very challenging.
I'm open to leaving them in the library and catching them when we use them in Elasticsearch, but we need to remember to do that. The goal is to make sure that 400 errors that make it back to users are actionable for them, and I'm open to multiple ways of achieving that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree with that, but changing that behavior in Elasticsearch right now would be very challenging.
100%. That's why I now just removed the IllegalArgumentExceptions. So I assume the remaining IllegalStateException are fine?
...gram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramGenerator.java
Show resolved
Hide resolved
...gram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramGenerator.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Show resolved
Hide resolved
libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/ZeroBucket.java
Show resolved
Hide resolved
libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/ZeroBucket.java
Show resolved
Hide resolved
libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/ZeroBucket.java
Show resolved
Hide resolved
337d426 to
d34da19
Compare
I don't think this is worth documenting in the README, as it is an implementation detail of the I've reworked the specific javadoc you highlighted to avoid confusion, enhanced the documentation at the top explaining the difference and double checked that we don't use Hope this is good enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, with the understanding that we can't start using this in production until it's using the circuit breakers for memory protection. Once tests are passing and we're satisfied with the licensing plan, I'm +1 to merge this.
Implements exponential histogram merging and quantile estimation as a library.
Uses UDDSketch as merging algorithm and the OpenTelemetry definition of base 2, perfectly subsetting exponential histograms for the data structure. Check the readme included in this PR for more details.
gradle check?