@@ -34,6 +34,7 @@ public class ExponentialHistogramQuantile {
* It returns the value of the element at rank {@code max(0, min(n - 1, (quantile * (n + 1)) - 1))}, where n is the total number of
* values and rank starts at 0. If the rank is fractional, the result is linearly interpolated from the values of the two
* neighboring ranks.
* The result is clamped to the histogram's minimum and maximum values.
*
* @param histo the histogram representing the distribution
* @param quantile the quantile to query, in the range [0, 1]
@@ -67,7 +68,7 @@ public static double getQuantile(ExponentialHistogram histo, double quantile) {
} else {
result = values.valueAtPreviousRank() * (1 - upperFactor) + values.valueAtRank() * upperFactor;
}
- return removeNegativeZero(result);
+ return removeNegativeZero(Math.clamp(result, histo.min(), histo.max()));
Contributor

Hm, percentiles shouldn't return values outside min and max. Should we interpolate between min/max and the borderline ranks instead?

Contributor

You should probably have hardcoded logic for p0 and p100, now that you have min and max.

Contributor Author (@JonasKunz, Oct 1, 2025)

To rephrase what this PR tries to solve:

  • Let's assume that the highest populated bucket in our histogram is [1,2].
  • The percentile algorithm assumes that all values which fell into the bucket have the value 1.3333 (the point of least relative error). Therefore, if a percentile falling into that bucket is requested, 1.3333 is returned.
  • However, if the max of the histogram is actually 1.1, we know that 1.3333 is incorrect: the [1,2] bucket was populated only with values in the range [1, 1.1].

So what this PR does is: if the percentile falls into the highest (or lowest) bucket, it moves the assumed value for that bucket to lie within max (or min, respectively).
If the percentile we are estimating does not lie in the outermost buckets (the ones containing min and max), the clamping has no effect: the estimated bucket center is bigger than min and smaller than max anyway.

Therefore I don't understand (a) what the interpolation you are suggesting would do and (b) why we should have hardcoded logic for p0 and p100, as those are already covered correctly by the existing logic.
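
To make the [1,2] example above concrete, here is a minimal sketch. It assumes the point of least relative error of a bucket is the harmonic mean of its boundaries, `2 * lower * upper / (lower + upper)`, which reproduces the 1.3333 quoted above. The class and method names are hypothetical and are not taken from the PR.

```java
// Illustrative sketch only: names are hypothetical and not part of the PR.
final class BucketEstimateSketch {

    // Assumed definition: the value x in [lower, upper] (0 < lower < upper) whose
    // worst-case relative error against either boundary is smallest.
    static double pointOfLeastRelativeError(double lower, double upper) {
        return 2.0 * lower * upper / (lower + upper);
    }

    // Clamp an estimate into [min, max]. Math.max/Math.min is used instead of
    // Math.clamp so the sketch also compiles on JDKs older than 21.
    static double clampToRange(double estimate, double min, double max) {
        return Math.max(min, Math.min(max, estimate));
    }

    public static void main(String[] args) {
        // Highest populated bucket is [1, 2], but the recorded max is 1.1.
        double estimate = pointOfLeastRelativeError(1.0, 2.0);
        double clamped = clampToRange(estimate, 1.0, 1.1);
        System.out.println("bucket estimate: " + estimate + ", clamped: " + clamped);
    }
}
```

For an inner bucket such as [0.5, 1] with the same min and max, the estimate already lies inside the range, so clamping leaves it unchanged.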

Contributor

Let's see. If max is 1.1, i.e. less than the polre (point of least relative error) :P, the polre should not be used for the highest bucket. Instead, we should interpolate between the polre of the second-highest bucket and the max value. Using the polre for the highest bucket is provably inaccurate in this case.

Contributor Author

I think I understand what you mean now:
You are referring to the case where the percentile lies between the second-highest and the highest bucket, and is therefore interpolated, right?

That means that it is better to clamp the ValueAndPreviousValue values before we do the interpolation, correct?
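
The difference between the two orderings can be sketched as follows. This is only an illustration with made-up inputs: the previous-rank value is the assumed estimate for a bucket [0.5, 1], the current-rank value is the 1.3333 estimate for [1, 2], and max is 1.1. The interpolation formula mirrors the one in `getQuantile`; the class and method names are hypothetical, and the actual fix in the PR may be structured differently.

```java
// Illustrative sketch only: names are hypothetical and not part of the PR.
final class ClampOrderSketch {

    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    // Interpolate first, then clamp the result (what the diff above does).
    static double clampAfter(double prev, double cur, double upperFactor, double min, double max) {
        double result = prev * (1 - upperFactor) + cur * upperFactor;
        return clamp(result, min, max);
    }

    // Clamp both rank values first, then interpolate between the clamped values.
    static double clampBefore(double prev, double cur, double upperFactor, double min, double max) {
        double clampedPrev = clamp(prev, min, max);
        double clampedCur = clamp(cur, min, max);
        return clampedPrev * (1 - upperFactor) + clampedCur * upperFactor;
    }

    public static void main(String[] args) {
        double prev = 2.0 / 3.0; // assumed estimate for bucket [0.5, 1]
        double cur = 4.0 / 3.0;  // assumed estimate for bucket [1, 2]
        double min = 0.5;
        double max = 1.1;
        System.out.println("clamp after:  " + clampAfter(prev, cur, 0.5, min, max));
        System.out.println("clamp before: " + clampBefore(prev, cur, 0.5, min, max));
    }
}
```

With these inputs, clamping after interpolation leaves the result untouched because the interpolated value is already below max, so the overestimated 1.3333 still pulls the result upward. Clamping the rank values first replaces 1.3333 with 1.1 before it contributes, giving a lower estimate.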

Contributor

Right, wanna give it a try and add some tests to see what you get?

Contributor Author

Fixed in 95166d9, which also adds a test that failed with the previous behaviour.

}

private static double removeNegativeZero(double result) {
@@ -63,6 +63,16 @@ public void testNoNegativeZeroReturned() {
assertThat(median, equalTo(0.0));
}

public void testPercentilesClampedToMinMax() {
    ExponentialHistogram histogram = createAutoReleasedHistogram(
        b -> b.scale(0).setNegativeBucket(1, 1).setPositiveBucket(1, 1).max(0.00001).min(-0.00002)
    );
    double p01 = ExponentialHistogramQuantile.getQuantile(histogram, 0.01);
    double p99 = ExponentialHistogramQuantile.getQuantile(histogram, 0.99);
    assertThat(p01, equalTo(-0.00002));
    assertThat(p99, equalTo(0.00001));
}

public void testUniformDistribution() {
testDistributionQuantileAccuracy(new UniformRealDistribution(new Well19937c(randomInt()), 0, 100));
}