
Conversation

@JonasKunz
Contributor

Before this PR, the exponential histogram percentile algorithm returned a representative point inside the matching bucket for percentile queries.

With the exact min and max known and stored alongside the histogram, this could lead to the confusing case where, for example, a p99 is estimated to be larger than the histogram's max.

This PR avoids the problem by clamping estimated percentiles to the [min, max] range.
If the histogram was constructed without the exact min and max being known, they are estimated from the lower/upper bounds of the outermost buckets; in that case, clamping has no effect on the percentile computation.
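The clamping described above can be sketched as follows. This is an illustrative standalone example, not the actual Elasticsearch code; the class and method names are made up, and the manual min/max mirrors the `Math.clamp` call used in the PR:

```java
public class PercentileClampSketch {
    // Clamp a raw percentile estimate into the histogram's [min, max] range
    // (equivalent to Math.clamp(rawEstimate, min, max) on JDK 21+).
    static double clampEstimate(double rawEstimate, double min, double max) {
        return Math.max(min, Math.min(max, rawEstimate));
    }

    public static void main(String[] args) {
        // A p99 estimated from the representative of the highest bucket may
        // exceed the true max (e.g. 1.3333 vs. max = 1.1); clamping fixes this.
        System.out.println(clampEstimate(1.3333, 0.5, 1.1)); // prints 1.1
        // Estimates already inside [min, max] are unchanged.
        System.out.println(clampEstimate(0.8, 0.5, 1.1));    // prints 0.8
    }
}
```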

@JonasKunz JonasKunz self-assigned this Sep 29, 2025
@JonasKunz JonasKunz added :StorageEngine/Mapping The storage related side of mappings >non-issue labels Sep 29, 2025
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine external-contributor Pull request authored by a developer outside the Elasticsearch team v9.2.0 labels Sep 29, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

      result = values.valueAtPreviousRank() * (1 - upperFactor) + values.valueAtRank() * upperFactor;
  }
- return removeNegativeZero(result);
+ return removeNegativeZero(Math.clamp(result, histo.min(), histo.max()));
Contributor

Hm, percentiles shouldn't return values outside min and max. Should we be doing interpolation between min/max and the borderline ranks instead?

Contributor

You should probably have hardcoded logic for p0 and p100, now that you have min and max.

Contributor Author

@JonasKunz JonasKunz Oct 1, 2025

To rephrase what this PR tries to solve:

  • Let's assume the highest populated bucket in our histogram is [1, 2].
  • The percentile algorithm assumes that all values which fell into that bucket have the value 1.3333 (the point of least relative error). Therefore, if a percentile falling into that bucket is requested, 1.3333 would be returned.
  • However, if the max of the histogram is actually 1.1, we know that 1.3333 is incorrect: the [1, 2] bucket was populated only with values in the range [1, 1.1].

So what this PR does is: if the percentile falls into the highest (or lowest) bucket, it adjusts the assumed value for that bucket so that it lies inside max (or min, respectively).
If the estimated percentile does not lie in the outermost buckets (the ones containing min and max), the clamping has no effect: the estimated bucket representative is larger than min and smaller than max anyway.

Therefore I don't understand (a) what the interpolation you are suggesting would do, and (b) why we should have hardcoded logic for p0 and p100, as those are already covered correctly by the existing logic.
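For reference, the 1.3333 figure above is the point of least relative error of the [1, 2] bucket: the estimate x at which the worst-case relative error at both bucket ends is equal, i.e. (x − l)/l = (u − x)/u, which gives x = 2lu/(l + u). A small standalone sketch (illustrative, not Elasticsearch code):

```java
public class PolreSketch {
    // Point of least relative error for a bucket [lower, upper] with
    // positive bounds: x = 2*l*u / (l + u), which equalizes the relative
    // error at both bucket ends.
    static double pointOfLeastRelativeError(double lower, double upper) {
        return 2.0 * lower * upper / (lower + upper);
    }

    public static void main(String[] args) {
        // For the [1, 2] bucket this yields 4/3 ≈ 1.3333.
        System.out.println(pointOfLeastRelativeError(1.0, 2.0));
    }
}
```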

Contributor

Let's see. If max is 1.1, i.e. less than the polre :P, then the polre should not be used for the highest bucket. Instead, we should interpolate between the polre of the second-highest bucket and the max value. Using the polre for the highest bucket is provably inaccurate in this case.

Contributor Author

I think I understand what you mean now:
You are referring to the case where the percentile lies between the second-highest and the highest bucket, and is therefore interpolated, right?

That means it would be better to clamp the ValueAndPreviousValue values before we do the interpolation, correct?
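A minimal sketch of the refinement being discussed here: clamp the two interpolation endpoints into [min, max] before interpolating, rather than clamping the interpolated result afterwards. The class and parameter names are illustrative and do not reflect the actual ValueAndPreviousValue implementation:

```java
public class ClampBeforeInterpolation {
    static double clamp(double v, double min, double max) {
        return Math.max(min, Math.min(max, v));
    }

    // Linear interpolation between the (pre-clamped) representatives of the
    // bucket at the previous rank and the bucket at the target rank.
    static double interpolate(double valueAtPreviousRank, double valueAtRank,
                              double upperFactor, double min, double max) {
        double prev = clamp(valueAtPreviousRank, min, max);
        double curr = clamp(valueAtRank, min, max);
        return prev * (1 - upperFactor) + curr * upperFactor;
    }

    public static void main(String[] args) {
        // The highest bucket [1, 2] has representative 4/3 ≈ 1.3333, but the
        // true max is 1.1: clamping before interpolating keeps the estimate
        // consistent with max instead of pulling it above 1.1's influence.
        System.out.println(interpolate(0.9, 4.0 / 3.0, 0.5, 0.2, 1.1)); // prints 1.0
    }
}
```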

Contributor

Right, wanna give it a try and add some tests to see what you get?

Contributor Author

Fixed in 95166d9, which also adds a test that failed with the previous behaviour.

@JonasKunz JonasKunz enabled auto-merge (squash) October 2, 2025 07:03
@JonasKunz JonasKunz merged commit 679c407 into elastic:main Oct 2, 2025
32 of 34 checks passed
@JonasKunz JonasKunz deleted the clamp-exp-histo-percentiles branch October 2, 2025 09:21

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v9.3.0
