-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Clamp exponential histogram percentiles to min/max #135632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
JonasKunz
merged 11 commits into
elastic:main
from
JonasKunz:clamp-exp-histo-percentiles
Oct 2, 2025
Merged
Changes from 3 commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
aa22fd2
Clamp exponential histogram percentiles to min/max
JonasKunz 2239181
[CI] Auto commit changes from spotless
afb29d6
Merge branch 'main' into clamp-exp-histo-percentiles
JonasKunz 95166d9
Clamp before interpolation
JonasKunz ab952fd
Fix comment
JonasKunz e75933f
Fix test name
JonasKunz 9a0fe02
Fix testPercentilesClampedToMinMax test
JonasKunz 790e687
[CI] Auto commit changes from spotless
cfc971a
Merge branch 'main' into clamp-exp-histo-percentiles
JonasKunz 02ee881
Add test for min interpolation
JonasKunz 6b98807
Merge branch 'main' into clamp-exp-histo-percentiles
JonasKunz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm percentiles shouldn't return values outside min and max.. Should we be doing interpolation between min/max and the borderline ranks instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably have hardcoded logic for p0 and p100, now that you have min and max.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To rephrase what this PR tries to solve:
[1,2].1.3333(point of least relative error). Therefore if a percentile falling into that bucket is requested,1.3333would be returnedmaxof the histogram is actually1.1, we know that1.333is incorrect. The[1,2]bucket was populated with values in the range[1, 1.1].So what this PR does is that if the percentile falls into the highest (or lowest) bucket, it adjusts the assumed value for that bucket to move inside of
minandmaxrespectively.If the percentile we are estimating does not lie in the outermost buckets (the ones containing min and max), the clamping has no effect: The estimated bucket center is bigger than
minand smaller thanmaxanyway.Therefore I don't understand what (a) the interpolation you are suggesting would do and (b) why we should have a hardcoded logic for
p0andp100, as those are covered by the existing logic correctly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see. If
maxis 1.1, i.e. less than thepolre:P, thepolreshould not be used for the highest bucket. Instead, we should be interpolating between thepolreof the second-highest bucket and the max value. Using thepolrefor the highest bucket is provably inaccurate, in this case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what you mean now:
You are referring to the case where the percentile lies between the second highest and the highest bucket, and therefore is interpolated, right?
That means that it is better to clamp the
ValueAndPreviousValuevalues before we do the interpolation, correct?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, wanna give it a try and add some tests to see what you get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 95166d9, which also adds a test which failed with the previous behaviour.