-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Fix thread-safety of FixedCapacityExponentialHistogram.valueCount #137357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
|
||
| private record CachedCountsSum(int numBuckets, long countsSum) {} | ||
|
|
||
| private CachedCountsSum cachedCountsSum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud: this could be volatile or an AtomicReference so that other threads can see the cached value. But this doesn't affect correctness as the worst thing that could happen is that another thread needs to re-compute the value. That's probably not something we're trying to optimize for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was my though here too: AtomicReference would be the same as volatile here, as we don't do CAS.
However, we don't want to "pay" for volatile here, as it's okay if threads see the cached value too late, because then they will just compute it themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this one.
|
|
||
| private record CachedCountsSum(int numBuckets, long countsSum) {} | ||
|
|
||
| private CachedCountsSum cachedCountsSum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this one.
| * This works around that by running it once up front. Jonas will have a | ||
| * look at this one soon. | ||
| */ | ||
| histo.hashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reverting this bit.
Follow up for #137350, fixes the actual root cause and removes the workaround.
The
FixedCapacityExponentialHistogramis mutable to allow for an efficient construction and reuse within the exponential histogram lib. However, the class is private to the library, meaning that it can only be accessed via theExponentialHistograminterface from outside of the library.The
ExponentialHistograminterface does not allow for mutations, thereforeFixedCapacityExponentialHistogramwhen viewed (and exclusively owned) this way should be thread safe.Prior to this PR this was not the case: We lazily compute the sum of the counts for buckets and cache it as needed.
To make this as efficient as possible, e.g. even after adding more buckets to the histogram, we remember for how many buckets we cached the sum and only add new ones on top.
Calling
valueCount()would mutate and read the relevant members in a loop, leading to a race condition.This PR (a) reproduces the problem in a test and (b) fixes it by replacing the racy member accesses with single reference load and store, which are therefore atomic.