@JonasKunz
Contributor

Follow up for #137350, fixes the actual root cause and removes the workaround.

The FixedCapacityExponentialHistogram is mutable to allow for efficient construction and reuse within the exponential histogram lib. However, the class is private to the library, meaning that from outside of the library it can only be accessed via the ExponentialHistogram interface.

The ExponentialHistogram interface does not allow for mutations. Therefore a FixedCapacityExponentialHistogram, when viewed (and exclusively owned) this way, should be thread safe.

Prior to this PR this was not the case: We lazily compute the sum of the counts for buckets and cache it as needed.
To make this as efficient as possible, e.g. even after adding more buckets to the histogram, we remember for how many buckets we cached the sum and only add new ones on top.

Calling valueCount() would mutate and read the relevant members (the cached sum and the number of buckets it covers) in a loop, leading to a race condition when several threads call it concurrently.
This PR (a) reproduces the problem in a test and (b) fixes it by replacing the racy member accesses with a single reference load and a single reference store, each of which is atomic.
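
A minimal sketch of the pattern described above, not the actual Elasticsearch code: the class name `CachedSumSketch`, the `bucketCounts` array, and `addBucket` are hypothetical stand-ins. Only the `CachedCountsSum` record and the `cachedCountsSum` field mirror the snippet under review. It assumes all mutation happens before the object is shared with other threads.

```java
// Hypothetical stand-in for the histogram; names other than
// CachedCountsSum / cachedCountsSum are assumptions for illustration.
public class CachedSumSketch {
    // Immutable snapshot: how many buckets were summed and their total.
    // Record fields are final, so a racy reader never sees a half-built snapshot.
    private record CachedCountsSum(int numBuckets, long countsSum) {}

    private final long[] bucketCounts;
    private int numBuckets;

    // Plain (non-volatile) field: readers may see a stale snapshot or null.
    private CachedCountsSum cachedCountsSum;

    public CachedSumSketch(int capacity) {
        this.bucketCounts = new long[capacity];
    }

    // Assumed to be called only by the owning thread during construction.
    public void addBucket(long count) {
        bucketCounts[numBuckets++] = count;
    }

    public long valueCount() {
        CachedCountsSum cached = cachedCountsSum; // single reference load
        int from = cached == null ? 0 : cached.numBuckets();
        long sum = cached == null ? 0L : cached.countsSum();
        if (from == numBuckets) {
            return sum; // cache already covers every bucket
        }
        // Only add the buckets that were appended since the snapshot was taken.
        for (int i = from; i < numBuckets; i++) {
            sum += bucketCounts[i];
        }
        cachedCountsSum = new CachedCountsSum(numBuckets, sum); // single reference store
        return sum;
    }

    public static void main(String[] args) {
        CachedSumSketch histo = new CachedSumSketch(4);
        histo.addBucket(3);
        histo.addBucket(5);
        System.out.println(histo.valueCount());
        histo.addBucket(2);
        System.out.println(histo.valueCount());
    }
}
```

Because the bucket count and the sum travel together in one immutable object, a reader works either from an older consistent pair or from a newer one, never from a count that belongs to one update and a sum that belongs to another.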

@elasticsearchmachine added the needs:triage (Requires assignment of a team area label), external-contributor (Pull request authored by a developer outside the Elasticsearch team), and v9.3.0 labels on Oct 30, 2025
@JonasKunz added the >non-issue and :StorageEngine/Mapping (The storage related side of mappings) labels and removed the needs:triage label on Oct 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)


private record CachedCountsSum(int numBuckets, long countsSum) {}

private CachedCountsSum cachedCountsSum;
Member

Just thinking out loud: this could be volatile or an AtomicReference so that other threads can see the cached value. But this doesn't affect correctness as the worst thing that could happen is that another thread needs to re-compute the value. That's probably not something we're trying to optimize for.

Contributor Author

Yes, that was my thought here too: AtomicReference would be the same as volatile here, as we don't do CAS.
However, we don't want to "pay" for volatile here, as it's okay if threads see the cached value too late, because then they will just compute it themselves.
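
The trade-off discussed in this thread can be exercised with a small harness. This is a sketch under stated assumptions, not the test added in the PR: `BenignRaceSketch`, `Histo`, and `distinctResults` are hypothetical names, and the histogram is reduced to an immutable array of counts. A passing run does not prove thread safety; it only illustrates that a thread which misses the cached value recomputes the same result.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BenignRaceSketch {
    private record CachedSum(int numBuckets, long sum) {}

    // Hypothetical read-only histogram stand-in with a plain-field cache.
    static final class Histo {
        private final long[] counts;
        private CachedSum cached; // deliberately neither volatile nor AtomicReference

        Histo(long[] counts) {
            this.counts = counts.clone();
        }

        long valueCount() {
            CachedSum c = cached; // one racy read of the reference
            if (c != null && c.numBuckets() == counts.length) {
                return c.sum();
            }
            long sum = 0;
            for (long v : counts) {
                sum += v; // recompute from data that never changes
            }
            cached = new CachedSum(counts.length, sum); // one racy write
            return sum;
        }
    }

    // Calls valueCount() from several threads and returns the distinct results seen.
    static Set<Long> distinctResults(long[] counts, int threads) {
        Histo histo = new Histo(counts);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                tasks.add(() -> histo.valueCount());
            }
            Set<Long> results = new HashSet<>();
            for (Future<Long> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctResults(new long[] {1, 2, 3, 4}, 8));
    }
}
```

With a plain field, the worst case is duplicated work: several threads may each sum the buckets and each store an equivalent snapshot. Declaring the field volatile would make the cached value visible sooner at the cost of a volatile access on every read.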

Member

I'm happy with this one.

@JonasKunz JonasKunz merged commit 70ba2af into elastic:main Oct 30, 2025
34 checks passed

* This works around that by running it once up front. Jonas will have a
* look at this one soon.
*/
histo.hashCode();
Member

Thanks for reverting this bit.

chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025
Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v9.3.0

4 participants