Closed
The previous implementation created a UDDSketch (with its own backing `HashMap`) for every merge, and then called `compact_buckets` on it to ensure that the target and the source had undergone the same number of compactions. Profiling showed that, in a `rollup` call over a large amount of data, `compact_buckets` accounted for most of the CPU time. However, when we merge another sketch into this sketch, we do not need to call `compact_buckets` every time: we can consume the source's keys and counts directly and apply `compact_key` to each key as needed. This avoids many heap allocations, because `compact_buckets` makes a full copy of the backing `HashMap` and then rebuilds it. For one particular workload, this reduced the execution time from 30 to 12 seconds.
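The idea above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the `Sketch` struct, the plain `i64` keys, and the bodies of `compact_key` and `compact_buckets` are assumptions made for the example (here one compaction maps a key to `ceil(key / 2)`). The point is that `merge` never builds a compacted copy of the source. It remaps each source key on the fly instead.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified sketch: integer bucket keys mapped to counts.
struct Sketch {
    buckets: HashMap<i64, u64>,
    /// How many times the buckets of this sketch have been compacted.
    compactions: u32,
}

/// Assumed compaction rule for this example: fold adjacent buckets
/// together by mapping a key to ceil(key / 2).
fn compact_key(key: i64) -> i64 {
    if key > 0 { (key + 1) / 2 } else { key / 2 }
}

impl Sketch {
    /// The expensive path: rebuilds the whole backing map.
    fn compact_buckets(&mut self) {
        let old = std::mem::take(&mut self.buckets);
        for (key, count) in old {
            *self.buckets.entry(compact_key(key)).or_insert(0) += count;
        }
        self.compactions += 1;
    }

    /// Merge `other` into `self` without allocating a compacted copy of `other`.
    fn merge(&mut self, other: &Sketch) {
        // Only the target is compacted in place, and only if it is behind.
        while self.compactions < other.compactions {
            self.compact_buckets();
        }
        // The source is behind by `extra` compactions: remap each key directly.
        let extra = self.compactions - other.compactions;
        for (&key, &count) in &other.buckets {
            let mut key = key;
            for _ in 0..extra {
                key = compact_key(key);
            }
            *self.buckets.entry(key).or_insert(0) += count;
        }
    }
}
```

In this sketch the source map is only read, so the per-merge cost is one pass over the source's buckets plus the lookups in the target.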
Profiling showed that this function is a hotspot. By changing the implementation slightly, so that it iterates directly over the values instead of walking the tree using the linked list, we improve the throughput of certain CPU-bound queries. We have seen the time needed for certain rollup queries drop by more than 50%.
Because of the way `entry()` was called, and because the borrow checker does not let us keep two mutable references into a map, we were almost always doing a double lookup into the backing `HashMap` when this function was called. However, looking at the code, the only callers of this function wanted either to increment by 1 or to increment by a count. Therefore, add a function that supports exactly that use case. It does not run into the borrow checker problem, because it does not have to return a mutable reference: it does the work immediately.
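A minimal sketch of the "increment in place" shape described above, assuming a plain `HashMap<i64, u64>` as the backing store. The `Buckets` type and the `increment` name are hypothetical and are not taken from this PR. The function performs the update itself, so no mutable reference leaves the call and the key is looked up once.

```rust
use std::collections::HashMap;

/// Hypothetical wrapper around the backing map, for illustration only.
struct Buckets {
    map: HashMap<i64, u64>,
}

impl Buckets {
    /// Add `by` to the bucket for `key`, creating the bucket at zero if it
    /// does not exist yet. The work is done inside the call, so the caller
    /// never holds a mutable reference into the map.
    fn increment(&mut self, key: i64, by: u64) {
        *self.map.entry(key).or_insert(0) += by;
    }
}
```

Callers that previously fetched an entry and then modified it would instead call `increment(key, 1)` or `increment(key, count)`.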
We previously got this slightly wrong: we used the number of values to reserve heap memory, when what we actually want is the number of buckets.
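To illustrate the difference, here is a small sketch under the assumption that a sketch is rebuilt from parallel slices of bucket keys and per-bucket counts. The `from_pairs` function and its parameters are hypothetical. The map needs one slot per bucket, so the capacity hint is the number of keys, not the sum of the counts.

```rust
use std::collections::HashMap;

/// Hypothetical constructor: rebuild the backing map from bucket keys and
/// their counts.
fn from_pairs(keys: &[i64], counts: &[u64]) -> HashMap<i64, u64> {
    // Reserve by the number of buckets. Reserving by the number of values
    // (the sum of `counts`) would over-allocate heavily for large inputs.
    let mut map = HashMap::with_capacity(keys.len());
    for (&key, &count) in keys.iter().zip(counts) {
        *map.entry(key).or_insert(0) += count;
    }
    map
}
```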
68dd130 to
46c6562