Skip to content

Feike/test cicd#861

Closed
feikesteenbergen wants to merge 13 commits intomainfrom
feike/test_cicd
Closed

Feike/test cicd#861
feikesteenbergen wants to merge 13 commits intomainfrom
feike/test_cicd

Conversation

@feikesteenbergen
Copy link
Member

No description provided.

The previous implementation would create a UDDSketch (with a backing
HashMap) for every possible merge, and then call `compact_buckets` on
that in order to ensure the number of compactions between the target and
the source were equal.

Profiling this, we found out that in a `rollup` call of a lot of data,
the `compact_buckets` was pretty much the main contributor to all the
CPU time.

However, if we merge a different sketch into this sketch, we don't need
to actually compact_buckets all the time, we can directly consume the
keys and counts, and apply some compact_key calls to it.

This prevents a lot of heap allocations, as compact_buckets does a fully
copy of the backing `HashMap`, and then rebuilding it.

For a particular workload, this reduced the execution time from 30 to 12
seconds.
Profiling showed that this function is quite the hotspot. By changing
the implementation slightly, instead of walking the tree using the
Linked List, but iterate directly over the values, we improve the
throughput of certain CPU bound queries.

We've seen a reduction in time needed of > 50% for certain rollup
queries.

Due to the way entry() was called, and the way the Borrow Checker is
unable to help us keep 2 mutable references into a map, we were doing
double lookups into the backing HashMap pretty much always when this
function was called.

However, looking at the code, the only callers of this function only
wanted to either increment by 1, or by a count.

Therefore, make a function to actually support that usecase, which
doesn't have this problem with the Borrow Checker, as it doesn't have to
return a mutable reference: It actually does the work immediately
We got it slightly wrong previously, when we used the number of values
to reserve Heap memory, but we actually want the number of buckets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant