Feike/test cicd by feikesteenbergen · Pull Request #861 · timescale/timescaledb-toolkit

feikesteenbergen · 2025-04-15T09:02:31Z

No description provided.

The previous implementation would create a UDDSketch (with a backing HashMap) for every possible merge, and then call `compact_buckets` on that in order to ensure the number of compactions between the target and the source were equal. Profiling this, we found out that in a `rollup` call of a lot of data, the `compact_buckets` was pretty much the main contributor to all the CPU time. However, if we merge a different sketch into this sketch, we don't need to actually compact_buckets all the time, we can directly consume the keys and counts, and apply some compact_key calls to it. This prevents a lot of heap allocations, as compact_buckets does a fully copy of the backing `HashMap`, and then rebuilding it. For a particular workload, this reduced the execution time from 30 to 12 seconds.

Profiling showed that this function is quite the hotspot. By changing the implementation slightly, instead of walking the tree using the Linked List, but iterate directly over the values, we improve the throughput of certain CPU bound queries. We've seen a reduction in time needed of > 50% for certain rollup queries. Due to the way entry() was called, and the way the Borrow Checker is unable to help us keep 2 mutable references into a map, we were doing double lookups into the backing HashMap pretty much always when this function was called. However, looking at the code, the only callers of this function only wanted to either increment by 1, or by a count. Therefore, make a function to actually support that usecase, which doesn't have this problem with the Borrow Checker, as it doesn't have to return a mutable reference: It actually does the work immediately

We got it slightly wrong previously, when we used the number of values to reserve Heap memory, but we actually want the number of buckets.

feikesteenbergen added 13 commits April 15, 2025 11:03

Ensure we test multiple merge_sketch implementations

c6903f0

Reserve HashMap capacity if length is known

cb54e79

Prevent memory allocation when creating new Sketch

3865ac8

Use the correct size when allocating memory

0213571

We got it slightly wrong previously, when we used the number of values to reserve Heap memory, but we actually want the number of buckets.

Bump Rust version

a36e502

implement Ord for SketchHashKey

955ea36

Remove unneeded serde derives

dc29d2a

Constrain UDDSketch struct members to valid ranges

46a0f49

Tests still succeed

fe4554c

Compact using stack

ef0696c

Cargo fmt

46c6562

feikesteenbergen force-pushed the feike/test_cicd branch from 68dd130 to 46c6562 Compare April 15, 2025 09:04

feikesteenbergen closed this Apr 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feike/test cicd#861

Feike/test cicd#861
feikesteenbergen wants to merge 13 commits intomainfrom
feike/test_cicd

feikesteenbergen commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

feikesteenbergen commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant