Feike/malloc performance improvements by feikesteenbergen · Pull Request #852 · timescale/timescaledb-toolkit

feikesteenbergen · 2025-04-04T11:41:11Z

Attempt to reduce malloc calls and improve performance

TLDR: reduces malloc calls by roughly 70% (86923 to 25526 in the malloc test) without requiring additional memory.

Why? We used to reallocate the HashMap backing the sketches on every compaction.
Now we use a Vec to temporarily store the values while compacting, which allows
us to reuse the HashMap.

This Vec can be deallocated again quickly if needed.
However, we can reuse it if we know we are going to need multiple compactions, which is the case for merge_sketches.
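
A minimal sketch of the idea described above, assuming a simplified bucket map keyed by plain `i64` indexes. The names `compact_key`, `compact_buckets` and `swap`, as well as the key-halving rule, are illustrative assumptions and are not taken from the crate:

```rust
use std::collections::HashMap;

/// Hypothetical compaction rule: old buckets 2j-1 and 2j fold into bucket j.
fn compact_key(key: i64) -> i64 {
    (key + 1).div_euclid(2)
}

/// Compacts `buckets` in place, using `swap` as scratch space.
/// Both the map and the Vec keep their allocations across calls.
fn compact_buckets(buckets: &mut HashMap<i64, u64>, swap: &mut Vec<(i64, u64)>) {
    swap.clear();
    // `drain` empties the map but keeps its allocated memory for reuse.
    swap.extend(buckets.drain().map(|(k, c)| (compact_key(k), c)));
    // Sorting puts entries with the same compacted key next to each other.
    swap.sort_unstable_by_key(|&(k, _)| k);

    // Draining the Vec also keeps its capacity for the next compaction.
    let mut entries = swap.drain(..);
    if let Some((mut key, mut count)) = entries.next() {
        for (k, c) in entries {
            if k == key {
                // Same compacted key: merge the counts.
                count += c;
            } else {
                buckets.insert(key, count);
                key = k;
                count = c;
            }
        }
        buckets.insert(key, count);
    }
}
```

A caller that expects several compactions in a row (as in a merge) can create the `Vec` once and pass the same one to every call. A caller that compacts rarely can pass a fresh `Vec` and let it drop afterwards.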

This is a work in progress and still has some rough edges.

Baseline

After having added some tests, these are the numbers:

random_stress: (*10): 8.07 seconds
Call Tree: (fuzzing_test)
- 91.2% add_value
  - 61.7% compact_buckets
  - 29.5% increment

memory

        observed        |         message         |  max   | malloc | free
------------------------+-------------------------+--------+--------+-------
 2025-04-04 07:35:25+00 | db3024a Add malloc test | 212528 |  86923 | 86919
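
The PR does not show how the malloc and free columns are collected. One way to get comparable counters inside a Rust test, offered purely as an assumption and not necessarily what the malloc test does, is a counting wrapper around the system allocator. `CountingAlloc`, `MALLOCS`, `FREES` and `count_allocs` are hypothetical names:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counters; hypothetical names for illustration.
static MALLOCS: AtomicUsize = AtomicUsize::new(0);
static FREES: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to the system allocator and counts the calls.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        MALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        FREES.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the allocation and free counts
/// observed while it ran (other threads may contribute to these counts).
fn count_allocs<T>(f: impl FnOnce() -> T) -> (T, usize, usize) {
    let mallocs_before = MALLOCS.load(Ordering::Relaxed);
    let frees_before = FREES.load(Ordering::Relaxed);
    let out = f();
    (
        out,
        MALLOCS.load(Ordering::Relaxed) - mallocs_before,
        FREES.load(Ordering::Relaxed) - frees_before,
    )
}
```

Because the counters are global, the numbers are only meaningful when the measured code runs without other threads allocating at the same time.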

Switching to a permanent Vec

This works, is faster, and calls malloc less often, but peak memory usage is high (during the tests).

Call Tree: (fuzzing_test)
- 81.6% add_value
  - 44.3% compact_buckets
  - 37.3% increment

memory

        observed        |                          message                           |   max    | malloc | free
------------------------+------------------------------------------------------------+----------+--------+-------
 2025-04-04 08:52:29+00 | 2b5155b Use a Vec and sorting to compact the SketchHashMap | 18852400 |  33416 | 33411
 2025-04-04 08:21:26+00 | 765fb32 Implement Ord for HashKey                          |   214880 |  86833 | 86824
 2025-04-04 08:18:43+00 | 5b35545 Implement Ord for HashKey, some tweaks             |   213312 |  86896 | 86890
 2025-04-04 08:14:30+00 | 5b35545 Implement Ord for HashKey, some tweaks             |   213120 |  86934 | 86927
 2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798
 2025-04-04 07:39:47+00 | c4f13e5 Introduce swap variable for faster compaction      |   242894 |  86935 | 86929
 2025-04-04 07:35:25+00 | db3024a Add malloc test                                    |   212528 |  86923 | 86919

Switching to a temp Vec

Call Tree: (fuzzing_test)
- 88.5% add_value
  - 47.0% compact_buckets
  - 41.0% increment

memory


        observed        |                          message                           |   max    | malloc | free
------------------------+------------------------------------------------------------+----------+--------+-------
 2025-04-04 09:07:37+00 | f907c17 Remove permanent swap Vec in favor of temp one     |   139408 |  35532 | 35527
 2025-04-04 09:03:51+00 | b3a7214 Only clone other UDDSketch when needed             | 18787161 |  23418 | 23412
 2025-04-04 08:52:29+00 | 2b5155b Use a Vec and sorting to compact the SketchHashMap | 18852400 |  33416 | 33411
 2025-04-04 08:21:26+00 | 765fb32 Implement Ord for HashKey                          |   214880 |  86833 | 86824
 2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798

Switching to a reusable Vec

random_stress: (*10): 8.07 seconds
Call Tree: (fuzzing_test)
- 82.4% add_value
  - 45.7% compact_buckets
  - 36.7% increment

memory

        observed        |                          message                           |   max    | malloc | free
------------------------+------------------------------------------------------------+----------+--------+-------
 2025-04-04 09:18:55+00 | 4de6be6 Reuse the swap Vec for repeated compaction calls   |   214192 |  25526 | 25521
 2025-04-04 09:07:37+00 | f907c17 Remove permanent swap Vec in favor of temp one     |   139408 |  35532 | 35527
 2025-04-04 09:06:33+00 | b3a7214 Only clone other UDDSketch when needed             |   217072 |  35522 | 35515
 2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798

Profiling showed us that we spend a lot of time in the compact function (60%+ for the current test suite). We therefore want to reduce:
- malloc calls
- CPU usage in general

We do so by introducing a Vec that is only used during the compaction itself; the actual HashMap is reused every time. The Vec itself also sticks around once it has been allocated. Instead of walking through the HashMap by following the linked list, we also drain the HashMap, which should reduce the number of comparisons required and therefore reduce CPU usage.

feikesteenbergen · 2025-04-04T11:52:45Z

crates/udd-sketch/src/lib.rs


        while self.buckets.len() > self.max_buckets as usize {
-            self.compact_buckets();
+            self.compact_buckets(&mut Vec::new());


Could reuse a vec here as well
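
A sketch of what reusing the Vec at this call site could look like, assuming a simplified stand-in type. `Sketch`, `enforce_limit` and the key-halving rule are hypothetical; only `buckets`, `max_buckets` and `compact_buckets` mirror names from the diff:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the sketch type; field names follow the diff.
struct Sketch {
    buckets: HashMap<i64, u64>,
    max_buckets: u32,
}

impl Sketch {
    /// Hypothetical compaction: fold old keys 2j-1 and 2j into key j.
    fn compact_buckets(&mut self, swap: &mut Vec<(i64, u64)>) {
        swap.clear();
        swap.extend(self.buckets.drain());
        for (key, count) in swap.drain(..) {
            *self.buckets.entry((key + 1).div_euclid(2)).or_insert(0) += count;
        }
    }

    /// Compacts until the bucket limit is met, allocating the scratch Vec
    /// once instead of once per loop iteration.
    /// Assumes `max_buckets >= 2`, since this toy rule cannot merge the
    /// last positive key with the last non-positive one.
    fn enforce_limit(&mut self) {
        let mut swap = Vec::new();
        while self.buckets.len() > self.max_buckets as usize {
            self.compact_buckets(&mut swap);
        }
    }
}
```

With `&mut Vec::new()` inside the loop, every iteration that needs scratch space allocates and frees a new buffer. Hoisting the Vec keeps the buffer from the first iteration around for the later ones.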

feikesteenbergen · 2025-04-10T13:13:34Z

Closing in favor of the simpler #853

tools/release and others added 11 commits April 2, 2025 11:42

start 1.20.0-dev

748c260

Update tdigest.md

176b345

replaced https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates with https://docs.timescale.com/use-timescale/latest/continuous-aggregates/ Signed-off-by: Anthony Shaw <109225504+xyztony@users.noreply.github.com>

Support newer cargo versions

18ad1c4

This was deprecated: rust-lang/cargo#13349

Default to PostgreSQL 17

0e6b749

Add malloc test

db3024a

Introduce swap variable for faster compaction

c4f13e5

Implement Ord for HashKey

765fb32

This is needed for allowing certain sort operations to take place. It is also true that a HashKey has a total Ordering, so there wasn't much to change anyways.
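
To illustrate what a total ordering on such a key can look like, here is a sketch with a hypothetical three-variant key. The enum layout and the assumption that a larger negative index means a smaller value are illustrative, not taken from the crate:

```rust
use std::cmp::Ordering;

/// Hypothetical bucket key: an index into the negative range, zero,
/// or an index into the positive range.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum HashKey {
    Negative(i64),
    Zero,
    Positive(i64),
}

impl Ord for HashKey {
    fn cmp(&self, other: &Self) -> Ordering {
        use HashKey::*;
        match (self, other) {
            // Assumption: a larger negative index is further from zero,
            // so the comparison is reversed.
            (Negative(a), Negative(b)) => b.cmp(a),
            (Negative(_), _) => Ordering::Less,
            (_, Negative(_)) => Ordering::Greater,
            (Zero, Zero) => Ordering::Equal,
            (Zero, Positive(_)) => Ordering::Less,
            (Positive(_), Zero) => Ordering::Greater,
            (Positive(a), Positive(b)) => a.cmp(b),
        }
    }
}

impl PartialOrd for HashKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Delegate to `cmp` so both traits always agree.
        Some(self.cmp(other))
    }
}
```

With `Ord` in place, a `Vec` of `(HashKey, count)` pairs can be sorted with `sort_unstable_by_key`, which is the kind of sort operation that needs a total ordering.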

Only clone other UDDSketch when needed

b3a7214

This reduces the amount of memory allocations drastically

Remove permanent swap Vec in favor of temp one

f907c17

Reuse the swap Vec for repeated compaction calls

4de6be6

feikesteenbergen added 12 commits April 6, 2025 20:54

wip

fb203c8

wip

1e2e436

wip

ff7cd4f

Exactly specify sketch size

3481ab5

Introduce UDDSketchRollup: Exact duplicate for now

216048f

wip

aac5990

wip

b0bb26e

Introduce swap variable

fbf6aa8

Tweak Swap Vec

4c0a115

Remove all manual calls to create swap

f91d0f4

Initial implementation of merge_items

fdbb26c

This implementation isn't optimal yet, however, it is there to ensure correctness

Still works!

ed0da08

feikesteenbergen closed this Apr 10, 2025

feikesteenbergen wants to merge 23 commits into main from feike/malloc_performance_improvements
