Skip to content

Feike/malloc performance improvements#852

Closed
feikesteenbergen wants to merge 23 commits intomainfrom
feike/malloc_performance_improvements
Closed

Feike/malloc performance improvements#852
feikesteenbergen wants to merge 23 commits intomainfrom
feike/malloc_performance_improvements

Conversation

@feikesteenbergen
Copy link
Member

@feikesteenbergen feikesteenbergen commented Apr 4, 2025

Attempt to reduce malloc calls and improve performance

TLDR: reduces malloc calls by 70% without requiring additional memory.

Why? We used to reallocate the HashMap backing the sketches every compaction.
Now, we use a Vec to temporarily store the values while compacting, allowing
us to reuse the HashMap

  • this Vec can quickly be deallocated again if needed
  • however, we can reuse it if we know we are going to need multiple compcations, which is the case for merge_sketches.

This is a work in progress, and has some rough edges

Baseline

After having added some tests, these are the numbers:

  • random_stress: (*10): 8.07 seconds
  • Call Tree: (fuzzing_test)
    • 91.2% add_value
      • 61.7% compact_buckets
      • 29.5% increment
  • memory
            observed        |         message         |  max   | malloc | free
    ------------------------+-------------------------+--------+--------+-------
     2025-04-04 07:35:25+00 | db3024a Add malloc test | 212528 |  86923 | 86919
    

Switching to a permanent Vec

Works, is faster, less malloc, yet high memory usage (during the tests)

  • Call Tree: (fuzzing_test)
    • 81.6% add_value
      • 44.3% compact_buckets
      • 37.3% increment
  • memory
            observed        |                          message                           |   max    | malloc | free
    ------------------------+------------------------------------------------------------+----------+--------+-------
     2025-04-04 08:52:29+00 | 2b5155b Use a Vec and sorting to compact the SketchHashMap | 18852400 |  33416 | 33411
     2025-04-04 08:21:26+00 | 765fb32 Implement Ord for HashKey                          |   214880 |  86833 | 86824
     2025-04-04 08:18:43+00 | 5b35545 Implement Ord for HashKey, some tweaks             |   213312 |  86896 | 86890
     2025-04-04 08:14:30+00 | 5b35545 Implement Ord for HashKey, some tweaks             |   213120 |  86934 | 86927
     2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798
     2025-04-04 07:39:47+00 | c4f13e5 Introduce swap variable for faster compaction      |   242894 |  86935 | 86929
     2025-04-04 07:35:25+00 | db3024a Add malloc test                                    |   212528 |  86923 | 86919
    

Switching to a temp Vec

  • Call Tree: (fuzzing_test)
    • 88.5% add_value
      • 47.0% compact_buckets
      • 41.0% increment
  • memory
    
            observed        |                          message                           |   max    | malloc | free
    ------------------------+------------------------------------------------------------+----------+--------+-------
     2025-04-04 09:07:37+00 | f907c17 Remove permanent swap Vec in favor of temp one     |   139408 |  35532 | 35527
     2025-04-04 09:03:51+00 | b3a7214 Only clone other UDDSketch when needed             | 18787161 |  23418 | 23412
     2025-04-04 08:52:29+00 | 2b5155b Use a Vec and sorting to compact the SketchHashMap | 18852400 |  33416 | 33411
     2025-04-04 08:21:26+00 | 765fb32 Implement Ord for HashKey                          |   214880 |  86833 | 86824
     2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798
    

Switching to a reusable Vec

  • random_stress: (*10): 8.07 seconds

  • Call Tree: (fuzzing_test)

    • 82.4% add_value
      • 45.7% compact_buckets
      • 36.7% increment
  • memory

            observed        |                          message                           |   max    | malloc | free
    ------------------------+------------------------------------------------------------+----------+--------+-------
     2025-04-04 09:18:55+00 | 4de6be6 Reuse the swap Vec for repeated compaction calls   |   214192 |  25526 | 25521
     2025-04-04 09:07:37+00 | f907c17 Remove permanent swap Vec in favor of temp one     |   139408 |  35532 | 35527
     2025-04-04 09:06:33+00 | b3a7214 Only clone other UDDSketch when needed             |   217072 |  35522 | 35515
     2025-04-04 07:44:38+00 | c4f13e5 Introduce swap variable for faster compaction      |   238928 |  86805 | 86798
    

tools/release and others added 11 commits April 2, 2025 11:42
This is needed for allowing certain sort operations to take place.
It is also true that a HashKey has a total Ordering, so there wasn't
much to change anyways.
Profiling showed us that we spend a lot of time in the compact function
(60%+ for the current test suite). We therefore want to reduce:

- malloc calls
- cpu usage in general

We do so by introducing a Vec that is only used during the compaction
itself, the actual HashMap will be reused every time. The Vec itself
will also stick around once it has been allocated.

We also instead of walking through the HashMap by following the linked
list, drain the HashMap, which should reduce the amount of comparisons
required, and therefore reduce CPU
This reduces the amount of memory allocations drastically

while self.buckets.len() > self.max_buckets as usize {
self.compact_buckets();
self.compact_buckets(&mut Vec::new());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could reuse a vec here as well

@feikesteenbergen
Copy link
Member Author

Closing in favor of the simpler #853

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants