
Conversation

two-horned

MemoryPool calls the wrapped allocator N times when preheating it with N items.
I think it is more appropriate to allocate only once. This should be faster and, more importantly,
if preheating fails (due to an allocation error), the state of the MemoryPool is easier to reason about.
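
Roughly, the idea looks like this (a standalone sketch of the single-allocation approach, not the actual patch; the Slot union and the preheatOnce name are made up for illustration):

const std = @import("std");

const Item = [64]u8; // stand-in for the pool's item type
const Node = struct { next: ?*Node };

// Each slot holds either a live item or, while free, an intrusive free-list node.
const Slot = union { item: Item, node: Node };

fn preheatOnce(arena: std.mem.Allocator, count: usize, free_list: *?*Node) !void {
    // One allocation for all `count` slots instead of `count` separate ones;
    // if it fails, nothing has been modified.
    const slots = try arena.alloc(Slot, count);
    for (slots) |*slot| {
        slot.* = .{ .node = .{ .next = free_list.* } };
        free_list.* = &slot.node;
    }
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    var free_list: ?*Node = null;
    try preheatOnce(arena.allocator(), 1024, &free_list);

    // Handing out a preheated slot never has to touch the backing allocator.
    var n: usize = 0;
    while (free_list) |node| : (n += 1) free_list = node.next;
    std.debug.print("preheated {d} slots\n", .{n});
}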

two-horned changed the title from "Allocate only once for preheating." to "MemoryPool: Allocate only once for preheating." on Oct 1, 2025
two-horned force-pushed the mempool_better_preheat branch 3 times, most recently from 0b46bb8 to a56f564 on October 3, 2025 12:39
Justus2308 (Contributor) commented Oct 3, 2025

I thought about this too in #23234; I'm not sure there's any performance benefit to be had here, though. The pool is already backed by an ArenaAllocator, which batch-allocates anyway, so in practice it probably won't make any difference whether you ask for one large allocation or for several small ones. Asking for many smaller allocations might even allow the arena to use its batch allocations more optimally.

If you want, you can use this piece of code I stole from the article about the original implementation of MemoryPool to test if there's actually any performance benefit (you'd have to modify it slightly to use preheating):

pub fn main() !void {
    const gpa = std.heap.c_allocator;
    var rand: std.Random.DefaultPrng = .init(0);
    const r = rand.random();
    var open: std.ArrayList(*anyopaque) = try .initCapacity(gpa, 300);
    defer open.deinit(gpa);
    inline for (.{ [1]u8, [8 << 10]u8, [1 << 10 << 10]u8 }) |T| {
        var pool: std.heap.MemoryPoolExtra(T, .{}) = .init(gpa);
        defer pool.deinit();
        defer open.clearRetainingCapacity();
        for (0..2_500_000) |_| {
            const fill_rate = open.items.len / 256;
            const i = r.int(usize);
            if (i <= fill_rate) {
                pool.destroy(@ptrCast(@alignCast(open.swapRemove(i))));
            } else {
                const item = try pool.create();
                item[0] = 123;
                try open.append(gpa, @ptrCast(item));
            }
        }
    }
}

const std = @import("std");

Just a warning: running this benchmark can trigger the OOM killer (at least on my machine); if it does, just adjust the allocation sizes until it doesn't.

Maybe a good solution would be to make preheat return a usize count of how many nodes were actually allocated; that way you can still preheat optimistically until you hit OOM, and it's easier to reason about how many nodes are currently available.
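
Something like this, roughly (a hypothetical tweak to the existing per-node loop in lib/std/heap/memory_pool.zig, not part of this patch; the internal names are taken from the current implementation and may differ across versions):

// Best-effort preheat: returns how many nodes were actually allocated
// instead of failing with OutOfMemory.
pub fn preheat(pool: *Pool, size: usize) usize {
    var i: usize = 0;
    while (i < size) : (i += 1) {
        const raw_mem = pool.allocNew() catch return i; // stop at the first failed allocation
        const free_node: NodePtr = @ptrCast(raw_mem);
        free_node.* = Node{ .next = pool.free_list };
        pool.free_list = free_node;
    }
    return size;
}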

two-horned (Author) commented Oct 3, 2025

I ran a slightly modified version of your test, and the new implementation indeed performs better (I also modified the code to avoid out-of-memory errors).

Generally speaking, the performance benefits are:

  1. fewer allocation calls
  2. less memory fragmentation
  3. better memory locality for data (if objects from the pool interact with each other)

Here are the test results:

❯ hyperfine --warmup 3 'zig run -OReleaseFast test_mempool_new.zig' && sleep 10 && hyperfine --warmup 3 'zig run -OReleaseFast test_mempool.zig'
Benchmark 1: zig run -OReleaseFast test_mempool_new.zig
  Time (mean ± σ):     267.7 ms ±   2.3 ms    [User: 171.6 ms, System: 95.6 ms]
  Range (min … max):   265.3 ms … 273.7 ms    11 runs
 
Benchmark 1: zig run -OReleaseFast test_mempool.zig
  Time (mean ± σ):     323.8 ms ±   5.2 ms    [User: 224.0 ms, System: 98.4 ms]
  Range (min … max):   311.6 ms … 332.3 ms    10 runs
My code looked like this:
pub fn main() !void {
    const gpa = std.heap.page_allocator;
    var rand: std.Random.DefaultPrng = .init(0);
    const r = rand.random();
    var open: std.ArrayList(*anyopaque) = try .initCapacity(gpa, 300);
    defer open.deinit(gpa);
    inline for (.{ [1]u8, [8 << 10]u8, [1 << 10 << 10]u8 }) |T| {
        const iterations = 25_000_000 / @sizeOf(T);
        var pool: MemoryPoolExtra(T, .{}) = try .initPreheated(gpa, iterations);
        defer pool.deinit();
        defer open.clearRetainingCapacity();
        for (0..iterations) |_| {
            const fill_rate = open.items.len / 256;
            const i = r.int(usize);
            if (i <= fill_rate) {
                pool.destroy(@ptrCast(@alignCast(open.swapRemove(i))));
            } else {
                const item = try pool.create();
                item[0] = 123;
                try open.append(gpa, @ptrCast(item));
            }
        }
    }
}

const std = @import("std");
const MemoryPoolExtra = @import("std").heap.MemoryPoolExtra;
// you can replace this line with (if you download the file from my commit):
// const MemoryPoolExtra = @import("memory_pool.zig").MemoryPoolExtra;

But in my opinion, even more important is that when preheating fails, you know you haven't allocated any memory. With the old version, without reading the implementation, it's not clear whether some allocations may have succeeded.

two-horned (Author) commented

I have just implemented the TODO. Resetting a MemoryPool is now possible.

The modes free_all and retain_capacity work similarly to the ArenaAllocator versions; however, retain_with_limit takes a limit of n Item instances (in contrast to n bytes).
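
Usage would look roughly like this (a sketch only; the mode names come from this discussion, but the exact signature and return value may differ from the patch):

const std = @import("std");

const Item = [64]u8;

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    var pool: std.heap.MemoryPoolExtra(Item, .{}) = try .initPreheated(gpa, 100);
    defer pool.deinit();

    const a = try pool.create();
    a[0] = 1;
    pool.destroy(a);

    // reset is assumed to report whether the underlying arena reset succeeded.
    _ = pool.reset(.retain_capacity);             // keep the arena's buffer
    _ = pool.reset(.{ .retain_with_limit = 32 }); // keep room for up to 32 Items
    _ = pool.reset(.free_all);                    // release everything
}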

Why don't we need to save a list of nodes? Because building nodes is actually pretty cheap with the new preheat solution: only one allocation call plus linking the nodes together. The allocation is also guaranteed to succeed (if the arena succeeded in resetting).

Added a basic test to showcase the capabilities.

two-horned changed the title from "MemoryPool: Allocate only once for preheating." to "MemoryPool: Allocate only once for preheating & impl. proper resetting." on Oct 4, 2025
two-horned (Author) commented Oct 4, 2025

PS: also, the memory pool isn't even allowed to keep a list of previously allocated memory and reuse it (if we want to reset the arena allocator too), as the previous version suggests, because on reset the underlying arena allocator actually frees its buffers and allocates one single buffer of the desired capacity for preheating purposes.

I think that when calling reset on the memory pool, a reset of the arena is desired as well (for example, to avoid fragmentation in the long run), so disabling resetting of the underlying arena allocator just to be able to create a new list of free_nodes in (non-amortized) O(1) is a big trade-off. I will try to illustrate the pros and cons in this table:

| Solution                  | Resets arena | Time complexity | De-fragmentation | Easy to implement |
| ------------------------- | ------------ | --------------- | ---------------- | ----------------- |
| Keep track of allocations | no           | O(1)            | no               | no                |
| Current (my) solution     | yes          | amortized O(1)  | yes              | yes               |

More uniform comments.
two-horned force-pushed the mempool_better_preheat branch from a7106e9 to e79f8b0 on October 11, 2025 10:56
Respect arena reset result.
two-horned force-pushed the mempool_better_preheat branch from e79f8b0 to 728b54c on October 11, 2025 10:57