Per client request memory allocator (pooled memory allocator) #249
Replies: 14 comments 1 reply
-
Can you explain the benefits of this move?
-
Have a closer look at all the Proc*() functions: these are request handlers. The Xserver always processes one request after another. Within those functions we often have temporary allocations that have to be freed again after the request is finished. The idea here is to do that clean-up automatically.
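To make the pain point concrete, here is a minimal sketch (hypothetical handler and sizes, not actual Xserver code; the error codes mimic X's `Success`/`BadAlloc`) of the manual-cleanup pattern these Proc*() handlers have today:

```c
#include <stdlib.h>

#define Success  0
#define BadAlloc 11  /* X protocol error code */

/* Hypothetical request handler: every early return must free
 * everything allocated so far, cluttering the error paths. */
int ProcExample(void)
{
    char *a = malloc(64);
    if (!a)
        return BadAlloc;

    char *b = malloc(128);
    if (!b) {
        free(a);            /* clean-up clutter on the error path */
        return BadAlloc;
    }

    /* ... do the actual work ... */

    free(b);
    free(a);
    return Success;
}
```

With an automatic per-request cleanup, both `free()` calls and the error-path bookkeeping would disappear.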
-
So, like a per-client arena that gets cleared after every request? I've been analyzing memory consumption, and there's definitely an issue with lots of small, temporary allocations. I think addressing this would be a good idea, but I'm not sure per-client allocators are the best way to go about it. A fairly common pattern in web servers is using a thread-local scratch buffer for temporary allocations, which seems simpler, and if we zero the memory upon freeing, there shouldn't be any security concerns. picrel is a screenshot of temporary allocations from
-
Sort of. Not a separate memory zone, just garbage collection. An alternative would be using alloca() here, BUT: it's very fragile and doesn't handle allocation failures, so attackers could easily smash the stack with bogus requests. That's why we're using classic malloc()/calloc() and always checking the result. A fixed preallocated zone would suffer from similar ugly problems: we can't tell the upper bound of required memory - we could allocate a really huge chunk and waste a lot of memory, and still couldn't be sure.
That would require us to spawn a new thread for each single request (and block the main thread until the request is finished).
-
Can you elaborate on what you mean by "mempool"? To me a memory zone and memory pool sound like the same idea.
I agree that alloca() is too dangerous to use.
Yes, but instead of it being fixed, we can start by allocating a known minimum and grow as needed. Basically a dynamic array / vector / ArrayList. Pretty much the same as malloc(), except that malloc() is a general solution and has to consider fragmentation, while we can just grow linearly and discard the entire buffer when it's no longer needed.
I don't see why TLS would necessitate spawning a new thread for every request, and it should circumvent locking/blocking (except the mutex locks inside malloc when we have to grow) via every thread having its own allocator. The way I envision it is a thread-local scratch buffer or simple linear allocator that would just be reset (and, for security reasons, zeroed out) at the beginning of every request. That way we benefit from simplified memory management (just grow the buffer as needed, rarely having to make syscalls; no need to free during a request, everything gets effectively freed when the allocator gets reset) while getting ahead of future multi-threading concerns without having to worry about synchronization. It would look something like:

```c
typedef struct {
    char*  buf;            /* char* so the pointer arithmetic below is standard C */
    size_t buf_size;
    char*  next_free_byte;
} OurCoolAllocator;

/* this would of course require a compiler that supports __thread */
__thread OurCoolAllocator the_allocator;

void*
OurCoolAlloc(size_t n)
{
    if (the_allocator.next_free_byte + n > the_allocator.buf + the_allocator.buf_size) {
        /* realloc plus error handling, optionally zero new memory */
    }
    void* res = the_allocator.next_free_byte;
    the_allocator.next_free_byte += n;
    return res;
}

void
ResetOurCoolAllocator(void)
{
    memset(the_allocator.buf, 0, (size_t)(the_allocator.next_free_byte - the_allocator.buf));
    the_allocator.next_free_byte = the_allocator.buf;
}
```

foo.c:

```c
int
ProcFoo(ClientPtr client)
{
    char* bar = OurCoolAlloc(123);
    if (!bar) return BadAlloc;
    /* ... */
    char* baz = OurCoolAlloc(456);
    if (!baz) return BadAlloc; /* no need to free bar, because it will all be cleaned up later */
    /* ... */
    return Success;
}
```

From what I gather, multi-threading isn't an immediate concern, so we could also just make it a simple global arena allocator / scratch buffer / whatever, without TLS. Note that when I say "reset the allocator", I mean just zero out the data and set next_free_byte back to the beginning. We could also free the memory, but that would cause a bunch of calls to malloc()/calloc()/free() and likely syscalls, which I think is worse than hanging on to a few KB.
-
Good move to make the software more secure and reliable, but what about the performance costs?
-
I found a solution to this using the gcc attribute 'cleanup', which makes the compiler insert a function call whenever the variable is about to go out of scope:

```c
void cleanup_malloc_char(char** p) {
    free(*p);
}

void example_fn(const char* foo, bool condition)
{
    __attribute__((cleanup(cleanup_malloc_char)))
    char *copy_of_foo = strdup(foo);
    if (NULL == copy_of_foo)
        return; // <-- free(copy_of_foo)

    if (true) {
        __attribute__((cleanup(cleanup_malloc_char)))
        char* buffer = (char*)malloc(1024);
        if (NULL == buffer)
            return; // <-- free(buffer), free(copy_of_foo)

        if (condition)
            return; // <-- free(buffer), free(copy_of_foo)
    } // <-- free(buffer)

    return; // <-- free(copy_of_foo)
}
```

Though I can't speak for how portable it is across different compilers, as I almost exclusively do embedded work with gcc. I believe clang supports it as well, though.
-
Please, no compiler-specific, non-standard solutions. If you want destructors, let's just move to C++.
-
Why? the
-
No love lost for C++, believe you me. This method just works really well (for me at least), and has the advantage of being processed at compile time, so I felt that I should suggest it. If there are reasons it can't be used in this codebase, though (e.g. compilers other than gcc and clang?), then so be it. I don't have the knowledge or experience to judge that myself, though. I will note that unlike, say,
-
A benefit of
-
I wonder if we're overthinking things. If the only goal is to track dynamic memory so it can be cleaned up later, is it actually necessary to write a custom allocator, as opposed to just using malloc()/calloc()/etc.? By way of example:

```c
struct garbageman {
    int count;
    void *list[1000]; // Obligatory TODO: dynamically resize when more slots are needed
};

void garbageman_init(struct garbageman *gman)
{
    gman->count = 0;
}

void garbageman_add(struct garbageman *gman, void *mem)
{
    if (NULL == mem)
        return;
    gman->list[gman->count] = mem;
    gman->count++;
}

// This 'leaks' a slot in the list until garbageman_clear() runs
void *garbageman_remove(struct garbageman *gman, void *mem)
{
    for (int i = 0; i < gman->count; i++) {
        if (gman->list[i] == mem) {
            gman->list[i] = NULL;
            return mem;
        }
    }
    return NULL;
}

void garbageman_clear(struct garbageman *gman)
{
    for (int i = 0; i < gman->count; i++)
        free(gman->list[i]); // free(NULL) is a no-op, so removed slots are fine
    gman->count = 0;
}
```

Then an equivalent to mempool_alloc would be:

```c
void *garbageman_malloc(struct garbageman *gman, size_t size)
{
    void *mem = malloc(size);
    garbageman_add(gman, mem);
    return mem;
}
```

But if calling code wanted to allocate memory itself (or use a function that allocates memory, similar to strdup()), it can do so and just add it to the garbage collector afterwards:

```c
void example_fn(const char *foo)
{
    char *copy_of_foo = strdup(foo);
    copy_of_foo = realloc(copy_of_foo, 1024);
    garbageman_add(gman, copy_of_foo);
}
```

Something I could see becoming a problem is the danger posed by mistakenly free'ing something owned by the garbage collector. Crashes and/or the stack trace would be at the call to _clear() and not at the mistaken free(). This could make debugging more difficult. Even worse for realloc().
-
I was only thinking yesterday that something like this might be a good idea. malloc/calloc are not suitable for realtime stuff due to their non-determinism; a lot of the calls that I saw shifted to calloc could very likely benefit from a dedicated memory pool and handler optimised for performance and total memory use. Possibly the best place to start is one or two of the allocations that see the most action: get a profile of how long is being spent in them, and then optimise from there.
-
Thank you for your contribution! We recently restructured the "Ideas" discussions, and accordingly this discussion will be moved to the X11Libre 2 Rfcs Of The Core Team · Discussions · GitHub category.
-
Within the request handlers, we often have to allocate small pieces of memory, which of course needs to be freed after the request is done. Right now, this is cluttering the error paths, risking that something's forgotten and leaks.
Therefore, proposing a pooled memory allocator on a per-request basis:
- generic pooled allocator: allocates `size` bytes in that pool
- per-request allocator:
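A rough sketch of what such a pool interface could look like (names, signatures, and growth strategy here are guesses for illustration, not the actual proposed patch):

```c
#include <stdlib.h>

/* Illustrative pool: a growable list of allocations that are
 * all released together once the request is finished. */
typedef struct MemPool {
    void   **items;     /* tracked allocations */
    size_t   count;
    size_t   capacity;
} MemPool;

/* Allocate `size` zeroed bytes owned by the pool; NULL on failure. */
void *mempool_alloc(MemPool *pool, size_t size)
{
    if (pool->count == pool->capacity) {
        size_t ncap = pool->capacity ? pool->capacity * 2 : 16;
        void **grown = realloc(pool->items, ncap * sizeof(*grown));
        if (!grown)
            return NULL;
        pool->items = grown;
        pool->capacity = ncap;
    }
    void *mem = calloc(1, size);
    if (mem)
        pool->items[pool->count++] = mem;
    return mem;
}

/* Free everything the pool owns; called once when the request is done. */
void mempool_release(MemPool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        free(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = pool->capacity = 0;
}
```

A per-request wrapper would then just hand each Proc*() handler a pool and call `mempool_release()` after dispatch, so the handlers' error paths need no explicit frees.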