Per client request memory allocator (pooled memory allocator) #249
Replies: 14 comments 1 reply
-
Can you explain the benefits of this move?
-
Have a closer look at all the Proc*() functions: these are request handlers. The Xserver always processes one request after another. Within those functions we often have temporary allocations that have to be freed again after the request is finished. The idea here is to do that clean-up automatically.
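To make the pain point concrete, here is a minimal sketch (hypothetical handler and sizes, not actual Xserver code; the error codes mimic X's `Success`/`BadAlloc`) of the manual-cleanup pattern these Proc*() handlers have today:

```c
#include <stdlib.h>

#define Success  0
#define BadAlloc 11  /* X protocol error code */

/* Hypothetical request handler: every early return must free
 * everything allocated so far, cluttering the error paths. */
int ProcExample(void)
{
    char *a = malloc(64);
    if (!a)
        return BadAlloc;

    char *b = malloc(128);
    if (!b) {
        free(a);            /* clean-up clutter on the error path */
        return BadAlloc;
    }

    /* ... do the actual work ... */

    free(b);
    free(a);
    return Success;
}
```

With an automatic per-request cleanup, both `free()` calls and the error-path bookkeeping would disappear.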
-
So, like a per-client arena that gets cleared after every request? I've been analyzing memory consumption, and there's definitely an issue with lots of small, temporary allocations. I think addressing this would be a good idea, but I'm not sure per-client allocators are the best way to go about it. A fairly common pattern in web servers is using a thread-local scratch buffer for temporary allocations, which seems simpler, and if we zero the memory upon freeing, there shouldn't be any security concerns. picrel is a screenshot of temporary allocations from
-
Sort of. Not a separate memory zone, just garbage collection. An alternative would be using alloca() here, BUT: it's very fragile and doesn't handle allocation failures, so attackers could easily smash the stack with bogus requests. That's why we're using classic malloc()/calloc() and always checking the result. A fixed preallocated zone would suffer from similar ugly problems: we can't tell the upper bound of required memory - we could allocate a really huge chunk and waste a lot of memory, and still couldn't be sure.
That would require us to spawn a new thread for each single request (and block the main thread until the request is finished).
-
Can you elaborate on what you mean by "mempool"? To me a memory zone and memory pool sound like the same idea.
I agree that alloca() is too dangerous to use.
Yes, but instead of it being fixed, we can start by allocating a known minimum and grow as needed. Basically a dynamic array / vector / ArrayList. Pretty much the same as malloc(), except that malloc() is a general solution and has to consider fragmentation, while we can just grow linearly and discard the entire buffer when it's no longer needed.
I don't see why TLS would necessitate spawning a new thread for every request, and it should circumvent locking/blocking (except the mutex locks inside malloc when we have to grow) via every thread having its own allocator. The way I envision it is a thread-local scratch buffer or simple linear allocator that would just be reset (and, for security reasons, zeroed out) at the beginning of every request. That way we benefit from simplified memory management (just grow the buffer as needed, rarely having to make syscalls; no need to free during a request, everything gets effectively freed when the allocator gets reset) while getting ahead of future multi-threading concerns without having to worry about synchronization. It would look something like:

```c
typedef struct {
    char*  buf;            /* char* so the pointer arithmetic below is standard C */
    size_t buf_size;
    char*  next_free_byte;
} OurCoolAllocator;

/* this would of course require a compiler that supports __thread */
__thread OurCoolAllocator the_allocator;

void*
OurCoolAlloc(size_t n)
{
    if (the_allocator.next_free_byte + n > the_allocator.buf + the_allocator.buf_size) {
        /* realloc plus error handling, optionally zero new memory */
    }
    void* res = the_allocator.next_free_byte;
    the_allocator.next_free_byte += n;
    return res;
}

void
ResetOurCoolAllocator(void)
{
    memset(the_allocator.buf, 0, (size_t)(the_allocator.next_free_byte - the_allocator.buf));
    the_allocator.next_free_byte = the_allocator.buf;
}
```

foo.c:

```c
int
ProcFoo(ClientPtr client)
{
    char* bar = OurCoolAlloc(123);
    if (!bar) return BadAlloc;
    /* ... */
    char* baz = OurCoolAlloc(456);
    if (!baz) return BadAlloc; /* no need to free bar, because it will all be cleaned up later */
    /* ... */
    return Success;
}
```

From what I gather, multi-threading isn't an immediate concern, so we could also just make it a simple global arena allocator / scratch buffer / whatever, without TLS. Note that when I say "reset the allocator", I mean just zero out the data and set next_free_byte back to the beginning. We could also free the memory, but that would cause a bunch of calls to malloc()/calloc()/free() and likely syscalls, which I think is worse than hanging on to a few KB.
-
Good move to make the software more secure and reliable, but what about the performance costs?
-
I found a solution to this using the gcc attribute 'cleanup', which makes the compiler insert a function call whenever the variable is about to go out of scope:

```c
void cleanup_malloc_char(char** p) {
    free(*p);
}

void example_fn(const char* foo, bool condition)
{
    __attribute__((cleanup(cleanup_malloc_char)))
    char *copy_of_foo = strdup(foo);
    if (NULL == copy_of_foo)
        return; // <-- free(copy_of_foo)

    if (true) {
        __attribute__((cleanup(cleanup_malloc_char)))
        char* buffer = (char*)malloc(1024);
        if (NULL == buffer)
            return; // <-- free(buffer), free(copy_of_foo)

        if (condition)
            return; // <-- free(buffer), free(copy_of_foo)
    } // <-- free(buffer)

    return; // <-- free(copy_of_foo)
}
```

Though I can't speak for how portable it is across different compilers, as I almost exclusively do embedded work with gcc. I believe clang supports it as well, though.
-
Please, no compiler-specific, non-standard solutions. If you want destructors, let's just move to C++.
-
Why? the
-
No love lost for C++, believe you me. This method just works really well (for me at least), and has the advantage of being processed at compile time, so I felt that I should suggest it. If there are reasons it can't be used in this codebase, though (e.g. compilers other than gcc and clang?), then so be it. I don't have the knowledge or experience to judge that myself, though. I will note that unlike, say,
-
A benefit of
-
I wonder if we're overthinking things. If the only goal is to track dynamic memory so it can be cleaned up later, is it actually necessary to write a custom allocator, as opposed to just using malloc()/calloc()/etc.? By way of example:

```c
struct garbageman {
    int count;
    void *list[1000]; // Obligatory TODO: dynamically resize when more slots are needed
};

void garbageman_init(struct garbageman *gman)
{
    gman->count = 0;
}

void garbageman_add(struct garbageman *gman, void *mem)
{
    if (NULL == mem)
        return;
    gman->list[gman->count] = mem;
    gman->count++;
}

// This 'leaks' a slot in the list until garbageman_clear() runs
void *garbageman_remove(struct garbageman *gman, void *mem)
{
    for (int i = 0; i < gman->count; i++) {
        if (gman->list[i] == mem) {
            gman->list[i] = NULL;
            return mem;
        }
    }
    return NULL;
}

void garbageman_clear(struct garbageman *gman)
{
    for (int i = 0; i < gman->count; i++)
        free(gman->list[i]); // free(NULL) is a no-op, so removed slots are fine
    gman->count = 0;
}
```

Then an equivalent to mempool_alloc would be:

```c
void *garbageman_malloc(struct garbageman *gman, size_t size)
{
    void *mem = malloc(size);
    garbageman_add(gman, mem);
    return mem;
}
```

But if calling code wanted to allocate memory itself (or use a function that allocates memory, similar to strdup()), it can do so and just add it to the garbage collector afterwards:

```c
void example_fn(const char *foo)
{
    char *copy_of_foo = strdup(foo);
    copy_of_foo = realloc(copy_of_foo, 1024);
    garbageman_add(gman, copy_of_foo);
}
```

Something I could see becoming a problem is the danger posed by mistakenly free'ing something owned by the garbage collector. Crashes and/or the stack trace would be at the call to _clear() and not at the mistaken free(). This could make debugging more difficult. Even worse for realloc().
-
I was only thinking yesterday that something like this might be a good idea. malloc/calloc are not suitable for realtime stuff due to their non-determinism; a lot of the calls that I saw shifted to calloc could very likely benefit from a dedicated memory pool and handler optimised for performance and total memory use. Possibly the best place to start is one or two of the allocations that see the most action: get a profile of how long is being spent in them, and then optimise from there.
-
Thank you for your contribution! We recently restructured the "Ideas" discussions, and accordingly this discussion will be moved to the X11Libre 2 Rfcs Of The Core Team · Discussions · GitHub category.
-
Within the request handlers, we often have to allocate small pieces of memory, which of course needs to be freed after the request is done. Right now, this is cluttering the error paths, risking that something's forgotten and leaks.
Therefore, proposing a pooled memory allocator on a per-request basis:
- generic pooled allocator: allocates `size` bytes in that pool
- per-request allocator:
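A rough sketch of what such a pool interface could look like (names, signatures, and growth strategy here are guesses for illustration, not the actual proposed patch):

```c
#include <stdlib.h>

/* Illustrative pool: a growable list of allocations that are
 * all released together once the request is finished. */
typedef struct MemPool {
    void   **items;     /* tracked allocations */
    size_t   count;
    size_t   capacity;
} MemPool;

/* Allocate `size` zeroed bytes owned by the pool; NULL on failure. */
void *mempool_alloc(MemPool *pool, size_t size)
{
    if (pool->count == pool->capacity) {
        size_t ncap = pool->capacity ? pool->capacity * 2 : 16;
        void **grown = realloc(pool->items, ncap * sizeof(*grown));
        if (!grown)
            return NULL;
        pool->items = grown;
        pool->capacity = ncap;
    }
    void *mem = calloc(1, size);
    if (mem)
        pool->items[pool->count++] = mem;
    return mem;
}

/* Free everything the pool owns; called once when the request is done. */
void mempool_release(MemPool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        free(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = pool->capacity = 0;
}
```

A per-request wrapper would then just hand each Proc*() handler a pool and call `mempool_release()` after dispatch, so the handlers' error paths need no explicit frees.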