The Windows implementation currently works like this:
- Reserve a chunk of address space large enough to contain an aligned address where our allocation can fit
- Calculate an aligned address in the returned space
- Free the reserved address space
- Allocate memory at the calculated address
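The size and address calculations in the first two steps can be sketched as follows. This is a minimal illustration, not the actual implementation: `reservation_size` and `align_up` are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to reserve so that an aligned block
   of `size` bytes is guaranteed to fit somewhere inside the reservation,
   wherever the OS happens to place it. */
static size_t reservation_size(size_t size, size_t alignment)
{
    return size + alignment;
}

/* Hypothetical helper: round `base` up to the next multiple of `alignment`.
   Assumes `alignment` is a power of two. */
static uintptr_t align_up(uintptr_t base, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (base + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}
```

Because `align_up(base, alignment)` is always less than `base + alignment`, the aligned block ends before `base + reservation_size(size, alignment)`, so it lies entirely inside the reserved range.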
Between the last two steps, a different thread could claim the freed address space and make the allocation fail. The reserved space has to be freed somehow, otherwise address space is wasted. Maybe there's a way to free only part of a region of address space reserved with VirtualAlloc? That way, we could keep the range needed for the allocation and free the rest.
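As far as I know, `VirtualFree` with `MEM_RELEASE` only releases an entire region as originally reserved by `VirtualAlloc`, so one possible workaround is to retry the whole sequence when the final allocation loses the race. The sketch below shows only that retry logic, under stated assumptions: `vm_ops`, `alloc_aligned` and the `fake_*` functions are hypothetical names, and the OS calls are hidden behind callbacks. On Windows the callbacks would presumably wrap `VirtualAlloc` with `MEM_RESERVE`, `VirtualFree` with `MEM_RELEASE`, and `VirtualAlloc` at a fixed address. The simulated backend exists only to demonstrate a lost race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstraction over the three OS operations used above. */
typedef struct {
    void *(*reserve)(size_t size);              /* reserve anywhere; NULL on failure */
    void  (*release)(void *base);               /* release a whole reservation */
    void *(*alloc_at)(void *addr, size_t size); /* allocate at addr; NULL if taken */
} vm_ops;

/* Reserve, compute the aligned address, release, allocate; retry if another
   thread claimed the range between release and allocate.
   Assumes `alignment` is a power of two. */
static void *alloc_aligned(const vm_ops *ops, size_t size, size_t alignment,
                           int max_attempts)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        void *base = ops->reserve(size + alignment);
        if (base == NULL)
            return NULL; /* no address space left; retrying will not help */
        uintptr_t aligned =
            ((uintptr_t)base + (alignment - 1)) & ~((uintptr_t)alignment - 1);
        ops->release(base);
        void *p = ops->alloc_at((void *)aligned, size);
        if (p != NULL)
            return p;
        /* The range was taken in the meantime: start over. */
    }
    return NULL;
}

/* Simulated backend for demonstration only: the first alloc_at call fails,
   as if another thread had grabbed the freed range. */
static int fake_alloc_calls = 0;
static void *fake_reserve(size_t size) { (void)size; return (void *)(uintptr_t)0x10000; }
static void fake_release(void *base) { (void)base; }
static void *fake_alloc_at(void *addr, size_t size)
{
    (void)size;
    return (++fake_alloc_calls == 1) ? NULL : addr;
}
static const vm_ops fake_ops = { fake_reserve, fake_release, fake_alloc_at };
```

Retrying does not remove the race, it only makes a failure unlikely; `max_attempts` bounds the loop so a persistently contended address space still produces an error instead of spinning forever.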