-
Notifications
You must be signed in to change notification settings - Fork 147
Reduced memory usage in live sets #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 3 files
ChAoSUnItY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asides from the definitions migration suggestion, all changes looking good.
| void var_list_ensure_capacity(var_list_t *list, int min_capacity) | ||
| { | ||
| if (list->capacity >= min_capacity) { | ||
| return; | ||
| } | ||
|
|
||
| int new_capacity = list->capacity ? list->capacity : 8; | ||
|
|
||
| while (new_capacity < min_capacity) { | ||
| new_capacity <<= 1; | ||
| } | ||
|
|
||
| var_t **new_elements = arena_alloc(BB_ARENA, new_capacity * HOST_PTR_SIZE); | ||
|
|
||
| if (list->elements) | ||
| memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE); | ||
|
|
||
| list->elements = new_elements; | ||
| list->capacity = new_capacity; | ||
| } | ||
|
|
||
| void var_list_add_var(var_list_t *list, var_t *var) | ||
| { | ||
| for (int i = 0; i < list->size; i++) { | ||
| if (list->elements[i] == var) | ||
| return; | ||
| } | ||
|
|
||
| var_list_ensure_capacity(list, list->size + 1); | ||
| list->elements[list->size++] = var; | ||
| } | ||
|
|
||
| void var_list_assign_array(var_list_t *list, var_t **data, int count) | ||
| { | ||
| var_list_ensure_capacity(list, count); | ||
| memcpy(list->elements, data, count * HOST_PTR_SIZE); | ||
| list->size = count; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move these definitions to globals.c as var_list_t may benefits other data structures for later enhancement?
| if (list->elements) | ||
| memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE); | ||
|
|
||
| list->elements = new_elements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super familiar with the arena_*() API, but don't we need to call arena_free() on the old pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arena_free() is used to free the whole arena, there's currently no freeing function specifically for allocated objects other than the internal arena block and arena itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the clarification.
So IIUC, this will actually cause a memory leak that the current API cannot resolve, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well not really, this is properly handled in global_release() already:
Line 1344 in 76e51ef
| arena_free(BB_ARENA); |
So no memory leak is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but global_release() should only be called when the entire compiler is about to exit?
Therefore, when we use arena_alloc() to grow the old list->elements, I thought this was still a leakage until the entire compiler finishes?
Even if we don't call global_release(), the operating system should reclaim all used memory when the executable terminates.
But perhaps this isn't actually a serious problem because even if we don't immediately reclaim this memory that's no longer in use, the memory usage is still much less than the original static array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOW, the arena API thinks this memory is occupied and won't reuse it, but in fact, no pointers are actually referencing it anymore?
| var_t **new_elements = arena_alloc(BB_ARENA, new_capacity * HOST_PTR_SIZE); | ||
|
|
||
| if (list->elements) | ||
| memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the new code might introduce several additional memory copy operations, I imagine it will be slower. I'm not sure what the performance impact will be, but I guess the 75% memory saving is very likely worth that cost.
However, this is a trade-off. Reading the commit message alone gives the impression that this is a pure improvement without any drawbacks.
I'm wondering if this performance difference has been measured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I measured the performance difference and added the results to the
commit message and PR description.
Live sets in shecc are implemented using arrays of pointers, which preallocate hundreds of unused slots, leading to significant memory waste. This patch replaces arrays of pointers with arena-backed vectors, reducing memory usage substantially. Although this change introduces additional memcpy during dynamic resize in "var_list_ensure_capacity", which may add overhead. The working set is much smaller and the better cache locality with fewer minor page faults outweigh the amortized memcpy cost. Measured (compiling src/main.c): - RSS: ~1.24GB -> ~305MB (≈ 75.4% reduction) - Elapsed time: 0.50s -> 0.17s (≈ 66% faster) - Minor page faults: ~309k -> ~76k Changes include: - Replace live sets with "var_list_t" for dynamic resizing. - Added helper routine in "ssa.c" for managing "var_list_t" instances. - Updated related logic in "reg-alloc.c" and "ssa.c".
382591b to
cb5bd04
Compare
|
Thank @icgmilk for contributing! |
|
Though this PR is merged, it is still unclear the cause of even-larger-allocation as seen in "before" benchmark, where RSS peaked at 1239840, worth to notice that in #184, the RSS in "before" benchmark the value only peaked at 758556. After the dynamic array implemented, the value has dropped to near to #184's optimized allocation result. |
|
The higher RSS peak (1239840) in the 'before' benchmark compared to PR #184 (758556) is likely due to the static array allocation approach used in this PR's baseline. The original code allocated fixed-size arrays (MAX_ANALYSIS_STACK_SIZE) for live_gen, live_kill, live_in, and live_out in every basic_block structure, causing significant memory overhead even when most blocks don't need the full capacity. The dynamic array implementation reduces this waste by allocating only what's needed, bringing memory usage closer to the optimized levels seen in #184. src/defs.h |
Yes, let's create an issue for tracking. |
|
Creating an issue to track the memory allocation anomaly is a good idea. The PR shows dynamic arrays reduced RSS from 1239840 to levels comparable to #184's optimized result, but the unexplained spike in the 'before' benchmark (1239840 vs #184's 758556) warrants investigation to understand what caused the even-larger-allocation behavior. |
|
In rv32emu, we have CI that automatically tracks performance changes when related code is touched. I'm wondering if it's worthwhile to open an issue to discuss implementing similar automated tracking for memory usage. This could help us catch these kinds of problems and reduce future regressions. |
|
Based on this PR's memory optimization work and the performance measurement discussion, implementing automated memory usage tracking would be valuable. The PR shows a 75% memory reduction but required manual measurement to assess performance trade-offs. Automated tracking could help catch memory regressions early and provide data for similar optimization decisions. |
Live sets in shecc are implemented using arrays of pointers, which preallocate hundreds of unused slots, leading to significant memory waste.
This patch replaces arrays of pointers with arena-backed vectors, reducing memory usage substantially.
Although this change introduces additional
memcpyduring dynamic resize invar_list_ensure_capacity, which may add overhead. The working set is much smaller and the better cache locality with fewer minor page faults outweigh the amortizedmemcpycost.Changes
var_list_tfor dynamic resizing.ssa.cfor managingvar_list_tinstances.
reg-alloc.candssa.c.Performance analysis for out/shecc src/main.c
Using
/usr/bin/time -vanduftraceto benchmark memory usage.Before
/usr/bin/time -v:uftrace:After
/usr/bin/time -v:uftrace:This patch reduces memory usage by 75.4% and improves execution time by 66%.
After this patch,
var_list_add_var,var_list_ensure_capacitybecome visible hotspots. However,arena_callocandmemsetelapsed times drop substantially, which outweigh the amortized hotspots cost.Summary by cubic
Replaced fixed-size live set arrays with arena-backed dynamic lists to cut memory usage and speed up compilation. On src/main.c, RSS drops ~75% (1.2GB → ~305MB) and runtime improves ~66%.
Written for commit cb5bd04. Summary will update automatically on new commits.