@ChAoSUnItY ChAoSUnItY commented Mar 27, 2025

This patch introduces an arena allocator (also known as a region-based allocator) and optimizes calloc's byte-buffer clearing. Together, these changes significantly reduce the compiler's memory usage and page-fault frequency and improve its performance. In the measurements below, the maximum resident set size drops to roughly 2/3 (stage 1 build) or to under 1/2 (GCC build) of its previous value.

The arena allocator lets future code allocate structures safely without tracking their individual lifetimes. It is already used for the SSA unit's "basic_block_t" allocations, which no longer require a traversal to free all connected and detached basic blocks.
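As a rough illustration of the idea, here is a minimal arena sketch. It is not the code from this patch: the names arena_t, arena_block_t, arena_alloc, and arena_free, the 16-byte rounding, and the block-growth policy are all assumptions made for the example. Allocations bump an offset inside a block, and everything is released at once by freeing the blocks, so no per-object traversal is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not shecc's actual definitions. */
typedef struct arena_block {
    struct arena_block *next; /* previously filled block, if any */
    size_t capacity;          /* usable bytes in memory */
    size_t offset;            /* bytes already handed out */
    char *memory;
} arena_block_t;

typedef struct {
    arena_block_t *head;      /* block currently being filled */
    size_t default_capacity;  /* size of each newly created block */
} arena_t;

static arena_block_t *arena_block_new(size_t capacity)
{
    arena_block_t *block = malloc(sizeof(arena_block_t));
    if (!block)
        return NULL;
    block->memory = calloc(1, capacity); /* zeroed, like calloc */
    if (!block->memory) {
        free(block);
        return NULL;
    }
    block->next = NULL;
    block->capacity = capacity;
    block->offset = 0;
    return block;
}

/* Bump-allocate size bytes; returns NULL on allocation failure. */
void *arena_alloc(arena_t *arena, size_t size)
{
    if (size > SIZE_MAX - 15)
        return NULL;
    size = (size + 15) & ~(size_t) 15; /* assumed 16-byte alignment */

    arena_block_t *block = arena->head;
    if (!block || size > block->capacity - block->offset) {
        /* Current block is missing or full: chain a new one in front. */
        size_t capacity =
            size > arena->default_capacity ? size : arena->default_capacity;
        arena_block_t *fresh = arena_block_new(capacity);
        if (!fresh)
            return NULL;
        fresh->next = arena->head;
        arena->head = fresh;
        block = fresh;
    }

    void *ptr = block->memory + block->offset;
    block->offset += size;
    return ptr;
}

/* Release every block at once; no per-object bookkeeping is required. */
void arena_free(arena_t *arena)
{
    arena_block_t *block = arena->head;
    while (block) {
        arena_block_t *next = block->next;
        free(block->memory);
        free(block);
        block = next;
    }
    arena->head = NULL;
}
```

With a layout like this, objects such as basic blocks can be allocated from one arena and dropped together with a single arena_free call when the arena is no longer needed.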

In addition, calloc's byte-buffer clearing now works in bulk: it clears 4 bytes at a time when the requested buffer is large enough.
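The bulk-clearing idea can be sketched as follows. This is an illustrative helper, not the routine from this patch: the name clear_bytes and the alignment handling are assumptions. It clears single bytes until the pointer is 4-byte aligned, then writes one 32-bit zero per iteration, then finishes any remaining tail bytes. It is meant for dynamically allocated storage, which is the calloc case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero n bytes starting at buf, 4 bytes at a time
 * where possible. Intended for storage obtained from an allocator. */
void clear_bytes(void *buf, size_t n)
{
    unsigned char *p = buf;

    /* Head: byte-wise until p is 4-byte aligned (or n runs out). */
    while (n > 0 && ((uintptr_t) p & 3) != 0) {
        *p++ = 0;
        n--;
    }

    /* Body: one 32-bit store clears 4 bytes per iteration. */
    uint32_t *w = (uint32_t *) p;
    while (n >= 4) {
        *w++ = 0;
        n -= 4;
    }

    /* Tail: at most 3 leftover bytes. */
    p = (unsigned char *) w;
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}
```

For small buffers the head and tail loops do all the work, which matches the description above: the 4-byte path only pays off once the requested size is large enough.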

Performance analysis for out/shecc src/main.c (built by GCC)

uftrace comparison

[image: uftrace comparison]

RSS comparison

/usr/bin/time -v ./out/shecc src/main.c:

        Command being timed: "./out/shecc src/main.c"
        User time (seconds): 0.28
        System time (seconds): 1.10
        Percent of CPU this job got: 94%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.46
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 758556
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 188705
        Voluntary context switches: 0
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 464
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

/usr/bin/time -v ./out/shecc-opt src/main.c:

        Command being timed: "./out/shecc-opt src/main.c"
        User time (seconds): 0.38
        System time (seconds): 0.31
        Percent of CPU this job got: 93%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.74
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 321748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 89749
        Voluntary context switches: 0
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 464
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Performance analysis for out/shecc-stage1.elf src/main.c

RSS comparison

/usr/bin/time -v ./out/shecc-stage1.elf src/main.c:

        Command being timed: "./out/shecc-stage1.elf src/main.c"
        User time (seconds): 4.77
        System time (seconds): 3.61
        Percent of CPU this job got: 95%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.81
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 993232
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 245225
        Voluntary context switches: 8
        Involuntary context switches: 20
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

/usr/bin/time -v ./out/shecc-stage1-opt.elf src/main.c:

        Command being timed: "./out/shecc-stage1-opt.elf src/main.c"
        User time (seconds): 4.67
        System time (seconds): 2.13
        Percent of CPU this job got: 94%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 658928
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 161604
        Voluntary context switches: 5
        Involuntary context switches: 7
        Swaps: 0
        File system inputs: 0
        File system outputs: 464
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Summary by Bito

This pull request introduces an arena allocator to enhance memory management, significantly reducing memory usage and improving performance. The calloc function has been optimized for bulk memory clearing, and several structures have been updated to utilize the new allocator. Unused functions have been removed for a cleaner codebase, and comprehensive documentation has been added.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

@ChAoSUnItY ChAoSUnItY force-pushed the feat/arena branch 2 times, most recently from 512ec04 to 27c2fb0 Compare April 4, 2025 12:09
@jserv jserv merged commit a603970 into sysprog21:master Apr 5, 2025
6 checks passed
jserv commented Apr 5, 2025

Thank you, @ChAoSUnItY, for contributing!

@ChAoSUnItY ChAoSUnItY deleted the feat/arena branch April 28, 2025 18:03