@icgmilk commented Oct 26, 2025

Live sets in shecc are implemented using arrays of pointers, which preallocate hundreds of unused slots, leading to significant memory waste.
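To make the waste concrete, the sketch below contrasts a fixed-capacity live set with a growable vector header. The capacity `HYPOTHETICAL_MAX_LIVE` (256) and both struct layouts are illustrative assumptions, not shecc's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity standing in for "hundreds" of preallocated slots. */
#define HYPOTHETICAL_MAX_LIVE 256

typedef struct var var_t; /* opaque placeholder for shecc's variable type */

/* Before (sketch): every set reserves all slots up front, used or not. */
typedef struct {
    var_t *elements[HYPOTHETICAL_MAX_LIVE];
    int size;
} fixed_live_set_t;

/* After (sketch): a small header; the element buffer is allocated on
 * demand and sized to what the set actually holds. */
typedef struct {
    var_t **elements;
    int size;
    int capacity;
} var_list_t;

static_assert(sizeof(var_list_t) < sizeof(fixed_live_set_t),
              "a vector header is far smaller than a fixed set");

/* Bytes one fixed set occupies, regardless of how many vars it holds. */
static size_t fixed_set_bytes(void)
{
    return sizeof(fixed_live_set_t);
}

/* Bytes one vector-backed set occupies when its buffer has `capacity` slots. */
static size_t vector_set_bytes(int capacity)
{
    return sizeof(var_list_t) + (size_t) capacity * sizeof(var_t *);
}
```

On a 64-bit host the fixed set costs just over 2 KiB even when empty, while a set holding a handful of variables needs only its header plus a small buffer.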

This patch replaces the arrays of pointers with arena-backed vectors, reducing the maximum resident set size when compiling src/main.c from 1,239,840 KB to 305,120 KB (about 75%).

This change does introduce additional memcpy calls when var_list_ensure_capacity resizes a vector, which may add some overhead. However, the working set is much smaller, and the better cache locality and fewer minor page faults outweigh the amortized memcpy cost.
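A minimal sketch of an arena-backed vector that grows by doubling, assuming a simple bump arena. The `arena_t` type and the signatures of `arena_alloc`, `var_list_ensure_capacity`, and `var_list_add_var` are hypothetical stand-ins: the last two names appear in the uftrace output below, but their real signatures may differ. Doubling keeps the total memcpy work linear in the number of appends, which is what makes the resize cost amortized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct var var_t; /* opaque placeholder for shecc's variable type */

/* Hypothetical bump arena: allocations are never freed one by one;
 * the whole backing buffer is released at once. */
typedef struct {
    char *base;
    size_t used, cap;
} arena_t;

static void *arena_alloc(arena_t *a, size_t n)
{
    n = (n + 7) & ~(size_t) 7; /* round up to keep 8-byte alignment */
    if (a->used + n > a->cap)
        return NULL; /* a real arena would chain a new block here */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

typedef struct {
    var_t **elements;
    int size;
    int capacity;
} var_list_t;

/* Ensure room for at least min_cap elements, doubling the capacity so that
 * n appends perform O(n) total memcpy work. The old buffer is left in the
 * arena and reclaimed when the arena itself is freed. Returns 0 on failure. */
static int var_list_ensure_capacity(arena_t *a, var_list_t *l, int min_cap)
{
    if (l->capacity >= min_cap)
        return 1;
    int new_cap = l->capacity ? l->capacity : 4;
    while (new_cap < min_cap)
        new_cap *= 2;
    var_t **buf = arena_alloc(a, (size_t) new_cap * sizeof(var_t *));
    if (!buf)
        return 0;
    if (l->size > 0)
        memcpy(buf, l->elements, (size_t) l->size * sizeof(var_t *));
    l->elements = buf;
    l->capacity = new_cap;
    return 1;
}

/* Append v, growing the buffer first if it is full. Returns 0 on failure. */
static int var_list_add_var(arena_t *a, var_list_t *l, var_t *v)
{
    if (!var_list_ensure_capacity(a, l, l->size + 1))
        return 0;
    assert(l->size < l->capacity);
    l->elements[l->size++] = v;
    return 1;
}
```

Because the arena never frees individual allocations, each superseded buffer stays allocated until the arena is released; with doubling, those leftovers always total less than the current buffer.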

Changes

  • Replace live sets with var_list_t for dynamic resizing.
  • Add helper routines in ssa.c for managing var_list_t
    instances.
  • Update related logic in reg-alloc.c and ssa.c.
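A hypothetical sketch of how such helpers could give a var_list_t set semantics for liveness: check membership before appending, then express the standard backward equation live_in = gen ∪ (live_out − kill) over lists. `var_list_contains`, `var_list_add_unique`, and `compute_live_in_sketch` are illustrative names, not shecc's, and this version grows with `realloc` instead of an arena to stay short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct var var_t; /* opaque placeholder for shecc's variable type */

typedef struct {
    var_t **elements;
    int size;
    int capacity;
} var_list_t;

/* Linear membership test; cheap while live sets stay small. */
static int var_list_contains(const var_list_t *l, const var_t *v)
{
    for (int i = 0; i < l->size; i++)
        if (l->elements[i] == v)
            return 1;
    return 0;
}

/* Append v only if absent, so the list behaves as a set.
 * Grows with realloc here for brevity (not arena-backed). */
static int var_list_add_unique(var_list_t *l, var_t *v)
{
    if (var_list_contains(l, v))
        return 1;
    if (l->size == l->capacity) {
        int cap = l->capacity ? l->capacity * 2 : 4;
        var_t **p = realloc(l->elements, (size_t) cap * sizeof(*p));
        if (!p)
            return 0;
        l->elements = p;
        l->capacity = cap;
    }
    l->elements[l->size++] = v;
    assert(l->size <= l->capacity);
    return 1;
}

/* live_in = gen U (live_out - kill). Returns 0 on allocation failure. */
static int compute_live_in_sketch(var_list_t *live_in, const var_list_t *gen,
                                  const var_list_t *live_out,
                                  const var_list_t *kill)
{
    for (int i = 0; i < gen->size; i++)
        if (!var_list_add_unique(live_in, gen->elements[i]))
            return 0;
    for (int i = 0; i < live_out->size; i++) {
        var_t *v = live_out->elements[i];
        if (!var_list_contains(kill, v) && !var_list_add_unique(live_in, v))
            return 0;
    }
    return 1;
}
```

The linear scan in the membership test trades asymptotic cost for compactness; it stays fast only while the sets remain small.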

Performance analysis for out/shecc src/main.c

Both runs were measured with /usr/bin/time -v (maximum resident set size, page faults, CPU time) and uftrace (per-function call counts and timings).

Before

/usr/bin/time -v:

    Command being timed: "./out/shecc src/main.c"
    User time (seconds): 0.16
    System time (seconds): 0.33
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.50
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1239840
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 309819
    Voluntary context switches: 1
    Involuntary context switches: 11
    Swaps: 0
    File system inputs: 0
    File system outputs: 728
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

uftrace:

  Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
    1.098  s    7.064 us           1  main
  633.678 ms    0.283 us           1  parse
  623.006 ms   63.503 us           1  parse_internal
  619.869 ms  253.340 us         544  read_global_statement
  616.525 ms  166.630 us         480  read_global_decl
  605.808 ms  130.668 us         382  read_func_body
  598.446 ms    1.048 ms        1582  read_code_block
  593.883 ms    7.064 ms        9749  read_body_statement
  413.070 ms    2.906 ms       13392  read_expr
  379.091 ms  904.960 us        1415  handle_if_statement
  352.620 ms   20.644 ms      181566  arena_calloc
  318.750 ms  318.750 ms      181566  memset
  317.000 ms    2.846 ms       37386  bb_create
  208.802 ms  320.579 us        3780  read_func_call
  206.616 ms    1.873 ms        3784  read_func_parameters
  195.706 ms    8.985 ms       16182  read_expr_operand
  124.341 ms    1.945 ms        2622  read_body_assignment
  120.245 ms  997.142 us           1  liveness_analysis
  109.172 ms    8.682 ms       34824  recompute_live_out
   93.959 ms   48.080 ms       44640  compute_live_in
   86.006 ms    1.557 us           1  ssa_build
   80.690 ms   57.090 us          99  handle_while_statement
   71.514 ms   34.394 ms      595721  lex_accept
   65.466 ms    5.129 ms           1  optimize
   60.946 ms  702.174 us       10843  find_var
   59.587 ms   20.101 ms       99257  bb_forward_traversal
   59.197 ms    1.704 ms           1  code_generate
   57.476 ms    7.001 ms       65990  emit_ph2_ir
   54.936 ms   23.109 ms       80617  lex_token_impl
   49.829 ms    7.871 ms       10570  read_lvalue
   48.023 ms   25.824 ms       13465  find_local_var
   44.431 ms   44.431 ms     1425503  strcmp
   43.255 ms    3.469 ms       85482  emit
   39.788 ms   10.872 ms       85486  elf_write_int
   39.229 ms    9.305 ms           1  reg_alloc
   38.262 ms    1.588 us           1  global_release
   38.211 ms  241.791 us           5  arena_free
   37.978 ms  851.189 us        4965  arena_block_free
   37.648 ms   15.501 ms      617546  lex_accept_internal
   37.194 ms   37.194 ms        9977  free
   32.715 ms    8.727 ms       15773  get_operator
   30.687 ms    1.701 ms       38319  lex_expect
   29.366 ms  178.269 us           1  solve_phi_params
   29.061 ms   20.699 ms      343524  strbuf_putc
   28.990 ms    1.941 ms       38322  lex_expect_internal
   28.864 ms    6.567 ms       11029  bb_solve_phi_params
   27.763 ms  188.973 us         664  handle_return_statement
   27.322 ms    3.386 ms       60742  require_var
   25.231 ms   25.231 ms      814966  add_live_in
   23.624 ms   23.624 ms      862045  var_check_killed
   23.206 ms   12.732 ms           1  elf_generate
   23.189 ms    5.623 ms      114285  hashmap_get
   22.990 ms   13.364 ms        8551  find_type
   22.550 ms   12.852 ms           1  peephole
   22.240 ms   17.829 ms       55145  bb_backward_traversal
   19.836 ms   10.584 ms        4010  find_global_var
   18.376 ms   53.773 us           1  solve_globals
   17.571 ms   11.946 ms      114308  hashmap_get_node
   17.381 ms    2.919 ms       31973  new_name
   16.646 ms    8.222 ms      315431  arena_alloc
   15.544 ms    4.820 ms       11029  bb_solve_globals
   12.291 ms  476.899 us        2395  read_full_var_decl
   12.143 ms    2.692 ms           1  use_chain_build
   10.671 ms    3.388 ms          35  load_source_file
   10.362 ms    9.104 ms      275260  read_char
   10.300 ms   10.300 ms      368921  fputc
    9.630 ms   62.466 us           1  build_reversed_rpo
    9.541 ms    5.668 ms       49779  const_folding
    9.451 ms    2.876 ms       40300  use_chain_add_tail
    9.356 ms    2.843 ms        7389  read_preproc_directive
    8.902 ms    8.798 ms      361867  strbuf_extend
    8.877 ms   46.757 us           1  check_var_cross_init
    8.686 ms  337.643 us         433  finalize_logical
    8.443 ms  726.461 us        4965  arena_block_create
    8.253 ms    4.233 ms       11025  bb_reset_and_solve_locals
    8.161 ms    4.532 ms           1  cfg_flatten
    8.142 ms    4.700 ms       46606  refresh
    7.916 ms    6.032 ms       75214  bb_add_ph2_ir
    7.743 ms    7.743 ms        9961  malloc
    7.553 ms    6.905 ms      252852  lex_peek
    7.423 ms   25.332 us           1  build_rdom
    7.348 ms    4.536 ms       46472  add_insn
    7.171 ms    2.235 ms       31111  var_add_killed_bb
    6.987 ms    1.398 ms       28275  find_alias
    6.924 ms  720.132 us       14072  lex_ident
    6.851 ms    1.386 ms       28279  lookup_keyword
    6.619 ms   79.290 us           1  build_rpo
    6.559 ms    1.118 ms       23439  find_func
    6.408 ms    3.750 ms       11029  dce_insn
    6.241 ms    1.497 ms       14150  lex_ident_internal
    6.085 ms  602.885 us       10718  read_ternary_operation
    5.986 ms  758.748 us        2474  read_inner_var_decl
    5.980 ms    5.980 ms       44640  merge_live_in
    5.956 ms    1.538 ms       11029  bb_check_var_cross_init
    5.726 ms  221.037 us         408  read_parameter_list_decl
    5.607 ms    3.433 ms       18343  strbuf_puts
    5.559 ms    5.559 ms      223284  is_alnum
    5.474 ms   40.772 us           1  build_df
    5.432 ms  112.327 us         621  read_logical
    5.311 ms   43.382 us           1  unwind_phi
    5.303 ms   25.715 us           1  build_rdf
    5.198 ms   55.196 us           1  build_dom
    5.164 ms    1.953 ms       34796  prepare_operand
    4.694 ms    4.020 ms           1  build_idom
    4.417 ms    4.417 ms       81734  is_dominate
    4.291 ms    3.210 ms       48234  cse
    4.244 ms  649.246 us        3181  read_numeric_param
    3.929 ms    3.929 ms      109915  check_live_out
    3.693 ms    1.763 ms       26749  gen_name_to
    3.628 ms    2.909 ms       26820  prepare_dest
    3.586 ms    3.586 ms      118222  hashmap_hash_index
    3.442 ms  582.526 us        7223  intern_string
    3.349 ms    3.349 ms       11029  bb_build_df
    3.200 ms  764.568 us       15466  find_macro
    3.105 ms    2.138 ms       36838  rename_var
    2.894 ms    1.546 ms       11029  bb_build_rdom
    2.829 ms    1.994 ms       11029  bb_build_dom
    2.816 ms    1.229 ms       10647  spill_live_out
    2.748 ms  800.884 us       17905  load_var
    2.728 ms    1.592 ms       11029  bb_unwind_phi
    2.666 ms  561.940 us       11007  find_constant
    2.657 ms    2.657 ms       49779  dce_init_mark
    2.648 ms  610.876 us        5172  spill_alive
    2.461 ms  382.195 us        1141  read_literal_param
    2.344 ms    2.344 ms       86644  skip_whitespace
    2.317 ms    2.317 ms       58326  bb_add_killed_var
    2.157 ms  920.529 us       10503  spill_var
    2.129 ms  232.998 us        4269  require_typed_var
    2.122 ms   38.254 us         148  handle_address_of_operator
    2.086 ms    1.091 ms           1  solve_phi_insertion
    2.000 ms  734.165 us           1  dce_sweep
    1.929 ms    1.929 ms       26749  __sprintf_chk
    1.846 ms    7.212 us           2  parse_array_init
    1.835 ms    1.835 ms       65608  update_elf_offset
    1.832 ms    1.052 ms        4315  find_member
    1.759 ms   18.493 us          29  parse_struct_field_init
    1.732 ms    1.202 ms           1  build_r_idom
    1.722 ms    1.722 ms           1  arm_lower
    1.718 ms  213.176 us           3  cppd_control_flow_skip_lines
    1.704 ms    1.704 ms       65917  add_existed_ph2_ir
    1.633 ms    1.184 ms       18394  __lw
    1.580 ms   98.344 us        2258  lex_token
    1.521 ms    1.521 ms       37153  update_consumed
    1.503 ms   87.309 us         384  add_func
    1.455 ms    1.057 ms       16454  __mov_i
    1.447 ms    1.447 ms       39359  strcpy
    1.375 ms    1.375 ms       49779  mark_const
    1.371 ms    1.318 ms       48558  insn_fusion
    1.348 ms    1.348 ms       19819  rdom_connect
    1.289 ms    1.289 ms       48558  triple_pattern_optimization
    1.276 ms  735.143 us        8715  resize_var
    1.264 ms    1.264 ms       11029  is_block_unreachable
    1.262 ms    1.262 ms       48381  eval_const_arithmetic
    1.247 ms    1.247 ms       33281  strncmp
    1.237 ms    1.237 ms       47058  redundant_move_elim
    1.236 ms    1.236 ms       48238  eval_const_unary
    1.233 ms  895.037 us       13899  __sw
    1.219 ms    1.219 ms       47606  find_in_regs
    1.218 ms    1.218 ms       47057  eliminate_load_store_pairs
    1.215 ms    1.215 ms         382  simple_sccp
    1.167 ms    1.167 ms       47058  strength_reduction
    1.143 ms    1.143 ms       47058  comparison_optimization
    1.136 ms  445.881 us        3448  append_unwound_phi_insn
    1.134 ms    1.134 ms       47058  algebraic_simplification
    1.134 ms    1.134 ms       47058  bitwise_optimization
    1.089 ms    1.089 ms       39295  get_stack_top_subscript_var
    1.080 ms    1.080 ms       45226  is_cse_candidate
    1.045 ms  202.780 us        2030  hashmap_put
    1.032 ms    1.032 ms       16466  fgets
    1.004 ms    1.004 ms       18208  memcpy
  983.798 us  266.239 us        3448  append_phi_operand
  971.813 us  971.813 us       18725  strncpy
  956.435 us   23.608 us         384  arena_alloc_func
  939.254 us  209.298 us        4021  arena_alloc_symbol
  919.933 us  919.933 us       38617  vreg_clear_phys
  912.752 us  386.331 us        9320  fn_add_global
  898.399 us  898.399 us       37281  opstack_pop
  893.857 us  893.857 us       37281  opstack_push
  887.018 us  887.018 us       24522  strlen
  834.952 us  834.952 us       20886  dom_connect
  826.691 us  826.691 us       31973  pop_name
  798.556 us  798.556 us       32730  arm_transfer
  785.149 us   38.125 us         107  handle_single_dereference
  783.066 us  783.066 us       32295  __mov
  681.916 us  681.916 us       19865  memcmp
  674.368 us  674.368 us        6255  intersect
  665.289 us  665.289 us       11029  bb_build_rdf
  654.337 us  342.697 us        2924  insert_phi_insn
  644.936 us  644.936 us         382  optimize_constant_casts
  640.831 us   90.329 us        1141  write_symbol
  632.551 us  378.214 us        2030  hashmap_node_new
  623.519 us  623.519 us       24675  get_size
  597.902 us  392.207 us        6812  extend_liveness
  590.125 us  590.125 us       10824  add_live_gen
  577.334 us  414.707 us        6627  __add_r
  529.655 us  529.655 us        6718  reverse_intersect
  521.299 us  108.466 us        1702  add_symbol
  519.711 us  519.711 us       19853  track_var_use
  467.414 us   51.990 us           1  libc_generate
  437.475 us  305.855 us       12220  perform_side_effect
  415.424 us   66.548 us         797  __c
  407.504 us  407.504 us        9799  __strcpy_chk
  397.856 us  397.856 us       13957  bb_connect
  392.538 us  391.796 us       16230  find_macro_param_src_idx
  385.278 us   30.024 us          65  handle_sizeof_operator
  383.619 us  269.546 us        2176  add_block
  378.560 us   63.663 us        1149  elf_write_str
  360.112 us   51.437 us         266  read_char_param
  340.875 us  340.875 us        4785  var_check_in_scope
  297.317 us  217.847 us        3292  __teq
  296.336 us  296.336 us       11029  bb_reverse_reversed_index
  288.880 us   41.782 us         324  truncate_unchecked
  288.594 us   24.280 us         187  add_constant
  287.249 us  287.249 us       11029  bb_build_reversed_rpo
  280.822 us  280.822 us       11029  bb_build_rpo
  273.204 us  273.204 us       11029  bb_reverse_index
  269.551 us  269.551 us       11029  bb_index_rpo
  267.320 us  267.320 us       11029  bb_index_reversed_rpo
  252.247 us    0.534 us           4  read_indirect_call
  212.394 us   67.439 us        1634  __zero
  204.784 us  149.218 us        2299  __mov_r
  179.795 us   66.464 us        1176  elf_write_byte
  179.767 us  124.486 us        1603  is_numeric
  157.033 us    6.548 us          19  read_global_assignment
  147.261 us  107.651 us        1642  __cmp_r
  147.200 us    3.743 us          65  read_partial_var_decl
  144.507 us   72.132 us           7  hashmap_rehash
  139.681 us  131.196 us          15  arena_free_trailing_blocks
  133.914 us   15.955 us          73  add_alias
  113.659 us  113.659 us        4742  __movw
  113.490 us  113.490 us        4742  __movt
   90.613 us   90.613 us          36  fopen
   83.461 us   83.461 us          36  fclose
   78.986 us    0.649 us           2  compact_all_arenas
   72.343 us    5.564 us           1  global_init
   69.466 us   45.631 us         382  add_ph2_ir
   64.581 us    9.562 us         107  require_deref_var
   62.283 us    0.939 us           3  compact_arenas_selective
   60.633 us   44.675 us         663  __add_i
   58.958 us    5.987 us         102  require_ref_var
   53.591 us   53.591 us        2232  is_fusible_insn
   50.126 us   36.739 us         553  __sub_r
   46.672 us    6.829 us          58  compute_field_address
   45.945 us    1.355 us           8  strbuf_free
   45.668 us    1.967 us           7  skip_macro_body
   44.577 us   44.577 us        1536  get_operator_prio
   44.347 us   44.347 us        1729  is_pointer_operation
   44.314 us   44.314 us        1626  arm_get_cond
   43.342 us    0.531 us           4  read_constant_expr
   42.811 us    1.518 us           5  read_constant_infix_expr
   42.740 us    6.563 us         132  lookup_directive
   41.929 us    9.960 us          10  elf_write_blk
   41.211 us    1.607 us           1  elf_generate_sections
   40.488 us    6.008 us          29  compute_element_address
   38.218 us   27.821 us         431  __cmp_i
   29.250 us    1.484 us           7  add_macro
   24.618 us    6.125 us          27  find_best_spill
   21.983 us    1.426 us           5  arena_init
   21.464 us   21.464 us          16  calloc
   19.585 us   14.248 us         221  __lb
   19.447 us    2.448 us           8  strbuf_create
   19.227 us   13.960 us         216  __sb
   18.493 us   12.709 us         184  calculate_spill_cost
   18.363 us   13.759 us         187  arena_alloc_constant
   16.818 us    0.463 us           7  arena_alloc_macro
   16.057 us    3.341 us           9  hashmap_create
   16.007 us    1.390 us           5  read_constant_expr_operand
   14.525 us    3.519 us          19  read_primary_constant
   14.462 us   14.462 us         582  is_hex
   13.564 us   13.564 us         572  align_size
   13.425 us    9.474 us         162  __rsb_i
    9.009 us    0.844 us           5  add_named_type
    8.904 us    0.109 us           1  elf_generate_header
    8.624 us    0.732 us           1  lex_init_keywords
    8.220 us    8.220 us         306  size_var
    6.994 us    6.994 us         275  promote
    6.978 us    0.846 us          10  arena_alloc_traversal_args
    6.900 us    5.124 us          73  arena_alloc_alias
    6.291 us    0.868 us           7  promote_unchecked
    6.281 us    1.634 us          23  hashmap_contains
    5.558 us    0.475 us           1  lex_init_directives
    5.241 us    5.241 us         169  read_numeric_constant
    5.180 us    3.774 us          58  __and_i
    4.463 us    3.264 us          50  __and_r
    4.314 us    1.023 us           4  handle_pointer_arithmetic
    3.982 us    3.982 us          34  __snprintf_chk
    3.715 us    3.085 us          26  __eor_r
    3.309 us    1.399 us           9  hashmap_free
    3.162 us    3.162 us          34  snprintf
    3.134 us    2.302 us          35  __or_r
    3.064 us    0.426 us           1  elf_add_symbol
    2.694 us    2.694 us          10  initialize_struct_field
    2.341 us    2.341 us          98  __mul
    1.833 us    1.833 us          75  add_type
    1.393 us    0.500 us           1  elf_align
    1.133 us    0.192 us           3  check_def
    1.123 us    0.189 us           1  lexer_cleanup
    0.853 us    0.853 us          18  bb_disconnect
    0.487 us    0.487 us           9  round_up_pow2
    0.287 us    0.215 us           3  __mvn_r
    0.178 us    0.178 us           7  __sxtb
    0.129 us    0.129 us           4  get_pointer_element_size
    0.127 us    0.127 us           5  get_unary_operator_prio
                                   1  exit

After

/usr/bin/time -v:

    Command being timed: "./out/shecc src/main.c"
    User time (seconds): 0.09
    System time (seconds): 0.08
    Percent of CPU this job got: 100%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.17
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 305120
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 76130
    Voluntary context switches: 1
    Involuntary context switches: 3
    Swaps: 0
    File system inputs: 0
    File system outputs: 728
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
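As a cross-check on the two /usr/bin/time -v reports, the two small helpers below (hypothetical, written only for this comparison) compute the percentage reduction between runs and the memory implied by a page-fault count.

```c
#include <assert.h>

/* Percentage reduction going from `before` to `after`. */
static double reduction_pct(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Memory implied by a count of page faults, in KiB, assuming each fault
 * maps in one page of `page_bytes` bytes. */
static long long fault_kib(long long faults, long long page_bytes)
{
    assert(page_bytes % 1024 == 0);
    return faults * (page_bytes / 1024);
}
```

Applied to the reports: maximum RSS falls from 1,239,840 KB to 305,120 KB (about 75.4%), minor page faults from 309,819 to 76,130 (about 75.4%), and elapsed time from 0.50 s to 0.17 s. In both runs, minor faults times the 4,096-byte page size (1,239,276 KB and 304,520 KB) come within 1% of the maximum RSS, so the drop in page faults tracks the drop in resident memory almost exactly.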

uftrace:

  Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
  933.206 ms    6.234 us           1  main
  406.078 ms    0.923 us           1  parse
  397.239 ms   65.617 us           1  parse_internal
  393.823 ms  255.315 us         547  read_global_statement
  390.242 ms  161.774 us         483  read_global_decl
  378.968 ms  128.581 us         385  read_func_body
  376.991 ms    1.122 ms        1584  read_code_block
  372.174 ms    7.245 ms        9752  read_body_statement
  239.170 ms    2.996 ms       13403  read_expr
  234.155 ms  886.909 us        1414  handle_if_statement
  194.985 ms    1.249 ms           1  liveness_analysis
  177.800 ms    8.342 ms       34792  recompute_live_out
  165.787 ms    9.563 ms       16192  read_expr_operand
  158.947 ms   54.738 ms       44600  compute_live_in
  124.552 ms  352.212 us        3788  read_func_call
  122.223 ms    1.979 ms        3792  read_func_parameters
  106.184 ms   21.381 ms      181740  arena_calloc
   89.900 ms   65.418 ms      883178  var_list_add_var
   87.410 ms    0.889 us           1  ssa_build
   84.789 ms    1.941 ms        2618  read_body_assignment
   78.018 ms   78.018 ms      181740  memset
   75.997 ms   37.267 ms      596233  lex_accept
   69.820 ms    3.192 ms       37408  bb_create
   68.491 ms  733.137 us       10854  find_var
   66.451 ms    5.180 ms           1  optimize
   65.208 ms   17.270 ms       99257  bb_forward_traversal
   65.179 ms    2.174 ms           1  code_generate
   62.986 ms    7.579 ms       66026  emit_ph2_ir
   58.329 ms   25.856 ms       80702  lex_token_impl
   53.880 ms   32.050 ms       13472  find_local_var
   50.943 ms   54.921 us         100  handle_while_statement
   47.991 ms    4.559 ms       85517  emit
   43.918 ms   43.918 ms     1427062  strcmp
   43.434 ms   13.161 ms       85521  elf_write_int
   43.350 ms    8.146 ms       10576  read_lvalue
   40.726 ms    9.934 ms           1  reg_alloc
   39.257 ms   15.576 ms      618082  lex_accept_internal
   34.651 ms    9.206 ms       15783  get_operator
   32.765 ms    2.070 ms       38348  lex_expect
   31.929 ms  171.890 us           1  solve_phi_params
   31.357 ms    7.296 ms       11029  bb_solve_phi_params
   30.698 ms    2.071 ms       38351  lex_expect_internal
   30.419 ms   22.085 ms      343664  strbuf_putc
   28.342 ms    3.796 ms       60811  require_var
   26.261 ms   24.044 ms      876089  var_list_ensure_capacity
   26.077 ms   15.519 ms           1  elf_generate
   25.735 ms   16.093 ms        8564  find_type
   25.118 ms   15.454 ms           1  peephole
   24.433 ms   24.433 ms      862067  var_check_killed
   23.787 ms    6.520 ms      114432  hashmap_get
   22.404 ms   13.241 ms        4012  find_global_var
   21.411 ms   17.296 ms       55145  bb_backward_traversal
   20.505 ms   36.179 us           1  solve_globals
   18.538 ms    3.514 ms       32002  new_name
   18.504 ms    5.357 ms       11029  bb_solve_globals
   17.271 ms   11.727 ms      114455  hashmap_get_node
   13.912 ms  185.261 us         663  handle_return_statement
   13.549 ms    5.545 ms       11025  bb_reset_and_solve_locals
   13.343 ms    2.850 ms           1  use_chain_build
   12.993 ms  497.084 us        2402  read_full_var_decl
   11.349 ms    9.188 ms      360231  arena_alloc
   10.634 ms    1.117 us           1  global_release
   10.591 ms   76.467 us           5  arena_free
   10.541 ms    6.718 ms       49827  const_folding
   10.526 ms  166.961 us        1168  arena_block_free
   10.493 ms    3.682 ms       40330  use_chain_add_tail
   10.414 ms   10.414 ms        2383  free
   10.373 ms   10.373 ms      369061  fputc
   10.289 ms    9.041 ms      275633  read_char
    9.935 ms    2.968 ms        7397  read_preproc_directive
    9.282 ms   60.935 us           1  build_reversed_rpo
    8.837 ms    2.801 ms          35  load_source_file
    8.835 ms    5.208 ms           1  cfg_flatten
    8.778 ms    8.678 ms      362024  strbuf_extend
    8.699 ms   32.630 us           1  check_var_cross_init
    8.546 ms    5.260 ms       46653  refresh
    8.316 ms    6.443 ms       75270  bb_add_ph2_ir
    7.804 ms    5.019 ms       46522  add_insn
    7.697 ms    2.615 ms       31148  var_add_killed_bb
    7.535 ms    6.925 ms      253107  lex_peek
    7.376 ms    1.641 ms       28323  lookup_keyword
    7.212 ms    1.517 ms       28319  find_alias
    7.192 ms   22.977 us           1  build_rdom
    7.135 ms  751.861 us       14107  lex_ident
    6.924 ms    1.272 ms       23468  find_func
    6.670 ms    1.742 ms       11029  bb_check_var_cross_init
    6.436 ms    3.870 ms       11029  dce_insn
    6.421 ms    1.532 ms       14185  lex_ident_internal
    6.180 ms    6.180 ms       44600  merge_live_in
    6.142 ms  795.064 us        2481  read_inner_var_decl
    6.038 ms  218.226 us         411  read_parameter_list_decl
    5.773 ms   74.456 us           1  build_rpo
    5.710 ms    2.178 ms       34822  prepare_operand
    5.559 ms    5.559 ms      223654  is_alnum
    5.109 ms   37.491 us           1  build_df
    5.047 ms   25.549 us           1  build_rdf
    4.984 ms   38.733 us           1  unwind_phi
    4.927 ms    4.927 ms       81713  is_dominate
    4.658 ms    2.823 ms       18360  strbuf_puts
    4.561 ms   47.967 us           1  build_dom
    4.558 ms    3.485 ms       48284  cse
    4.330 ms    1.718 ms       16387  var_list_assign_array
    4.210 ms  637.732 us        3180  read_numeric_param
    3.991 ms    3.747 ms           1  build_idom
    3.773 ms    1.902 ms       26778  gen_name_to
    3.696 ms    3.696 ms      110015  check_live_out
    3.693 ms  620.649 us       10734  read_ternary_operation
    3.593 ms    2.523 ms       36870  rename_var
    3.529 ms    3.529 ms      118375  hashmap_hash_index
    3.466 ms  614.899 us        7235  intern_string
    3.314 ms  839.985 us       15476  find_macro
    3.226 ms    3.226 ms       11029  bb_build_df
    3.064 ms    1.020 ms       17906  load_var
    3.061 ms    2.336 ms       26851  prepare_dest
    2.960 ms  325.625 us         433  finalize_logical
    2.914 ms  706.288 us        5176  spill_alive
    2.716 ms    1.373 ms       11029  bb_build_rdom
    2.673 ms  586.800 us       11012  find_constant
    2.641 ms    1.518 ms       11029  bb_unwind_phi
    2.566 ms    2.566 ms       49827  dce_init_mark
    2.561 ms    1.358 ms       10644  spill_live_out
    2.549 ms    1.636 ms       11029  bb_build_dom
    2.509 ms  424.031 us        1141  read_literal_param
    2.349 ms    1.113 ms       10503  spill_var
    2.329 ms    2.329 ms       86731  skip_whitespace
    2.310 ms    2.310 ms       31867  memcpy
    2.174 ms  245.639 us        4269  require_typed_var
    2.172 ms    2.172 ms       37182  update_consumed
    2.170 ms  186.675 us        1168  arena_block_create
    2.050 ms  864.703 us           1  dce_sweep
    2.004 ms  691.027 us       10831  add_live_gen
    2.000 ms    2.000 ms        2367  malloc
    1.927 ms  269.002 us           3  cppd_control_flow_skip_lines
    1.921 ms    1.176 ms        4337  find_member
    1.870 ms    1.870 ms       26778  __sprintf_chk
    1.853 ms  911.900 us           1  solve_phi_insertion
    1.820 ms    1.820 ms       65641  update_elf_offset
    1.781 ms   39.259 us         154  handle_address_of_operator
    1.741 ms  132.942 us        2258  lex_token
    1.705 ms    1.705 ms       65953  add_existed_ph2_ir
    1.649 ms    1.198 ms       18408  __lw
    1.568 ms   80.486 us         387  add_func
    1.474 ms    1.075 ms       16470  __mov_i
    1.392 ms    1.339 ms       48568  insn_fusion
    1.382 ms    1.382 ms       39419  strcpy
    1.368 ms    1.368 ms       49827  mark_const
    1.342 ms    1.342 ms       19813  rdom_connect
    1.342 ms  111.714 us         621  read_logical
    1.259 ms    1.028 ms           1  build_r_idom
    1.257 ms    1.257 ms       48568  triple_pattern_optimization
    1.248 ms  709.022 us        8732  resize_var
    1.244 ms    1.244 ms           1  arm_lower
    1.240 ms    1.240 ms       47064  redundant_move_elim
    1.239 ms  902.419 us       13898  __sw
    1.233 ms    1.233 ms       48432  eval_const_arithmetic
    1.229 ms    1.229 ms       47646  find_in_regs
    1.220 ms    1.220 ms       48288  eval_const_unary
    1.219 ms    1.219 ms       47063  eliminate_load_store_pairs
    1.190 ms    1.190 ms       39324  get_stack_top_subscript_var
    1.184 ms    1.184 ms       11029  is_block_unreachable
    1.162 ms    1.162 ms       47064  strength_reduction
    1.133 ms    1.133 ms       47064  comparison_optimization
    1.129 ms    1.129 ms       47064  bitwise_optimization
    1.128 ms    1.128 ms       47064  algebraic_simplification
    1.123 ms  416.714 us        3446  append_unwound_phi_insn
    1.079 ms  335.748 us        3446  append_phi_operand
    1.073 ms  214.465 us        2036  hashmap_put
    1.073 ms    1.073 ms       45266  is_cse_candidate
    1.071 ms    1.071 ms         385  simple_sccp
    1.044 ms    7.634 us           2  parse_array_init
    1.012 ms    1.012 ms       33318  strncmp
    1.011 ms   25.245 us         387  arena_alloc_func
  956.948 us   22.065 us          29  parse_struct_field_init
  956.332 us  227.408 us        4039  arena_alloc_symbol
  953.326 us  410.865 us        9328  fn_add_global
  915.299 us  915.299 us       38638  vreg_clear_phys
  912.673 us  912.673 us       20877  dom_connect
  893.664 us  893.664 us       37325  opstack_pop
  892.914 us  892.914 us       37325  opstack_push
  854.375 us  854.375 us       16483  fgets
  852.593 us   38.179 us         107  handle_single_dereference
  849.558 us  849.558 us       32002  pop_name
  818.867 us  818.867 us       18745  strncpy
  799.153 us  799.153 us       32743  arm_transfer
  796.523 us  796.523 us       24541  strlen
  782.853 us  782.853 us       32331  __mov
  669.467 us  669.467 us       19977  memcmp
  656.168 us  102.807 us        1141  write_symbol
  644.729 us  443.821 us        6830  extend_liveness
  638.556 us  393.586 us        2036  hashmap_node_new
  616.290 us  616.290 us       24715  get_size
  603.963 us  603.963 us       11029  bb_build_rdf
  597.818 us  280.342 us        2923  insert_phi_insn
  592.054 us  431.326 us        6638  __add_r
  522.571 us  108.700 us        1710  add_symbol
  515.103 us  515.103 us       19863  track_var_use
  486.960 us  486.960 us         385  optimize_constant_casts
  442.729 us  304.098 us       12230  perform_side_effect
  411.310 us   29.956 us          65  handle_sizeof_operator
  394.118 us  393.402 us       16243  find_macro_param_src_idx
  384.378 us   67.862 us        1149  elf_write_str
  374.060 us  374.060 us        9810  __strcpy_chk
  369.940 us  369.940 us       13953  bb_connect
  367.857 us  257.657 us        2179  add_block
  367.036 us   53.219 us         266  read_char_param
  344.176 us  344.176 us        4786  var_check_in_scope
  312.050 us   30.410 us         187  add_constant
  306.616 us   34.305 us           1  libc_generate
  296.130 us  216.348 us        3291  __teq
  291.840 us   41.925 us         322  truncate_unchecked
  272.800 us  272.800 us       11029  bb_build_rpo
  272.311 us   46.235 us         797  __c
  269.094 us  269.094 us       11029  bb_index_reversed_rpo
  264.755 us  264.755 us       11029  bb_build_reversed_rpo
  262.270 us  262.270 us       11029  bb_reverse_index
  261.201 us  261.201 us       11029  bb_reverse_reversed_index
  261.066 us  261.066 us       11029  bb_index_rpo
  244.482 us  244.482 us        6254  intersect
  233.566 us   85.869 us        1632  __zero
  230.806 us  230.806 us        6714  reverse_intersect
  207.552 us  151.762 us        2305  __mov_r
  177.552 us   65.218 us        1176  elf_write_byte
  173.247 us  122.820 us        1596  is_numeric
  169.106 us    6.689 us          19  read_global_assignment
  155.291 us    3.764 us          65  read_partial_var_decl
  155.021 us   84.995 us           7  hashmap_rehash
  152.879 us    0.551 us           4  read_indirect_call
  147.088 us  107.465 us        1640  __cmp_r
  142.168 us   17.634 us          73  add_alias
  114.473 us  114.473 us        4733  __movt
  113.560 us  101.577 us          15  arena_free_trailing_blocks
  113.482 us  113.482 us        4733  __movw
   94.400 us   94.400 us          36  fopen
   76.893 us   76.893 us          36  fclose
   74.554 us   52.729 us         385  add_ph2_ir
   65.588 us    0.722 us           2  compact_all_arenas
   65.170 us    6.194 us         107  require_deref_var
   59.982 us   43.946 us         666  __add_i
   59.681 us    5.749 us         102  require_ref_var
   53.142 us   53.142 us        2239  is_fusible_insn
   49.468 us    0.774 us           3  compact_arenas_selective
   49.290 us    7.130 us          58  compute_field_address
   48.223 us   34.749 us         556  __sub_r
   47.898 us   13.207 us          10  elf_write_blk
   47.745 us    1.996 us           7  skip_macro_body
   46.865 us    1.830 us           1  elf_generate_sections
   45.571 us    7.991 us         132  lookup_directive
   44.161 us   44.161 us        1728  is_pointer_operation
   44.086 us    0.320 us           4  read_constant_expr
   43.766 us    1.546 us           5  read_constant_infix_expr
   42.797 us    6.078 us          29  compute_element_address
   42.657 us   42.657 us        1624  arm_get_cond
   42.376 us   42.376 us        1536  get_operator_prio
   38.535 us   28.125 us         431  __cmp_i
   37.758 us    1.156 us           8  strbuf_free
   34.183 us    2.350 us           1  global_init
   30.476 us    1.533 us           7  add_macro
   26.570 us    7.155 us          27  find_best_spill
   19.756 us   14.385 us         221  __lb
   19.415 us   13.569 us         184  calculate_spill_cost
   19.157 us   13.902 us         216  __sb
   19.139 us   14.653 us         187  arena_alloc_constant
   17.632 us    0.494 us           7  arena_alloc_macro
   15.936 us    1.386 us           5  read_constant_expr_operand
   15.044 us    3.783 us          19  read_primary_constant
   14.419 us   14.419 us         582  is_hex
   13.914 us   10.023 us         162  __rsb_i
   13.652 us   13.652 us         575  align_size
   13.214 us   13.214 us          16  calloc
   10.380 us    0.608 us           5  arena_init
   10.276 us    0.788 us           5  add_named_type
   10.153 us    0.119 us           1  elf_generate_header
    9.537 us    1.056 us           8  strbuf_create
    8.675 us    0.725 us           1  lex_init_keywords
    8.068 us    8.068 us         302  size_var
    7.659 us    1.698 us           9  hashmap_create
    7.032 us    5.275 us          73  arena_alloc_alias
    6.892 us    6.892 us         275  promote
    6.853 us    0.883 us          10  arena_alloc_traversal_args
    6.152 us    0.879 us           7  promote_unchecked
    5.755 us    0.500 us           1  lex_init_directives
    5.558 us    1.417 us          23  hashmap_contains
    5.203 us    3.794 us          58  __and_i
    4.828 us    4.828 us         165  read_numeric_constant
    4.478 us    3.259 us          50  __and_r
    4.335 us    1.134 us           4  handle_pointer_arithmetic
    4.060 us    1.846 us           9  hashmap_free
    3.862 us    3.245 us          26  __eor_r
    3.489 us    3.489 us          34  __snprintf_chk
    3.269 us    3.269 us          10  initialize_struct_field
    3.182 us    2.343 us          35  __or_r
    3.008 us    0.362 us           1  elf_add_symbol
    2.724 us    2.724 us          34  snprintf
    2.272 us    2.272 us          95  __mul
    2.007 us    0.113 us           1  lexer_cleanup
    1.784 us    1.784 us          75  add_type
    1.422 us    1.422 us          18  bb_disconnect
    1.409 us    0.532 us           1  elf_align
    1.145 us    0.197 us           3  check_def
    0.281 us    0.211 us           3  __mvn_r
    0.243 us    0.243 us           9  round_up_pow2
    0.173 us    0.173 us           7  __sxtb
    0.130 us    0.130 us           4  get_pointer_element_size
    0.117 us    0.117 us           5  get_unary_operator_prio
                                   1  exit

This patch reduces memory usage by 75.4% and cuts elapsed time by 66%.
After this patch, var_list_add_var and var_list_ensure_capacity become visible hotspots. However, the time spent in arena_calloc and memset drops substantially, which outweighs the amortized cost of these hotspots.
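A rough way to check why the extra memcpy stays cheap: the sketch below replays the growth policy of var_list_ensure_capacity() (start at 8 slots, double until the request fits, copy the current elements) without allocating anything. It counts how many element copies n appends trigger. count_growth_copies() is a hypothetical helper written only for this illustration. It ignores the duplicate-check scan in var_list_add_var(), which is a separate O(size) cost per insertion.

```c
#include <assert.h>

/* Replays the growth policy of var_list_ensure_capacity(): start at 8 slots,
 * double until the request fits, and copy the current elements on each
 * resize. Returns the total number of pointer-sized elements copied when n
 * distinct items are appended one at a time. Nothing is allocated. */
long count_growth_copies(int n)
{
    assert(n >= 0);
    int size = 0, capacity = 0;
    long copies = 0;

    for (int i = 0; i < n; i++) {
        int min_capacity = size + 1;
        if (capacity < min_capacity) {
            int new_capacity = capacity ? capacity : 8;
            while (new_capacity < min_capacity)
                new_capacity <<= 1;
            copies += size; /* memcpy of list->size elements */
            capacity = new_capacity;
        }
        size++;
    }
    return copies;
}
```

For 100 appends it counts 120 copies (8 + 16 + 32 + 64). In general the total stays below 2n, so the resize copying is amortized O(1) per append.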


Summary by cubic

Replaced fixed-size live set arrays with arena-backed dynamic lists to cut memory usage and speed up compilation. On src/main.c, RSS drops ~75% (1.2GB → ~305MB) and runtime improves ~66%.

  • Refactors
    • Switched basic_block live_gen/live_kill/live_in/live_out to var_list_t.
    • Added helpers in ssa.c: var_list_ensure_capacity, var_list_add_var, var_list_assign_array.
    • Updated liveness logic to use list.size and elements; removed *_idx counters.
    • Adjusted compute_live_in, merge_live_in, recompute_live_out to work with var_list_t.
    • Updated reg-alloc.c to iterate over live_out via var_list_t.

Written for commit cb5bd04. Summary will update automatically on new commits.


@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 3 files

Collaborator

@ChAoSUnItY ChAoSUnItY left a comment


Aside from the suggestion to move the definitions, all changes look good.

Comment on lines 24 to 60
void var_list_ensure_capacity(var_list_t *list, int min_capacity)
{
    if (list->capacity >= min_capacity) {
        return;
    }

    int new_capacity = list->capacity ? list->capacity : 8;

    while (new_capacity < min_capacity) {
        new_capacity <<= 1;
    }

    var_t **new_elements = arena_alloc(BB_ARENA, new_capacity * HOST_PTR_SIZE);

    if (list->elements)
        memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE);

    list->elements = new_elements;
    list->capacity = new_capacity;
}

void var_list_add_var(var_list_t *list, var_t *var)
{
    for (int i = 0; i < list->size; i++) {
        if (list->elements[i] == var)
            return;
    }

    var_list_ensure_capacity(list, list->size + 1);
    list->elements[list->size++] = var;
}

void var_list_assign_array(var_list_t *list, var_t **data, int count)
{
    var_list_ensure_capacity(list, count);
    memcpy(list->elements, data, count * HOST_PTR_SIZE);
    list->size = count;
}
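To exercise the three helpers quoted above outside shecc, the following sketch pairs them with hypothetical stand-ins. A toy bump allocator replaces shecc's arena_alloc()/BB_ARENA, the var_list_t layout is assumed, and var_t is a dummy struct. None of these stand-ins are shecc's real definitions; only the helper bodies come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for shecc internals (not shecc's real definitions). */
typedef struct {
    int id;
} var_t;

typedef struct {
    char *base;
    size_t used;
    size_t cap;
} arena_t;

#define HOST_PTR_SIZE ((int) sizeof(void *))

static void *arena_storage[8192]; /* pointer-aligned backing memory */
static arena_t bb_arena = {(char *) arena_storage, 0, sizeof(arena_storage)};
#define BB_ARENA (&bb_arena)

/* Toy bump allocator: like an arena, it never frees individual blocks. */
static void *arena_alloc(arena_t *arena, int size)
{
    size_t aligned = ((size_t) size + 15) & ~(size_t) 15;
    assert(arena->used + aligned <= arena->cap);
    void *p = arena->base + arena->used;
    arena->used += aligned;
    return p;
}

/* Assumed layout of var_list_t. */
typedef struct {
    var_t **elements;
    int size;
    int capacity;
} var_list_t;

/* Helper bodies as in the PR diff. */
void var_list_ensure_capacity(var_list_t *list, int min_capacity)
{
    if (list->capacity >= min_capacity)
        return;

    int new_capacity = list->capacity ? list->capacity : 8;
    while (new_capacity < min_capacity)
        new_capacity <<= 1;

    var_t **new_elements = arena_alloc(BB_ARENA, new_capacity * HOST_PTR_SIZE);
    if (list->elements)
        memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE);

    list->elements = new_elements;
    list->capacity = new_capacity;
}

/* Appends var unless it is already present (linear duplicate scan). */
void var_list_add_var(var_list_t *list, var_t *var)
{
    for (int i = 0; i < list->size; i++) {
        if (list->elements[i] == var)
            return;
    }

    var_list_ensure_capacity(list, list->size + 1);
    list->elements[list->size++] = var;
}

/* Replaces the contents with count pointers from data. */
void var_list_assign_array(var_list_t *list, var_t **data, int count)
{
    var_list_ensure_capacity(list, count);
    memcpy(list->elements, data, count * HOST_PTR_SIZE);
    list->size = count;
}
```

In this sketch, var_list_add_var() keeps set semantics through a linear scan, so each insertion costs O(size), which would fit with it showing up as a hotspot in the profile. var_list_assign_array() overwrites the contents but never shrinks the capacity.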

Collaborator


Maybe move these definitions to globals.c, since var_list_t may benefit other data structures in later enhancements?

    if (list->elements)
        memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE);

    list->elements = new_elements;
Collaborator


Not super familiar with the arena_*() API, but don't we need to call arena_free() on the old pointer?

Collaborator


arena_free() frees the whole arena; there is currently no function for freeing individual allocated objects, other than the internal arena blocks and the arena itself.

Collaborator


Got it, thanks for the clarification.
So IIUC, this will actually cause a memory leak that the current API cannot resolve, right?

Collaborator


Well, not really; this is already handled properly in global_release():

arena_free(BB_ARENA);

So no memory leak is happening here.

Collaborator


Hmm, but global_release() should only be called when the entire compiler is about to exit?

So when arena_alloc() is used to grow list->elements, isn't the old buffer effectively leaked until the entire compiler finishes?

Even if we don't call global_release(), the operating system should reclaim all used memory when the executable terminates.

But perhaps this isn't a serious problem: even if we don't immediately reclaim memory that is no longer in use, memory usage is still much lower than with the original static arrays.

Collaborator


IOW, the arena API thinks this memory is occupied and won't reuse it, but in fact, no pointers are actually referencing it anymore?
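To put a rough number on this concern: because the arena has no per-object free, every resize in var_list_ensure_capacity() leaves the previous buffer unreachable but still counted as used until arena_free(BB_ARENA). With growth that starts at 8 slots and doubles, the abandoned buffers of a list whose capacity reached C add up to C - 8 slots, which is less than the live buffer itself. The helper below is an illustrative sketch of that sum (abandoned_bytes() is a hypothetical name, not shecc code), assuming a single list grown only through the doubling path.

```c
#include <assert.h>

/* Bytes left unreachable in the arena after one list has grown, by doubling
 * from 8 slots, up to final_capacity. Assumes final_capacity is 8 * 2^k,
 * which is what the growth loop in var_list_ensure_capacity() produces.
 * Each earlier buffer (8, 16, ..., final_capacity / 2 slots) is never
 * returned to the arena. */
long abandoned_bytes(int final_capacity, int ptr_size)
{
    assert(final_capacity >= 8);
    long bytes = 0;
    for (int cap = 8; cap < final_capacity; cap <<= 1)
        bytes += (long) cap * ptr_size;
    return bytes;
}
```

For example, a list that reached 1024 slots with 8-byte pointers leaves 8128 stale bytes next to an 8192-byte live buffer. The waste is real, but bounded by roughly the size of the live data.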

    var_t **new_elements = arena_alloc(BB_ARENA, new_capacity * HOST_PTR_SIZE);

    if (list->elements)
        memcpy(new_elements, list->elements, list->size * HOST_PTR_SIZE);
Collaborator

@visitorckw visitorckw Oct 27, 2025


Since the new code might introduce several additional memory copy operations, I imagine it will be slower. I'm not sure what the performance impact will be, but I guess the 75% memory saving is very likely worth that cost.

However, this is a trade-off. Reading the commit message alone gives the impression that this is a pure improvement without any drawbacks.

I'm wondering if this performance difference has been measured?

Author


I measured the performance difference and added the results to the
commit message and PR description.

Live sets in shecc are implemented using arrays of pointers, which
preallocate hundreds of unused slots, leading to significant
memory waste.

This patch replaces arrays of pointers with arena-backed vectors,
reducing memory usage substantially.

Although this change introduces additional memcpy calls during
dynamic resizing in "var_list_ensure_capacity", which may add
overhead, the much smaller working set, better cache locality,
and fewer minor page faults outweigh the amortized memcpy cost.

Measured (compiling src/main.c):
- RSS: ~1.24GB -> ~305MB (≈ 75.4% reduction)
- Elapsed time: 0.50s -> 0.17s (≈ 66% faster)
- Minor page faults: ~309k -> ~76k

Changes include:
- Replace live sets with "var_list_t" for dynamic resizing.
- Added helper routines in "ssa.c" for managing "var_list_t"
instances.
- Updated related logic in "reg-alloc.c" and "ssa.c".
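The percentages in the commit message follow from the rounded before/after figures (RSS 1.24 GB -> 305 MB, elapsed 0.50 s -> 0.17 s, minor faults ~309k -> ~76k). A small helper reproduces them. percent_reduction() is a hypothetical name, and because it works on the rounded values, the last digit may differ slightly from a calculation on the raw /usr/bin/time -v output.

```c
#include <assert.h>

/* Percentage by which a metric dropped from `before` to `after`. */
double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

This gives about 75.4% for RSS and 66% for elapsed time. A 66% reduction in elapsed time corresponds to roughly a 2.9x speedup (0.50 / 0.17).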
@jserv jserv merged commit 60dccd5 into sysprog21:master Nov 3, 2025
6 checks passed
@jserv
Collaborator

jserv commented Nov 3, 2025

Thanks @icgmilk for contributing!

@ChAoSUnItY
Collaborator

Although this PR is merged, the cause of the even larger allocation seen in the "before" benchmark is still unclear: RSS peaked at 1239840 KB there, whereas in #184 the "before" benchmark peaked at only 758556 KB. With the dynamic array implemented, the value has dropped close to #184's optimized allocation result.

@bito-code-review

The higher RSS peak (1239840) in the 'before' benchmark compared to PR #184 (758556) is likely due to the static array allocation approach used in this PR's baseline. The original code allocated fixed-size arrays (MAX_ANALYSIS_STACK_SIZE) for live_gen, live_kill, live_in, and live_out in every basic_block structure, causing significant memory overhead even when most blocks don't need the full capacity. The dynamic array implementation reduces this waste by allocating only what's needed, bringing memory usage closer to the optimized levels seen in #184.

src/defs.h

// Before: Fixed arrays with indices
var_t *live_gen[MAX_ANALYSIS_STACK_SIZE];
int live_gen_idx;
// After: Dynamic lists
var_list_t live_gen;
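To put rough numbers on this explanation, the sketch below compares the per-block bytes for the four live sets under both layouts. It assumes a var_list_t made of one pointer and two ints and treats var_t as opaque. It ignores padding in the fixed layout and does not count buffers abandoned by earlier resizes. The array length is a parameter because the actual value of MAX_ANALYSIS_STACK_SIZE is not shown in this thread; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct var var_t; /* opaque placeholder; only pointers are used */

/* Assumed layout of var_list_t (pointer + size + capacity). */
typedef struct {
    var_t **elements;
    int size;
    int capacity;
} var_list_t;

/* Old layout: live_gen, live_kill, live_in, live_out as fixed arrays of
 * max_slots pointers, each with an int *_idx counter (padding ignored). */
size_t fixed_live_set_bytes(size_t max_slots)
{
    return 4 * (max_slots * sizeof(var_t *) + sizeof(int));
}

/* New layout: four var_list_t headers plus the backing buffers actually
 * allocated, given by their capacities. */
size_t dynamic_live_set_bytes(const int capacities[4])
{
    assert(capacities != NULL);
    size_t bytes = 4 * sizeof(var_list_t);
    for (int i = 0; i < 4; i++)
        bytes += (size_t) capacities[i] * sizeof(var_t *);
    return bytes;
}
```

With 8-byte pointers and, say, 100 slots per set, the fixed layout costs 3216 bytes per block, while lists that grew to capacities 8, 8, 16, and 8 cost 384 bytes. The fixed cost is paid by every basic block regardless of how many variables it actually tracks.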

@jserv
Collaborator

jserv commented Nov 3, 2025

Although this PR is merged, the cause of the even larger allocation seen in the "before" benchmark is still unclear: RSS peaked at 1239840 KB there, whereas in #184 the "before" benchmark peaked at only 758556 KB. With the dynamic array implemented, the value has dropped close to #184's optimized allocation result.

Yes, let's create an issue for tracking.

@bito-code-review

Creating an issue to track the memory allocation anomaly is a good idea. The PR shows dynamic arrays reduced RSS from 1239840 to levels comparable to #184's optimized result, but the unexplained spike in the 'before' benchmark (1239840 vs #184's 758556) warrants investigation to understand what caused the even-larger-allocation behavior.

@visitorckw
Collaborator

In rv32emu, we have CI that automatically tracks performance changes when related code is touched.

I'm wondering if it's worthwhile to open an issue to discuss implementing similar automated tracking for memory usage. This could help us catch these kinds of problems and reduce future regressions.
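One possible building block for such tracking, sketched in C for a POSIX system: run the compiler as a child process and read its peak resident set size from getrusage(RUSAGE_CHILDREN). On Linux this ru_maxrss value is reported in kilobytes, the same kind of figure /usr/bin/time -v prints. The function names and the tolerance policy are hypothetical, and nothing here describes how rv32emu's CI actually works.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) with argv, waits for it, and returns the
 * peak resident set size reported by getrusage(RUSAGE_CHILDREN), or -1 if
 * the command could not be run or exited unsuccessfully. RUSAGE_CHILDREN
 * covers every waited-for child, so use one fresh process per measurement. */
long child_peak_rss_kb(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct rusage usage;
    if (getrusage(RUSAGE_CHILDREN, &usage) < 0)
        return -1;
    return usage.ru_maxrss;
}

/* Hypothetical regression policy: flag the run if the peak RSS exceeds the
 * recorded baseline by more than tolerance_pct percent. */
int rss_regressed(long measured_kb, long baseline_kb, double tolerance_pct)
{
    return measured_kb > baseline_kb * (1.0 + tolerance_pct / 100.0);
}
```

A CI job could, for instance, measure `./out/shecc src/main.c` this way and fail when rss_regressed() returns nonzero against a stored baseline.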

@bito-code-review

Based on this PR's memory optimization work and the performance measurement discussion, implementing automated memory usage tracking would be valuable. The PR shows a 75% memory reduction but required manual measurement to assess performance trade-offs. Automated tracking could help catch memory regressions early and provide data for similar optimization decisions.


5 participants