Skip to content

Conversation

@ChAoSUnItY
Copy link
Collaborator

@ChAoSUnItY ChAoSUnItY commented Mar 25, 2025

This patch eliminates arithmetic instructions (add, sub, and mul, in particular) when the criterion is matched. Including:

  • Identity property for addition.
  • Zero and identity property for multiplication, these are effective when doing pointer offset.
  • Subtraction will be replaced based on operand, e.g., "s - 0" will be replaced with "s"; "0 - s" will be replaced with "-s".

Notice that insn_fusion's return value has changed from void to bool to preserve ability to indicate whether instruction has been optimized or not.

Analysis of performance statistics

Before: hyperfine --warmup 3 './out/shecc ./src/main.c' './out/shecc-stage1.elf ./src/main.c'

Benchmark 1: ./out/shecc ./src/main.c
  Time (mean ± σ):     966.5 ms ± 185.1 ms    [User: 277.6 ms, System: 616.3 ms]
  Range (min … max):   755.6 ms … 1237.0 ms    10 runs

Benchmark 2: ./out/shecc-stage1.elf ./src/main.c
  Time (mean ± σ):      8.060 s ±  0.452 s    [User: 4.461 s, System: 3.036 s]
  Range (min … max):    7.495 s …  8.931 s    10 runs

After: hyperfine --warmup 3 './out/shecc ./src/main.c' './out/shecc-stage1.elf ./src/main.c'

Benchmark 1: ./out/shecc ./src/main.c
  Time (mean ± σ):      1.126 s ±  0.428 s    [User: 0.324 s, System: 0.713 s]
  Range (min … max):    0.801 s …  2.283 s    10 runs

Benchmark 2: ./out/shecc-stage1.elf ./src/main.c
  Time (mean ± σ):      8.166 s ±  0.263 s    [User: 4.356 s, System: 3.181 s]
  Range (min … max):    7.936 s …  8.760 s    10 runs

I assume the minor performance regression is caused by introduction of more condition checking on every instructions.

@jserv jserv requested a review from vacantron March 25, 2025 12:12
@jserv jserv requested a review from DrXiao March 25, 2025 13:56
* {ALU rn, rs1, rs2; mv rd, rn;}
* reduces to:
* {ALU rd, rs1, rs2;}
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the meaning of the proposed changes, but I want to ask a question:

After applying these changes, shecc may find two proper instructions and merge them into the first instruction (ph2_ir) if possible. However, the second instruction (next) remains in a certain ph2_ir_list after merging, but it seems unnecessary.

Therefore, I think the second instruction could be deleted from the linked list after fusion. Do you consider my opinion correct? If you agree, please add the deletion process to the proposed changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct in this case, but I am also working on regional allocation for ph2_ir_t in the mean time, implemented in arena allocation to be specific. I am aiming to free all ph2_ir in that work. Therefore, I prefer to keep it as is.

Copy link
Collaborator

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of referring the instructions as "math," classify them in the field of "arithmetic instructions."

@jserv
Copy link
Collaborator

jserv commented Mar 25, 2025

Provide the statistics of the proposed changes.

@ChAoSUnItY ChAoSUnItY changed the title Eliminate math instructions based on properties Eliminate arithmetic instructions based on properties Mar 26, 2025
This patch eliminates arithmetic instructions (add, sub, and mul to be
specific) when the criterion is matched. Including:
- Identity property for addition.
- Zero and identity property for multiplication, these are
  effective when doing pointer offset.
- Subtraction will be replaced based on operand, e.g., "s - 0" will be
  replaced with "s"; "0 - s" will be replaced with "-s".
@jserv jserv requested a review from DrXiao March 26, 2025 07:14
@jserv
Copy link
Collaborator

jserv commented Mar 26, 2025

The results generated by hyperfine --warmup 3 'out/shecc src/main.c' looks weird. Can you utilize tools such as uftrace to figure out the slowdown?

@ChAoSUnItY
Copy link
Collaborator Author

The result given by uftrace is contrary to the result generated by hyperfine, the optimized one is faster then the original one.
image

Copy link
Collaborator

@vacantron vacantron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Did you test stage-1 on armv7 machine?

@ChAoSUnItY
Copy link
Collaborator Author

No, and I don't have any armv7 / riscv machine to run tests. I am currently using qemu to bootstrap and benchmark locally, which may not be accurate.

@jserv jserv merged commit 6196ab1 into sysprog21:master Mar 26, 2025
6 checks passed
@jserv
Copy link
Collaborator

jserv commented Mar 26, 2025

Thank @ChAoSUnItY for contributing!

@ChAoSUnItY ChAoSUnItY deleted the feat/optimization branch April 28, 2025 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants