@zyw-bot zyw-bot commented Jul 26, 2025

Link: llvm/llvm-project#150420
Requested by: @dtcxzyw

@github-actions github-actions bot mentioned this pull request Jul 26, 2025
zyw-bot commented Jul 26, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@7ba6255
patch: llvm/llvm-project#150420
sha256: b3b98b45bf1684dbefa1fdb24e43a0aa71231c089e343fe9bf8d255c176e7463
commit: b3dc255

2264 files changed, 384115 insertions(+), 387353 deletions(-)

Improvements:
  indvars.NumSimplifiedSRem 16 -> 18 +12.50%
  correlated-value-propagation.NumPhiCommon 58373 -> 59959 +2.72%
  indvars.NumReplaced 72376 -> 72597 +0.31%
  gvn.NumGVNEqProp 455686 -> 456845 +0.25%
  simplifycfg.NumInvokesMerged 159173 -> 159396 +0.14%
  correlated-value-propagation.NumPhis 1343211 -> 1344508 +0.10%
  simplifycfg.NumInvokeSetsFormed 59475 -> 59530 +0.09%
  instsimplify.NumSimplified 2614047 -> 2616086 +0.08%
  gvn.NumGVNSimpl 4741368 -> 4743129 +0.04%
  bdce.NumSimplified 6040 -> 6042 +0.03%
Regressions:
  correlated-value-propagation.NumAShrsRemoved 196 -> 195 -0.51%
  bdce.NumSExt2ZExt 5241 -> 5234 -0.13%
  bdce.NumRemoved 397403 -> 397236 -0.04%
  correlated-value-propagation.NumUDivURemsNarrowed 13051 -> 13046 -0.04%
  loop-simplifycfg.NumTerminatorsFolded 10471 -> 10467 -0.04%
  div-rem-pairs.NumRecomposed 3217 -> 3216 -0.03%
  simple-loop-unswitch.NumTrivial 3470 -> 3469 -0.03%
  aggressive-instcombine.NumInstrsReduced 73500 -> 73482 -0.02%
  sccp.NumInstRemoved 2081635 -> 2081163 -0.02%
  aggressive-instcombine.NumExprsReduced 22574 -> 22569 -0.02%
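The percentages in the two lists above are consistent with a plain relative change, (after - before) / before. A minimal sketch under that assumption follows; the helper name `relative_change` is hypothetical and is not taken from the benchmark scripts.

```python
def relative_change(before: int, after: int) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Values copied from the report above.
print(f"{relative_change(16, 18):+.2f}%")    # indvars.NumSimplifiedSRem -> +12.50%
print(f"{relative_change(196, 195):+.2f}%")  # correlated-value-propagation.NumAShrsRemoved -> -0.51%
```

Note that a large percentage on a tiny counter (16 -> 18) says less than a small percentage on a counter in the millions, so the absolute counts are worth reading alongside the percentages.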

2 4 bench/abc/optimized/cmdUtils.ll
34 30 bench/abc/optimized/nwkSpeedup.ll
2 3 bench/abseil-cpp/optimized/int128_test.ll
1 1 bench/actix-rs/optimized/3k33h0ss7dy62evb.ll
3 4 bench/assimp/optimized/3DSConverter.ll
39 49 bench/assimp/optimized/ComputeUVMappingProcess.ll
7 11 bench/boost/optimized/ipv6_address_rule.ll
16 20 bench/brotli/optimized/compress_fragment.ll
8 9 bench/c3c/optimized/parse_expr.ll
3 4 bench/clamav/optimized/list.ll
5 6 bench/clap-rs/optimized/5651dp9k16h53y8x.ll
6 12 bench/clap-rs/optimized/rh1bh36cvgkzipt.ll
10 12 bench/coreutils-rs/optimized/jiqj5u7teuhb0o0.ll
12 13 bench/cpython/optimized/initconfig.ll
4 10 bench/cpython/optimized/unicodedata.ll
17 18 bench/cpython/optimized/unicodeobject.ll
18 20 bench/darktable/optimized/fuji_compressed.ll
1 2 bench/darktable/optimized/gaussian.ll
4 9 bench/delta-rs/optimized/4say4x9grcidoih4.ll
2 4 bench/elfshaker-rs/optimized/34r3nkcreq4js9gcfofcmkjs8.ll
26 24 bench/ffmpeg/optimized/dvaudiodec.ll
18 19 bench/ffmpeg/optimized/edge_common.ll
12 13 bench/git/optimized/packfile.ll
14 15 bench/glslang/optimized/Initialize.ll
10 11 bench/gromacs/optimized/reduce.ll
2 2 bench/html5ever-rs/optimized/427f68nqtcfpg289.ll
4 4 bench/hyperscan/optimized/repeat.ll
2 4 bench/influxdb-rs/optimized/17ptp6pnu4b90vr6.ll
25 20 bench/lean4/optimized/mpn.ll
5 6 bench/lean4/optimized/static.ll
28 47 bench/libcxx/optimized/path.ll
28 31 bench/libigl/optimized/sum.ll
15 18 bench/libquic/optimized/p224-64.ll
14 17 bench/libquic/optimized/p256-64.ll
3 5 bench/linux/optimized/cdrom.ll
3 4 bench/linux/optimized/i2c-algo-bit.ll
12 6 bench/linux/optimized/sch_generic.ll
6 7 bench/luau/optimized/Parser.ll
4 5 bench/luau/optimized/isocline.ll
21 28 bench/lz4/optimized/lz4.ll
5 6 bench/mimalloc/optimized/arena.ll
16 18 bench/mitsuba3/optimized/codeholder.ll
15 13 bench/node/optimized/inet.ll
2 4 bench/ockam-rs/optimized/1nr6pb10qh86z9fy.ll
2 4 bench/opencv/optimized/trackerCSRTSegmentation.ll
16 14 bench/opencv/optimized/version.ll
12 13 bench/openexr/optimized/ImfCheckFile.ll
9 10 bench/openexr/optimized/ImfOutputFile.ll
6 7 bench/openjdk/optimized/coalesce.ll
74 51 bench/openjdk/optimized/logSelection.ll
25 30 bench/openmpi/optimized/ras_slurm_module.ll
7 9 bench/openssl/optimized/i_skey.ll
34 68 bench/openusd/optimized/dirtyBitsTranslator.ll
11 11 bench/pbrt-v4/optimized/integrators.ll
8 9 bench/php/optimized/pcre2_substitute.ll
3 4 bench/php/optimized/proc_open.ll
1 1 bench/pingora-rs/optimized/6wibsd5gc0z7di4fjkaikq290.ll
28 27 bench/postgres/optimized/inet_net_ntop.ll
9 8 bench/qdrant-rs/optimized/1qtu8dw3f0ctj9yc.ll
11 12 bench/quantlib/optimized/matrix.ll
43 45 bench/quantlib/optimized/pseudosqrt.ll
4 5 bench/raylib/optimized/raudio.ll
2 1 bench/regex-rs/optimized/12jtvy3iayrg5nam.ll
2 4 bench/regex-rs/optimized/1x04d8372kemp7hd.ll
12 16 bench/rocksdb/optimized/db_impl.ll
15 20 bench/rocksdb/optimized/range_del_aggregator.ll
12 15 bench/ruby/optimized/parse.ll
9 12 bench/ruby/optimized/ripper.ll
7 6 bench/ruff-rs/optimized/1t5d2y321zgutphrasyamrpjz.ll
4 8 bench/rust-analyzer-rs/optimized/15tfqr3l9t81r1af.ll
4 6 bench/rust-analyzer-rs/optimized/hknx1qr3lu9291s.ll
4 5 bench/rust-analyzer-rs/optimized/mucn4qgqdg2891h.ll
6 7 bench/rustfmt-rs/optimized/4arc02n7xt9gqo2v.ll
4 5 bench/spike/optimized/disasm.ll
5 6 bench/sqlite/optimized/sqlite3.ll
9 11 bench/stb/optimized/stb_vorbis.ll
22 24 bench/sundials/optimized/idaHeat2D_kry.ll
7 8 bench/tokenizers-rs/optimized/58hth72z9dib25am.ll
8 17 bench/uv-rs/optimized/2k54dkzlj25rhgifzsgtp51ql.ll
10 12 bench/uv-rs/optimized/5j7xzn845lcvgg50lzoz3eg8s.ll
6 9 bench/uv-rs/optimized/967eumvkkk7xz52paw1v0vcyj.ll
2 9 bench/uv-rs/optimized/b24v25twjd4kchixabmblnyee.ll
2 2 bench/wasmi-rs/optimized/595hyxxsbu415zatshe9om7o4.ll
5 6 bench/wasmtime-rs/optimized/enal6epyb0tyurl.ll
20 22 bench/wireshark/optimized/packet-ipp.ll
14 54 bench/wolfssl/optimized/internal.ll
1 3 bench/yosys/optimized/ffmerge.ll
5 8 bench/z3/optimized/sat_solver.ll
4 7 bench/z3/optimized/smt_consequences.ll
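The per-file list above can be ranked to find the files that changed most. The sketch below assumes the two leading numbers are lines added and lines deleted, in that order (the layout `git diff --numstat` uses); the report itself does not label the columns. `FileChange` and `parse_changes` are hypothetical names.

```python
from typing import NamedTuple


class FileChange(NamedTuple):
    added: int    # first column (assumed: lines added)
    deleted: int  # second column (assumed: lines deleted)
    path: str


def parse_changes(text: str) -> list[FileChange]:
    """Parse 'N M path' lines, skipping anything that does not fit that shape."""
    changes = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            changes.append(FileChange(int(parts[0]), int(parts[1]), parts[2]))
    return changes


# Three rows copied from the list above.
SAMPLE = """\
14 54 bench/wolfssl/optimized/internal.ll
34 68 bench/openusd/optimized/dirtyBitsTranslator.ll
74 51 bench/openjdk/optimized/logSelection.ll
"""

# Sort by net line change (added - deleted), most negative first.
for change in sorted(parse_changes(SAMPLE), key=lambda c: c.added - c.deleted):
    print(f"{change.added - change.deleted:+5d}  {change.path}")
```

Net line count is only a rough proxy for IR quality, so the largest movers are a starting point for manual inspection rather than a verdict.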

Contributor

The provided diff includes numerous changes across various optimized LLVM IR files. After filtering out non-interesting changes (e.g., formatting, comments, reordering), the major changes can be summarized as follows:

  1. Removal of freeze operations and direct use of original values: In multiple files (e.g., cmdUtils.ll, nwkSpeedup.ll, assimp/optimized/3DSConverter.ll), freeze instructions are removed, and the original values are used directly in comparisons and other operations. This simplifies the IR and potentially improves performance by reducing unnecessary operations.

  2. Simplification of phi nodes and control flow: Several changes involve reordering or simplifying phi nodes in loops and control flow constructs. For example, in assimp/optimized/ComputeUVMappingProcess.ll, the order of incoming values in phi nodes is changed to reflect more natural control flow, improving readability and potentially enabling better optimization.

  3. Optimization of arithmetic and bit manipulation operations: In files like darktable/optimized/fuji_compressed.ll and gromacs/optimized/reduce.ll, arithmetic operations are optimized by replacing sequences of and, lshr, and add with more efficient equivalents (e.g., using sub instead of lshr and add). This reduces the number of instructions and can improve performance.

  4. Reduction of redundant comparisons and branches: In boost/optimized/ipv6_address_rule.ll and node/optimized/inet.ll, redundant comparisons and branches are eliminated by combining conditions or using more efficient branching logic. This reduces code complexity and can improve execution speed.

  5. Improvement of loop structures and induction variables: In ffmpeg/optimized/dvaudiodec.ll and opencv/optimized/version.ll, loop structures are optimized by simplifying induction variable calculations and reducing the number of loop iterations. This can lead to better performance, especially in tight loops.

These changes collectively aim to simplify the IR, reduce the number of instructions, and improve the efficiency of the generated code, which can lead to better performance in the final executable.

model: qwen-plus-latest
CompletionUsage(completion_tokens=443, prompt_tokens=112825, total_tokens=113268, completion_tokens_details=None, prompt_tokens_details=None)

%67 = add nsw i64 %66, -1
%umax = call i64 @llvm.umax.i64(i64 %66, i64 1)
Owner

Regression.

I tried to reproduce this with opt -passes=instcombine version.ll -S -o 1.ll and I can see the freezes get pushed through the icmps and phis into the .lr.ph block, but I can't see this call to umax. How can I reproduce?

Full output is here: https://gist.githubusercontent.com/c-rhodes/cbc106cafdbb56ae1d1211d309329d28/raw/b0d3a707f814be3a25735acc64e4f58c0368e536/1.ll

Also, just a thought: have you considered tracking the number of freeze instructions using the stats from the instcount pass?

Owner

> I tried to reproduce this with opt -passes=instcombine version.ll -S -o 1.ll and I can see the freezes get pushed through the icmps and phis into the .lr.ph block, but I can't see this call to umax. How can I reproduce?
>
> Full output is here: https://gist.githubusercontent.com/c-rhodes/cbc106cafdbb56ae1d1211d309329d28/raw/b0d3a707f814be3a25735acc64e4f58c0368e536/1.ll

This is an end-to-end test. You can reproduce it with the options here:

cmd = [
    OPT_EXEC, "-O3", "-disable-loop-unrolling",
    "-vectorize-loops=false", "-vectorize-slp=false", input_file, "-S",
]

I will update the README later.
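The option list above can be wrapped in a small script. This is a sketch, not the benchmark's actual driver: `build_opt_command` and `run_opt` are hypothetical helpers, and the path to the patched `opt` binary is whatever your build produces.

```python
import subprocess


def build_opt_command(opt_exec: str, input_file: str) -> list[str]:
    """Return the opt invocation quoted above for one input file."""
    return [
        opt_exec, "-O3", "-disable-loop-unrolling",
        "-vectorize-loops=false", "-vectorize-slp=false", input_file, "-S",
    ]


def run_opt(opt_exec: str, input_file: str) -> str:
    """Run opt and return the textual IR it prints on stdout.

    No -o is passed, so opt writes to stdout; check=True raises
    CalledProcessError if opt exits with a non-zero status.
    """
    result = subprocess.run(
        build_opt_command(opt_exec, input_file),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `run_opt("bin/opt", "version.ll")` would return the optimized IR as a string, assuming `bin/opt` is the rebuilt binary and `version.ll` is the unoptimized input.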

> Also, just a thought: have you considered tracking the number of freeze instructions using the stats from the instcount pass?

We can add a new stat counter for the number of times pushFreezeToPreventPoisonFromPropagating is triggered.
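Until such a counter exists, a rough text-level count of `freeze` instructions in a `.ll` file can serve as a stand-in. The sketch below is an approximation only: it is neither the instcount pass nor a new statistic in LLVM, it scans the printed IR line by line, and its comment stripping is naive (a `;` inside a string constant would be misread). `count_freeze` is a hypothetical helper.

```python
import re

# Matches the "= freeze " part of a line such as "%x.fr = freeze i32 %x".
FREEZE_RE = re.compile(r"=\s*freeze\s")


def count_freeze(ir_text: str) -> int:
    """Count lines in textual LLVM IR that define a value with `freeze`."""
    count = 0
    for line in ir_text.splitlines():
        code = line.split(";", 1)[0]  # drop trailing comments (naive)
        if FREEZE_RE.search(code):
            count += 1
    return count


# Hand-written illustrative IR, not taken from the benchmark.
SAMPLE = """\
define i32 @f(i32 %x) {
  %x.fr = freeze i32 %x   ; one real freeze
  %c = icmp eq i32 %x.fr, 0
  ; %dead = freeze i32 %x
  ret i32 %x.fr
}
"""

print(count_freeze(SAMPLE))  # prints 1: the commented-out freeze is ignored
```

Comparing this count between the baseline and patched outputs of the same file gives a quick signal of whether freezes were removed or merely moved.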

I still don't see this call to umax with those flags.

Owner

# Apply your patch and rebuild opt
wget https://raw.githubusercontent.com/dtcxzyw/llvm-opt-benchmark/refs/heads/main/bench/opencv/original/version.ll
bin/opt -O3 version.ll -S -o out.ll

Then you can find the call to umax.
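To confirm the result without reading the whole file, the output can be searched for the intrinsic call. A minimal sketch follows, assuming the output was written to `out.ll` as in the commands above; `find_calls` is a hypothetical helper that does a plain text match and does not parse the IR.

```python
def find_calls(ir_text: str, callee: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that calls @callee."""
    needle = f"@{callee}("
    hits = []
    for number, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # ignore trailing comments (naive)
        if "call " in code and needle in code:
            hits.append((number, code.strip()))
    return hits


# The two lines quoted in the review thread above.
SAMPLE = """\
%67 = add nsw i64 %66, -1
%umax = call i64 @llvm.umax.i64(i64 %66, i64 1)
"""

print(find_calls(SAMPLE, "llvm.umax.i64"))
# prints [(2, '%umax = call i64 @llvm.umax.i64(i64 %66, i64 1)')]
```

On a real run you would pass the contents of the file instead, for example `find_calls(open("out.ll").read(), "llvm.umax.i64")`.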

@dtcxzyw dtcxzyw closed this Aug 2, 2025
@dtcxzyw dtcxzyw deleted the test-run16539942245 branch August 2, 2025 06:40