@zyw-bot
Collaborator

@zyw-bot zyw-bot commented Jun 22, 2025

Link: llvm/llvm-project#145204
Requested by: @dtcxzyw

@github-actions github-actions bot mentioned this pull request Jun 22, 2025
@zyw-bot
Collaborator Author

zyw-bot commented Jun 22, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@b7d0c9b
patch: llvm/llvm-project#145204
sha256: 07dd62be23fbe507d935c3adc47c4fe4e1c6a7f0e5172de5f5fe69501b6448ca
commit: c1d2a7e

2143 files changed, 349066 insertions(+), 361831 deletions(-)

Improvements:
  gvn.NumGVNSimpl 4742451 -> 4746837 +0.09%
  globalopt.NumDeleted 1037046 -> 1037416 +0.04%
  memdep.NumCacheCompleteNonLocalPtr 5683960 -> 5685298 +0.02%
  build-libcalls.NumInaccessibleMemOnly 5430 -> 5431 +0.02%
  memdep.NumCacheNonLocal 21838 -> 21842 +0.02%
  loop-instsimplify.NumSimplified 196883 -> 196913 +0.02%
  memdep.NumCacheNonLocalPtr 284110917 -> 284149012 +0.01%
  build-libcalls.NumNoAlias 8828 -> 8829 +0.01%
  loop-simplifycfg.NumTerminatorsFolded 10669 -> 10670 +0.01%
  memcpyopt.NumCpyToSet 11313 -> 11314 +0.01%
Regressions:
  correlated-value-propagation.NumMinMax 16543 -> 12488 -24.51%
  correlated-value-propagation.NumSaturating 2900 -> 2884 -0.55%
  licm.NumMovedCalls 35499 -> 35333 -0.47%
  correlated-value-propagation.NumSubNUW 39292 -> 39254 -0.10%
  globalsmodref-aa.NumNoMemFunctions 812847 -> 812402 -0.05%
  indvars.NumElimIdentity 1889 -> 1888 -0.05%
  correlated-value-propagation.NumSubNW 122449 -> 122387 -0.05%
  simplifycfg.NumHoistCommonCode 859217 -> 858877 -0.04%
  globalsmodref-aa.NumReadMemFunctions 1242553 -> 1242108 -0.04%
  correlated-value-propagation.NumSubNSW 83634 -> 83610 -0.03%
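
The percentages in the two lists above are consistent with a plain relative change, (new - old) / old. The following is a minimal Python sketch that parses a few of those lines and rechecks the arithmetic. The names `percent_change` and `parse_stat_line` are illustrative helpers, not part of the benchmark tooling, and the line format is assumed from the output shown here.

```python
import re

# Assumed line shape, taken from the report above:
#   "<pass>.<Statistic> <old> -> <new> <signed percent>%"
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s*->\s*(\d+)\s+([+-]\d+(?:\.\d+)?)%\s*$"
)


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


def parse_stat_line(line: str):
    """Split one report line into (name, old, new, reported_percent)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    name, old, new, reported = match.groups()
    return name, int(old), int(new), float(reported)


# Three lines copied verbatim from the report.
samples = [
    "  gvn.NumGVNSimpl 4742451 -> 4746837 +0.09%",
    "  correlated-value-propagation.NumMinMax 16543 -> 12488 -24.51%",
    "  licm.NumMovedCalls 35499 -> 35333 -0.47%",
]

for line in samples:
    name, old, new, reported = parse_stat_line(line)
    # The report keeps two decimals, so allow half a unit in the last place.
    assert abs(percent_change(old, new) - reported) <= 0.005
```

The NumMinMax line is the only entry whose change exceeds one percent; every other listed statistic moves by less than 0.6%.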

1 6 bench/abc/optimized/abcSop.ll
3 5 bench/abc/optimized/absRpm.ll
8 11 bench/abc/optimized/ac_wrapper.ll
1 4 bench/abseil-cpp/optimized/usage.ll
1 5 bench/actix-rs/optimized/comsm606o4zjj7a.ll
37 38 bench/arrow/optimized/arena.ll
1 2 bench/arrow/optimized/exec.ll
9 14 bench/arrow/optimized/key_value_metadata.ll
3 7 bench/boost/optimized/async.ll
2 7 bench/cpython/optimized/hamt.ll
2 6 bench/curl/optimized/mime.ll
4 12 bench/curl/optimized/select.ll
12 18 bench/delta-rs/optimized/43y2svfstmvqcl15.ll
4 5 bench/draco/optimized/corner_table.ll
6 9 bench/duckdb/optimized/ub_duckdb_core_functions_generic.ll
3 9 bench/eastl/optimized/EACallback.ll
6 14 bench/eastl/optimized/TestRandom.ll
24 62 bench/flac/optimized/decode.ll
19 23 bench/freetype/optimized/pshinter.ll
1 3 bench/freetype/optimized/truetype.ll
3 8 bench/freetype/optimized/type1cid.ll
10 20 bench/glslang/optimized/linkValidate.ll
4 6 bench/graphviz/optimized/emit.ll
43 48 bench/graphviz/optimized/triang.ll
56 59 bench/harfbuzz/optimized/hb-ot-cff2-table.ll
26 34 bench/harfbuzz/optimized/hb-static.ll
55 58 bench/harfbuzz/optimized/hb-subset-cff2.ll
16 20 bench/hdf5/optimized/H5Spoint.ll
5 6 bench/hdf5/optimized/h5ls.ll
3 7 bench/hdf5/optimized/h5tools.ll
4 14 bench/hermes/optimized/StringKind.ll
4 7 bench/hermes/optimized/hbc-attribute.ll
2 3 bench/icu/optimized/icuexportdata.ll
8 14 bench/jq/optimized/regexec.ll
12 13 bench/jsonnet/optimized/libjsonnet.ll
5 10 bench/libcxx/optimized/valarray.ll
25 41 bench/libevent/optimized/evdns.ll
4 10 bench/libquic/optimized/quic_connection.ll
2 2 bench/libsodium/optimized/aead_aes256gcm_aesni.ll
3 8 bench/lief/optimized/DataDirectory.ll
4 9 bench/lief/optimized/DynamicEntry.ll
5 13 bench/lief/optimized/rsa.ll
12 15 bench/lightgbm/optimized/boosting.ll
2 4 bench/lightgbm/optimized/dcg_calculator.ll
13 17 bench/lightgbm/optimized/gbdt.ll
19 39 bench/lightgbm/optimized/linear_tree_learner.ll
6 11 bench/lightgbm/optimized/metadata.ll
12 20 bench/lightgbm/optimized/objective_function.ll
12 24 bench/lightgbm/optimized/serial_tree_learner.ll
5 5 bench/linux/optimized/hsu.ll
29 33 bench/linux/optimized/virtio_input.ll
5 7 bench/lvgl/optimized/lv_textarea.ll
5 9 bench/meilisearch-rs/optimized/1wnbkg3u8l6dyln4.ll
10 18 bench/meilisearch-rs/optimized/4rtt9oltj0ubuf08.ll
5 10 bench/meshoptimizer/optimized/clusterizer.ll
4 14 bench/minetest/optimized/tool.ll
7 11 bench/mini-lsm-rs/optimized/2j7sj03n10nloiwr.ll
11 12 bench/mini-lsm-rs/optimized/2vbarw74mreksmkr.ll
7 8 bench/mini-lsm-rs/optimized/3l74wehtlfae5jz1.ll
8 10 bench/mold/optimized/gdb-index.cc.X86_64.ll
5 9 bench/mold/optimized/icf.cc.X86_64.ll
12 14 bench/nanobind/optimized/nb_func.ll
7 17 bench/ncnn/optimized/concat.ll
5 7 bench/ncnn/optimized/detectionoutput.ll
7 11 bench/ninja/optimized/build_log_perftest.ll
7 9 bench/oiio/optimized/exrinput.ll
6 7 bench/oiio/optimized/exrinput_c.ll
3 4 bench/openblas/optimized/dlarrv.ll
3 6 bench/opencc/optimized/CommandLine.ll
7 13 bench/openexr/optimized/ImfTileOffsets.ll
2 8 bench/openjdk/optimized/g1HeapRegionManager.ll
8 11 bench/openjdk/optimized/p11_convert.ll
7 12 bench/openmpi/optimized/plookup.ll
16 30 bench/openmpi/optimized/pmix_hash.ll
2 7 bench/openssl/optimized/f_string.ll
56 26 bench/openssl/optimized/o_str.ll
15 30 bench/ozz-animation/optimized/animation.ll
11 16 bench/ozz-animation/optimized/animation_optimizer.ll
14 27 bench/pbrt-v4/optimized/loopsubdiv.ll
15 30 bench/pola-rs/optimized/8jp76n2tmi4x2dvxoma5qtaa7.ll
5 15 bench/postgres/optimized/pg_enum.ll
9 18 bench/proj/optimized/singleoperation.ll
21 33 bench/raylib/optimized/rmodels.ll
5 12 bench/ring-rs/optimized/1ifa1mnaz8f3h6jb.ll
27 46 bench/ruby/optimized/encoding.ll
15 27 bench/rust-analyzer-rs/optimized/rilullg9p294yp1.ll
5 7 bench/sentencepiece/optimized/builder.ll
3 8 bench/slurm/optimized/part_data.ll
8 16 bench/spike/optimized/vl1re16_v.ll
12 23 bench/tev/optimized/Common.ll
9 16 bench/tev/optimized/ImageViewer.ll
5 11 bench/tomlplusplus/optimized/toml.ll
32 44 bench/typst-rs/optimized/1fd2xpfefmgrcb9d.ll
8 10 bench/typst-rs/optimized/3kgmqnxcsl3z3n0n.ll
4 11 bench/typst-rs/optimized/59tuvc5m3xlovl3o.ll
9 13 bench/verilator/optimized/V3AstNodes.ll
48 92 bench/wasmedge/optimized/compiler.ll
10 21 bench/wasmtime-rs/optimized/2dcgoeji2y2j2nl0.ll
13 25 bench/wasmtime-rs/optimized/3wxh4cbua3k3i5hq.ll
5 13 bench/wasmtime-rs/optimized/6ly84hjssnlljzr.ll
4 10 bench/wireshark/optimized/packet-dhcpv6.ll
8 19 bench/wireshark/optimized/packet-ssh.ll
8 15 bench/wireshark/optimized/packet-thrift.ll
9 15 bench/yoga/optimized/YGNode.ll
20 36 bench/zed-rs/optimized/d31g6vudldcq1cl7b9cowxr8a.ll
3 9 bench/zstd/optimized/fse_decompress.ll
5 10 bench/zxing/optimized/PDFCodewordDecoder.ll

@github-actions
Contributor

Here is a brief summary of the major changes in this patch, focusing on high-level transformations and ignoring minor or non-essential modifications:

Major Changes

  1. Elimination of llvm.umax Calls:

    • The patch removes several calls to @llvm.umax.i32 and @llvm.umax.i64 that computed the maximum of a value and 1 (e.g., umax = umax(x, 1)).
    • These are replaced with either direct use of the original value or equivalent logic that avoids the max operation.
    • Since umax(x, 1) equals x only when x is non-zero, this suggests the clamp was redundant at those sites because the value was already known to be non-zero.
  2. Simplification of Loop Exit Conditions:

    • Many loop exit conditions like:
      %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
      have been changed to compare directly against the source variable instead of the umax-derived trip count:
      %exitcond.not = icmp eq i64 %indvars.iv.next, %original.value
    • This indicates that the trip count no longer needs to be clamped with umax(x, 1); the loops now compare against the original input value directly.
  3. Removal of Unused or Redundant Phi Nodes and Variables:

    • Several phi instructions that took the result of umax as an incoming value have been removed or rewritten.
    • For example, where %umax = call i64 @llvm.umax.i64(...) was followed by phi nodes using %umax, the phis now use the original variable %val.
    • This removes an instruction from the data flow and may reduce register pressure, without changing semantics.
  4. Loop Preheader Restructuring:

    • Some functions show structural changes where .lr.ph.preheader blocks are merged or eliminated, and .lr.ph labels are now branched to directly from their predecessors, without an intermediate preheader block that computes umax.
    • This reduces the work done in loop setup.
  5. Memory Copy Adjustments:

    • In some cases, calls to memcpy or similar routines had their length argument computed as zext(umax(x, 1)). These are now simplified to just zext(x), implying that the input is already known to be non-zero.
    • This change likely comes from improved static analysis or assertion handling that removes the need for runtime clamping to minimum values.
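
The common thread in items 1, 2, and 5 is that umax(x, 1) equals x for any unsigned x that is known to be non-zero. A minimal Python sketch of that reasoning follows. The helpers `umax` and `bottom_tested_trip_count` are illustrative names that do not come from the patch, and modelling the loop as bottom-tested (do-while style) is an assumption about the loop shape involved.

```python
def umax(a: int, b: int) -> int:
    """Model of llvm.umax on non-negative integers (unsigned maximum)."""
    assert a >= 0 and b >= 0
    return a if a >= b else b


def bottom_tested_trip_count(bound: int) -> int:
    """Count iterations of a do-while style loop that increments an
    induction variable and exits when it equals `bound`.

    Only meaningful for bound >= 1: with bound == 0 the equality test
    would not fire on the first iteration.
    """
    assert bound >= 1
    iv = 0
    trips = 0
    while True:
        trips += 1
        iv += 1
        if iv == bound:  # mirrors: icmp eq i64 %indvars.iv.next, %bound
            break
    return trips


# When x is known to be non-zero, clamping to 1 is a no-op, so the
# clamped and unclamped bounds give the same trip count.
for x in range(1, 50):
    assert umax(x, 1) == x
    assert bottom_tested_trip_count(umax(x, 1)) == bottom_tested_trip_count(x)

# The clamp only changes the value for x == 0, where it yields 1.
assert umax(0, 1) == 1
```

The sketch only covers the arithmetic identity. It does not model how the compiler establishes that a value is non-zero.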

High-Level Overview

The overall effect of the patch appears to be the removal of redundant llvm.umax calls, which in turn simplifies trip count calculations and loop preheaders. The defensive clamp umax(x, 1) disappears at sites where the value can be shown to be non-zero. Because the diff compares compiler output before and after the patch, these changes point to stronger static reasoning in the compiler rather than edits to the benchmark sources, and the resulting code is cleaner and potentially faster.

model: qwen-plus-latest
CompletionUsage(completion_tokens=638, prompt_tokens=103261, total_tokens=103899, completion_tokens_details=None, prompt_tokens_details=None)

@dtcxzyw dtcxzyw closed this Jun 22, 2025
@dtcxzyw dtcxzyw deleted the test-run15802919592 branch June 30, 2025 11:31