

@dtcxzyw (Owner) commented Apr 13, 2025

Link: llvm/llvm-project#134712
Requested by: @andjo403

@github-actions github-actions bot mentioned this pull request Apr 13, 2025
@dtcxzyw (Owner, Author) commented Apr 13, 2025

Diff mode

runner: ariselab-64c-v2
baseline: llvm/llvm-project@a24ef4b
patch: llvm/llvm-project#134712
sha256: 1e12f7f2556cc6d21f21032af90e1ac0308db68f6733c2feac2ec3c4d803c677
commit: 48ba7c7

3223 files changed, 1970842 insertions(+), 1975763 deletions(-)

Improvements:
  instcombine.NegatorNumNegationsFoundInCache 4247 -> 4283 +0.85%
  early-cse.NumCSECVP 95437 -> 95931 +0.52%
  indvars.NumElimIdentity 1694 -> 1701 +0.41%
  correlated-value-propagation.NumUDivURemsNarrowedExpanded 960 -> 963 +0.31%
  loop-simplifycfg.NumLoopBlocksDeleted 5521 -> 5534 +0.24%
  bdce.NumSimplified 5324 -> 5333 +0.17%
  simple-loop-unswitch.NumTrivial 3072 -> 3077 +0.16%
  correlated-value-propagation.NumSRems 1359 -> 1361 +0.15%
  loop-simplify.NumNested 9538 -> 9552 +0.15%
  correlated-value-propagation.NumAnd 38211 -> 38255 +0.12%
Regressions:
  indvars.NumReplaced 60746 -> 56733 -6.61%
  gvn.NumGVNEqProp 356515 -> 348190 -2.34%
  aggressive-instcombine.NumExprsReduced 19334 -> 18905 -2.22%
  correlated-value-propagation.NumPhis 1110670 -> 1089967 -1.86%
  aggressive-instcombine.NumInstrsReduced 59856 -> 58798 -1.77%
  instcombine.NumExpand 1977 -> 1956 -1.06%
  adce.NumRemoved 93290 -> 93003 -0.31%
  licm.NumSunk 263925 -> 263455 -0.18%
  local.NumPHICSEs 157076 -> 156856 -0.14%
  instcombine.NumConstProp 122376 -> 122212 -0.13%
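As a quick sanity check on how the percentages above are derived, the sketch below recomputes a few of them. It assumes the bot reports (new - old) / old * 100, rounded to two decimals; the rows are copied from the Improvements and Regressions lists, and the helper name pct_change is hypothetical.

```python
# Recompute the relative change reported for a few statistic counters.
# Assumption: change = (new - old) / old * 100, rounded to two decimals.

def pct_change(old: int, new: int) -> float:
    """Relative change of a statistic counter, in percent."""
    return round((new - old) / old * 100, 2)

# (counter, baseline value, patched value) copied from the lists above.
ROWS = [
    ("instcombine.NegatorNumNegationsFoundInCache", 4247, 4283),
    ("indvars.NumReplaced", 60746, 56733),
    ("gvn.NumGVNEqProp", 356515, 348190),
]

for name, old, new in ROWS:
    # Same layout as the bot output: "name old -> new +x.yy%".
    print(f"{name} {old} -> {new} {pct_change(old, new):+.2f}%")
```

Under that assumption the recomputed values match the reported +0.85%, -6.61% and -2.34%.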

5 3 bench/abc/optimized/bmcFault.ll
2 1 bench/abc/optimized/solver.ll
53 56 bench/abseil-cpp/optimized/hash_generator_testing.ll
4 2 bench/abseil-cpp/optimized/inlined_vector_exception_safety_test.ll
25 26 bench/assimp/optimized/UniqueNameGenerator.ll
18 16 bench/box2d/optimized/imgui_tables.ll
18 9 bench/c3c/optimized/bigint.ll
42 21 bench/c3c/optimized/sema_casts.ll
36 45 bench/casadi/optimized/function.ll
2 3 bench/casadi/optimized/sx_instantiator.ll
11 8 bench/cjson/optimized/cJSON_Utils.ll
20 21 bench/coreutils-rs/optimized/31vrb73337u20kex.ll
12 13 bench/coreutils-rs/optimized/3ntjj58b904wujzh.ll
29 34 bench/coreutils-rs/optimized/n5dhracig0q9az4.ll
29 32 bench/cpp-httplib/optimized/httplib.ll
3 4 bench/curl/optimized/cf-https-connect.ll
27 29 bench/cvc5/optimized/sygus_sampler.ll
23 25 bench/cvc5/optimized/theory_arrays_type_rules.ll
1 2 bench/darktable/optimized/Cr2Decompressor.ll
20 10 bench/darktable/optimized/box_filters.ll
10 11 bench/delta-rs/optimized/2tf2q4cmcrkztukf.ll
5 7 bench/diesel-rs/optimized/27d1dwdaey9nml16.ll
8 7 bench/draco/optimized/mesh_stripifier.ll
40 41 bench/folly/optimized/ManualExecutor.ll
54 55 bench/folly/optimized/ThreadPoolExecutor.ll
20 21 bench/freetype/optimized/psaux.ll
5 6 bench/gromacs/optimized/gmx_wham.ll
29 30 bench/hdf5/optimized/H5Dcontig.ll
1 3 bench/hermes/optimized/regcomp.ll
10 14 bench/hyperscan/optimized/ng_haig.ll
4 6 bench/hyperscan/optimized/ng_repeat.ll
3 2 bench/icu/optimized/locdispnames.ll
8 9 bench/icu/optimized/normalizer2impl.ll
21 26 bench/icu/optimized/ustdio.ll
43 48 bench/image-rs/optimized/30755d6iao7ojcvl.ll
11 10 bench/jemalloc/optimized/extent_dss.ll
35 70 bench/jq/optimized/decNumber.ll
42 43 bench/jsonnet/optimized/vm.ll
67 66 bench/lief/optimized/ssl_tls13_client.ll
4 3 bench/linux/optimized/vsprintf.ll
4 6 bench/llama.cpp/optimized/llama-kv-cache.ll
4 5 bench/luajit/optimized/buildvm_fold.ll
9 5 bench/luau/optimized/isocline.ll
65 66 bench/lvgl/optimized/lv_svg_render.ll
48 54 bench/ninja/optimized/graph.ll
5 4 bench/nuttx/optimized/fs_sendfile.ll
60 64 bench/opencv/optimized/text_format.ll
27 26 bench/openjdk/optimized/memnode.ll
19 16 bench/openssl/optimized/quic_engine.ll
60 57 bench/openusd/optimized/lightAPIAdapter.ll
18 19 bench/ozz-animation/optimized/options.ll
18 19 bench/php/optimized/phpdbg_frame.ll
40 42 bench/pocketpy/optimized/dataclasses.ll
3 1 bench/postgres/optimized/checkpointer.ll
46 44 bench/proj/optimized/param.ll
18 14 bench/proj/optimized/trans.ll
34 33 bench/recastnavigation/optimized/DetourCrowd.ll
15 17 bench/ruby/optimized/proc.ll
4 5 bench/rust-analyzer-rs/optimized/2mbx5ptcpq6fo7sc.ll
8 12 bench/rust-analyzer-rs/optimized/55szrkbrq7kolv5z.ll
33 31 bench/rust-analyzer-rs/optimized/5ac99zaxn7b9r9xv.ll
10 5 bench/rust-analyzer-rs/optimized/5c13ae2xelsf4ggd.ll
15 20 bench/slurm/optimized/licenses.ll
51 49 bench/slurm/optimized/node_info.ll
21 26 bench/slurm/optimized/proc_args.ll
31 29 bench/spike/optimized/s_mulAddF128.ll
22 20 bench/spike/optimized/s_roundPackToF128.ll
67 66 bench/spike/optimized/wfi.ll
45 40 bench/stb/optimized/stb_sprintf.ll
120 60 bench/sundials/optimized/arkode_butcher.ll
31 33 bench/syn/optimized/2i67i8azb4r5b3mw.ll
38 37 bench/tinyrenderer/optimized/tgaimage.ll
12 16 bench/tree-sitter-rs/optimized/2jber9b3bsvatks5.ll
5 3 bench/verilator/optimized/V3Options.ll
2 3 bench/wasmtime-rs/optimized/3vdx8w41hjyzioqv.ll
2 4 bench/wasmtime-rs/optimized/4kfbj1e4an3vjclp.ll
22 26 bench/wireshark/optimized/commandline.ll
14 13 bench/wireshark/optimized/dct3trace.ll
83 64 bench/xgboost/optimized/config.ll
63 51 bench/yalantinglibs/optimized/file_server.ll
16 13 bench/yosys/optimized/abc9_ops.ll
7 5 bench/yosys/optimized/ffmerge.ll
8 6 bench/z3/optimized/arith_rewriter.ll
5 9 bench/z3/optimized/dimacs.ll
29 21 bench/zed-rs/optimized/5lgahps99tv0rsaolw3x59ow2.ll
27 11 bench/zed-rs/optimized/7rpe3bril898mttdoib5hjrj5.ll
5 8 bench/zed-rs/optimized/80403hw32s3ougvze8j2ycldj.ll
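The per-file list above does not label its two numeric columns. A minimal sketch for tallying it per benchmark project follows, assuming the columns use git diff --numstat order (lines added, then lines removed) and that every path has the form bench/<project>/optimized/<file>.ll. Both the column interpretation and the helper name per_project_totals are assumptions, not something the bot output states.

```python
from collections import defaultdict

# A few lines copied verbatim from the per-file list above.
SAMPLE = """\
5 3 bench/abc/optimized/bmcFault.ll
2 1 bench/abc/optimized/solver.ll
18 9 bench/c3c/optimized/bigint.ll
42 21 bench/c3c/optimized/sema_casts.ll
120 60 bench/sundials/optimized/arkode_butcher.ll
"""

def per_project_totals(text: str) -> dict:
    """Sum both numeric columns per project (assumed: added, removed)."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        added, removed, path = line.split(maxsplit=2)
        project = path.split("/")[1]  # bench/<project>/optimized/<file>.ll
        totals[project][0] += int(added)
        totals[project][1] += int(removed)
    return {p: (a, r) for p, (a, r) in totals.items()}

for project, (added, removed) in sorted(per_project_totals(SAMPLE).items()):
    print(project, added, removed)
```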

@github-actions (Contributor)

Summary of Major Changes in the LLVM IR Diff

  1. Phi Node Adjustments and Simplifications:

    • Multiple phi nodes have been adjusted to use new or renamed values, for example %phi replaced by %.ph or %.ph62, which suggests that intermediate values are being managed more efficiently.
    • Some phi nodes have been added to preheader blocks (e.g., %.0.i.i.ph, %.ph137) to improve loop handling and reduce redundant calculations within loops.
  2. Branch Condition Simplifications:

    • Branch conditions have been simplified by removing unnecessary comparisons or truncations. For example, replacing complex selects with simpler icmp instructions.
    • In several cases, branches now directly compare values instead of using intermediate variables, reducing instruction count and improving clarity (e.g., br i1 %49).
  3. Memory Access Optimization:

    • Memory access patterns have been optimized by adjusting getelementptr (GEP) indices and ensuring proper alignment for loads/stores.
    • Calls to llvm.memcpy.p0.p0.i64 and llvm.experimental.noalias.scope.decl have been updated with revised metadata and arguments, reflecting aliasing improvements.
  4. Redundant Variable Removal:

    • Several redundant variables and phi nodes have been removed, such as %pa.val75166 in Folly's ThreadPoolExecutor.ll and %spec.select30 in Draco's mesh_stripifier.ll. These changes streamline the code by eliminating unnecessary computations.
    • Truncation operations (trunc nuw i8 to i1) have been simplified or removed entirely when possible, improving type handling.
  5. Loop Handling Enhancements:

    • Loop exits and preheaders have been restructured. For instance, new .backedge blocks such as %.preheader.i.i.backedge have been introduced, which may create additional opportunities for loop unrolling and vectorization.
    • Loop predicates have been updated to reflect clearer exit conditions, enhancing loop analysis and potential transformations (e.g., icmp eq i64 %122, 624).
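The note in item 4 about trunc nuw i8 to i1 can be made concrete. Per the LLVM LangRef, trunc with the nuw flag yields poison if any truncated bit is non-zero, so a non-poison result implies the i8 input was 0 or 1, and the truncation then agrees with icmp ne i8 %x, 0. The sketch below models poison as None; the helper names are hypothetical and it only checks the semantic equivalence, not which pass performs the rewrite.

```python
from typing import Optional

def trunc_to_i1(x: int) -> int:
    """Plain 'trunc i8 %x to i1': keep only the lowest bit."""
    return x & 1

def trunc_nuw_to_i1(x: int) -> Optional[int]:
    """'trunc nuw i8 %x to i1': poison (modeled as None) if a dropped bit is set."""
    assert 0 <= x <= 0xFF, "model an unsigned i8"
    return None if x >> 1 else x & 1

def icmp_ne_zero(x: int) -> int:
    """'icmp ne i8 %x, 0' as 0/1."""
    return int(x != 0)

# Wherever trunc nuw is not poison, it agrees with the comparison against zero.
for x in range(256):
    r = trunc_nuw_to_i1(x)
    if r is not None:
        assert r == icmp_ne_zero(x)

# Without nuw the two differ, e.g. for x = 2: trunc gives 0, icmp ne gives 1.
print(trunc_to_i1(2), icmp_ne_zero(2), trunc_nuw_to_i1(2))
```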

High-Level Overview of Changes

The provided diff showcases various refinements across multiple benchmarks, focusing on improving performance through LLVM Intermediate Representation (IR) optimizations. Key areas of improvement include:

  • Phi Node Management: Adjustments to phi nodes ensure that intermediate values are computed only when necessary, reducing redundant calculations and improving cache utilization.
  • Control Flow Simplification: Branch conditions have been streamlined, often replacing complex selects with straightforward comparisons. This improves readability and reduces overhead in conditional evaluations.
  • Memory Access Efficiency: GEP and load/store operations have been fine-tuned for better alignment and reduced aliasing concerns, leading to faster memory accesses.
  • Variable Reduction: Unnecessary variables and operations, such as redundant truncations and phi nodes, have been eliminated to simplify the IR and lower register pressure.
  • Enhanced Loop Structures: Preheader and backedge modifications enable better loop analysis, facilitating optimizations like unrolling and vectorization.

These changes collectively aim to enhance runtime efficiency, reduce memory usage, and improve the overall quality of generated machine code by refining the IR structure. The focus is on minimizing redundant operations while preserving correctness and enabling further compiler-level optimizations.

model: qwen-plus-latest
CompletionUsage(completion_tokens=682, prompt_tokens=109799, total_tokens=110481, completion_tokens_details=None, prompt_tokens_details=None)

br i1 %.not5581.i.i.i.i, label %.lr.ph.i.i174.i.i.preheader, label %.critedge57.loopexit.i.i.i.i

.lr.ph.i.i174.i.i.preheader: ; preds = %518, %515, %504
%.ph = phi i32 [ 2, %504 ], [ 2, %515 ], [ %502, %518 ]

Lots of regressions like this.

maybe only do the fold if all arms can be folded to a constant to avoid this type of regression
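The reviewer's suggestion amounts to a profitability guard: simplify each incoming arm and commit only if every arm becomes a constant; otherwise bail out instead of producing a phi that mixes constants and SSA values, like %.ph above. The sketch is a toy model and not the actual transform in llvm/llvm-project#134712. The value representation (int for a constant, str for a non-constant SSA name), the helper names and the trivial simplifier are all hypothetical.

```python
from typing import Callable, Dict, Optional, Union

# Toy value model (hypothetical): int = constant, str = SSA name such as "%502".
Value = Union[int, str]

def fold_only_if_all_constant(
    incoming: Dict[str, Value],
    simplify: Callable[[Value], Optional[int]],
) -> Optional[Dict[str, int]]:
    """Simplify every incoming arm; fold only if all arms become constants."""
    folded: Dict[str, int] = {}
    for block, value in incoming.items():
        result = simplify(value)
        if result is None:
            # One arm stays symbolic: reject the fold rather than emit
            # a phi that mixes constants with SSA values.
            return None
        folded[block] = result
    return folded

def const_only(value: Value) -> Optional[int]:
    """Hypothetical simplifier: only constant arms simplify (to themselves)."""
    return value if isinstance(value, int) else None

# Arms of the phi quoted in the review: [2, %504], [2, %515], [%502, %518].
PH = {"%504": 2, "%515": 2, "%518": "%502"}
print(fold_only_if_all_constant(PH, const_only))
print(fold_only_if_all_constant({"%504": 2, "%515": 2}, const_only))
```

With the quoted phi, the %518 arm stays symbolic, so the guard rejects the fold.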

@andjo403 commented Jun 6, 2025

/add-label miscompilation

@andjo403 commented Jun 6, 2025

/close

@github-actions github-actions bot closed this Jun 6, 2025
@dtcxzyw dtcxzyw deleted the test-run14431379569 branch June 16, 2025 05:02