@dtcxzyw dtcxzyw commented Jun 5, 2025

Link: llvm/llvm-project#142878
Requested by: @dtcxzyw

@github-actions github-actions bot mentioned this pull request Jun 5, 2025

dtcxzyw commented Jun 5, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@7ca7bcb
patch: llvm/llvm-project#142878
sha256: fe19d47e3bd9892b2bbcf7323abcfa9aa09995bddf4247726cf443a56d0803b1
commit: 33df204

69 files changed, 20904 insertions(+), 21161 deletions(-)

Improvements:
  div-rem-pairs.NumHoisted 3413 -> 3414 +0.03%
  correlated-value-propagation.NumSMinMax 9247 -> 9248 +0.01%
  correlated-value-propagation.NumAnd 48446 -> 48451 +0.01%
  div-rem-pairs.NumPairs 56287 -> 56291 +0.01%
  licm.NumMovedCalls 35465 -> 35466 +0.00%
  correlated-value-propagation.NumSExt 51155 -> 51156 +0.00%
  globalopt.NumDeleted 1036922 -> 1036936 +0.00%
  function-attrs.NumNoUndefReturn 77820 -> 77821 +0.00%
  correlated-value-propagation.NumPhis 1357206 -> 1357217 +0.00%
  correlated-value-propagation.NumShlNSW 126849 -> 126850 +0.00%
Regressions:
  correlated-value-propagation.NumMinMax 16564 -> 16541 -0.14%
  simple-loop-unswitch.NumSwitches 2033 -> 2032 -0.05%
  aggressive-instcombine.NumInstrsReduced 73621 -> 73606 -0.02%
  aggressive-instcombine.NumExprsReduced 22630 -> 22629 -0.00%
  indvars.NumElimExt 310087 -> 310077 -0.00%
  loop-instsimplify.NumSimplified 196752 -> 196746 -0.00%
  correlated-value-propagation.NumSubNUW 39275 -> 39274 -0.00%
  simplifycfg.NumSinkCommonCode 389341 -> 389332 -0.00%
  bdce.NumRemoved 396451 -> 396443 -0.00%
  gvn.NumGVNInstr 160543 -> 160540 -0.00%
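
The percentage column above is consistent with a plain relative change, (after - before) / before, printed with two decimals. A minimal sketch under that assumption; the helper name `relative_change` is hypothetical and is not part of the benchmark tooling:

```python
def relative_change(before: int, after: int) -> str:
    """Format (after - before) / before as a signed percentage with two decimals."""
    percent = (after - before) / before * 100.0
    # The '+' flag always prints a sign, so tiny negative changes show as "-0.00%".
    return f"{percent:+.2f}%"
```

For example, `relative_change(16564, 16541)` yields `-0.14%`, matching the NumMinMax row. Entries shown as `+0.00%` or `-0.00%` are nonzero changes that round away at two decimals.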

16 24 bench/abc/optimized/abc.ll
18 26 bench/abc/optimized/bmcMaj2.ll
18 23 bench/abc/optimized/bmcMaj3.ll
39 46 bench/abc/optimized/cbaBlast.ll
19 27 bench/abc/optimized/dauDsd.ll
7 14 bench/abc/optimized/giaDecs.ll
4 6 bench/abc/optimized/giaIf.ll
8 15 bench/abc/optimized/giaMinLut.ll
23 27 bench/abc/optimized/giaMuxes.ll
103 111 bench/abc/optimized/ifDec66.ll
48 40 bench/abc/optimized/ifDsd.ll
50 61 bench/abc/optimized/ifSat.ll
59 70 bench/abc/optimized/ifTune.ll
15 23 bench/abc/optimized/sbdLut.ll
43 50 bench/abc/optimized/sfmSat.ll
21 26 bench/abc/optimized/utilIsop.ll
19 31 bench/abc/optimized/wlcBlast.ll
7 6 bench/boost/optimized/collator.ll
21 28 bench/cmake/optimized/zstd_compress.ll
54 59 bench/cpython/optimized/obmalloc.ll
13 11 bench/curl/optimized/http.ll
5 3 bench/ffmpeg/optimized/bgmc.ll
27 28 bench/ffmpeg/optimized/mpegvideo.ll
117 38 bench/ffmpeg/optimized/timecode.ll
167 193 bench/icu/optimized/uniset.ll
74 72 bench/icu/optimized/utext.ll
58 62 bench/image-rs/optimized/254ue5dpb10tdnze.ll
22 24 bench/image-rs/optimized/4srzh4wujeew249y.ll
19 17 bench/image-rs/optimized/5ez7udly19o3uj1p.ll
2 3 bench/libevent/optimized/evdns.ll
68 71 bench/libwebp/optimized/quant_dec.ll
12 10 bench/lief/optimized/debug.ll
12 12 bench/lvgl/optimized/lv_draw_sw_mask.ll
7 9 bench/minetest/optimized/mapblock_mesh.ll
97 95 bench/openjdk/optimized/DrawGlyphList.ll
117 107 bench/openssl/optimized/bio_dump.ll
37 41 bench/openusd/optimized/aom_convolve.ll
253 279 bench/openusd/optimized/av1_inv_txfm2d.ll
66 72 bench/openusd/optimized/av1_loopfilter.ll
12 15 bench/openusd/optimized/cfl.ll
16 16 bench/openusd/optimized/quant_common.ll
133 137 bench/openusd/optimized/restoration.ll
18 20 bench/openusd/optimized/warped_motion.ll
10 11 bench/postgres/optimized/snprintf.ll
84 108 bench/sdl/optimized/yuv_rgb_std.ll
6 8 bench/wasmi-rs/optimized/4jlq2y0wli37amf79pjx22v8j.ll
12 11 bench/zed-rs/optimized/8n2fsvz9zbnw9ojg9jkj0503g.ll
12 14 bench/zed-rs/optimized/dw4qzuo904yf8wu71sutofhxl.ll

github-actions bot commented Jun 5, 2025

Here's a high-level summary of the most significant changes in this patch, focusing only on interesting modifications:

  1. Removal of llvm.umax.i32 Intrinsic Usage
    Multiple functions previously called llvm.umax.i32 to clamp a value to a lower bound (e.g., umax(x, 2)). These calls have been replaced with an icmp ult against the constant followed by a select, which computes the same result without the intrinsic.

  2. Simplification of Select Chains Using Range Checks
    Instead of chaining multiple select instructions on equality tests (icmp eq) to produce successive values (like 2, 3, 4, etc.), several functions now use a single icmp ult to check whether a value falls within a range. This shortens the select chains and may improve performance by reducing the number of compare and select instructions.

  3. Elimination of Intermediate Phi Nodes and Unused Variables
    Several intermediate phi nodes and variables that were used to track or propagate results of previous umax/select chains have been removed or simplified. For example, unused tail call results and redundant phi selections based on bitmasking and and/or operations are no longer present, indicating dead code removal.

  4. Reduction in Number of Instructions for Bitmasking and Multiplication
    Some transformations involving bitmasks (e.g., and i64 ..., 15) followed by multiplication (mul nuw nsw ..., 17) have been streamlined. The structure of these computations remains, but the way they're conditionally selected has changed, often resulting in fewer instructions due to earlier simplifications.

  5. Improved Value Selection Logic with Truncation and Extension Optimizations
    In some cases, especially in ifDsd.ll and zstd_compress.ll, truncations and extensions (e.g., trunc i64 ... to i32, zext i32 ... to i64) have been reordered or optimized in combination with selects, leading to cleaner and more efficient handling of integer ranges.
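
Items 1 and 2 describe a lower-bound clamp expressed three ways: as an llvm.umax.i32 call, as an icmp ult plus select, and as a chain of equality-based selects. A minimal Python sketch of why these agree, modelling i32 values as integers masked to 32 bits. The function names, the bound of 2, and the shape of the equality chain are illustrative assumptions, not code taken from the diff:

```python
MASK32 = 0xFFFFFFFF  # i32 values are modelled as Python ints in [0, 2**32)


def umax_i32(a: int, b: int) -> int:
    """Model of @llvm.umax.i32: the larger operand under unsigned comparison."""
    a &= MASK32
    b &= MASK32
    return a if a > b else b


def clamp_with_select(x: int, bound: int = 2) -> int:
    """Model of: %c = icmp ult i32 %x, bound ; %r = select i1 %c, i32 bound, i32 %x."""
    x &= MASK32
    is_below = x < bound  # icmp ult
    return bound if is_below else x  # select


def clamp_with_eq_chain(x: int) -> int:
    """Hypothetical equality-based select chain that also clamps to 2."""
    x &= MASK32
    inner = 2 if x == 1 else x  # select (icmp eq %x, 1), 2, %x
    return 2 if x == 0 else inner  # select (icmp eq %x, 0), 2, %inner
```

Bit patterns such as 0xFFFFFFFF count as large unsigned values in this model, which is why the comparison corresponds to ult and not slt.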


Conclusion

The main theme across all diffs appears to be replacing explicit @llvm.umax.i32 calls and associated select patterns with simpler range-based checks using icmp ult. This leads to:

  • Reduced reliance on intrinsics.
  • More readable and compact code.
  • Removal of unnecessary intermediate logic.
  • Streamlined control flow in loops and switch statements.

These optimizations likely improve both compilation and runtime efficiency while maintaining equivalent behavior.

model: qwen-plus-latest
CompletionUsage(completion_tokens=546, prompt_tokens=117505, total_tokens=118051, completion_tokens_details=None, prompt_tokens_details=None)

%spec.select = tail call i64 @llvm.umin.i64(i64 %2, i64 %26)
br label %27
%28 = trunc nuw nsw i64 %spec.select to i32
%29 = select i1 %27, i32 0, i32 %28
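
The fragment above truncates with both the nuw and nsw flags set. A minimal sketch of what those flags require for an i64 to i32 trunc, following the LLVM language reference; the function name is hypothetical and `None` is used here only as a stand-in for a poison result:

```python
from typing import Optional


def trunc_i64_to_i32(value: int, nuw: bool = False, nsw: bool = False) -> Optional[int]:
    """Model trunc i64 -> i32; returns None where the flagged trunc would be poison."""
    value &= (1 << 64) - 1
    low = value & 0xFFFFFFFF  # bits kept by the truncation
    high = value >> 32        # bits discarded by the truncation
    if nuw and high != 0:
        return None  # nuw: every discarded bit must be zero
    if nsw:
        sign = (low >> 31) & 1
        expected_high = 0xFFFFFFFF if sign else 0
        if high != expected_high:
            return None  # nsw: discarded bits must all equal the result's sign bit
    return low
```

With both flags set, only inputs in the range 0 to 2**31 - 1 produce a non-poison result in this model.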

@dtcxzyw dtcxzyw Jun 5, 2025

@dtcxzyw dtcxzyw closed this Jun 5, 2025
@dtcxzyw dtcxzyw deleted the test-run15460443895 branch June 6, 2025 15:47