dtcxzyw (Owner) commented Feb 9, 2025

@github-actions github-actions bot mentioned this pull request Feb 9, 2025
dtcxzyw (Owner, Author) commented Feb 9, 2025

runner: ariselab-64c-v2
baseline: llvm/llvm-project@d204724
patch: llvm/llvm-project#126438
sha256: 7523a589ae919b4f9e3108200acfa52b65213a38b3c3e46a8c4f5226eb7f7bd6
commit: 05242c2

505 files changed, 11414 insertions(+), 11465 deletions(-)

Improvements:
  correlated-value-propagation.NumAShrsConverted 3970 -> 3971 +0.03%
  loop-instsimplify.NumSimplified 162364 -> 162386 +0.01%
  simple-loop-unswitch.NumCostMultiplierSkipped 13381 -> 13382 +0.01%
  simple-loop-unswitch.NumBranches 82386 -> 82390 +0.00%
  sccp.NumDeadBlocks 728065 -> 728083 +0.00%
  loop-delete.NumDeleted 144897 -> 144900 +0.00%
  correlated-value-propagation.NumPhiCommon 49272 -> 49273 +0.00%
  local.NumPHICSEs 155257 -> 155260 +0.00%
  gvn.NumPRELoadMoved2CEPred 76922 -> 76923 +0.00%
  instcombine.NumCombined 99678365 -> 99679330 +0.00%
Regressions:
  div-rem-pairs.NumHoisted 2943 -> 2922 -0.71%
  div-rem-pairs.NumDecomposed 1716 -> 1713 -0.17%
  div-rem-pairs.NumPairs 29871 -> 29848 -0.08%
  indvars.NumElimIdentity 1713 -> 1712 -0.06%
  correlated-value-propagation.NumUDivURemsNarrowed 8405 -> 8404 -0.01%
  correlated-value-propagation.NumSDivs 17008 -> 17007 -0.01%
  correlated-value-propagation.NumCmps 249857 -> 249849 -0.00%
  dse.NumRedundantStores 31883 -> 31882 -0.00%
  gvn.NumGVNPRE 133649 -> 133645 -0.00%
  instsimplify.NumExpand 139754 -> 139751 -0.00%
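
The percentage column is the relative change of each counter between baseline and patch. A minimal Python sketch of that arithmetic follows, assuming the value is rounded to two decimals with an explicit sign; the helper name `relative_change` is hypothetical and not part of the benchmark tooling.

```python
def relative_change(before: int, after: int) -> str:
    """Format the change from `before` to `after` as a signed percentage."""
    pct = (after - before) / before * 100.0
    return f"{pct:+.2f}%"


# Counter values copied from the report above.
print(relative_change(2943, 2922))  # div-rem-pairs.NumHoisted
print(relative_change(3970, 3971))  # correlated-value-propagation.NumAShrsConverted
```

Very small changes round to +0.00% or -0.00%, which is why several rows show a zero percentage next to a nonzero difference.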

1 1 bench/abc/optimized/abcLog.ll
2 2 bench/abseil-cpp/optimized/symbolize.ll
13 19 bench/assimp/optimized/FBXParser.ll
369 380 bench/boost/optimized/to_chars.ll
7 3 bench/clamav/optimized/hwp.ll
4 5 bench/cpython/optimized/arraymodule.ll
12 12 bench/cpython/optimized/mpdecimal.ll
4 4 bench/diesel-rs/optimized/2zzzvc1em6im74h3.ll
3 3 bench/flac/optimized/encode.ll
12 15 bench/icu/optimized/number_decimalquantity.ll
10 10 bench/icu/optimized/plurrule.ll
8 8 bench/image-rs/optimized/ptscn4jakoj4p9m.ll
4 5 bench/lightgbm/optimized/boosting.ll
6 7 bench/lightgbm/optimized/gbdt.ll
8 10 bench/lightgbm/optimized/objective_function.ll
28 28 bench/llama.cpp/optimized/ggml.ll
53 46 bench/llvm/optimized/CombinerHelper.ll
20 20 bench/llvm/optimized/DAGCombiner.ll
13 13 bench/llvm/optimized/LegalizeVectorTypes.ll
53 57 bench/llvm/optimized/Legalizer.ll
51 66 bench/llvm/optimized/LegalizerHelper.ll
29 29 bench/llvm/optimized/SLPVectorizer.ll
7 7 bench/llvm/optimized/SelectionDAG.ll
9 10 bench/llvm/optimized/TargetLowering.ll
10 13 bench/llvm/optimized/X86FrameLowering.ll
172 136 bench/llvm/optimized/X86TargetTransformInfo.ll
34 34 bench/meshlab/optimized/meshfilter.ll
9 11 bench/mold/optimized/passes.cc.ALPHA.ll
10 12 bench/mold/optimized/passes.cc.SPARC64.ll
18 18 bench/ncnn/optimized/deconvolution_x86_avx.ll
32 32 bench/ncnn/optimized/deconvolution_x86_avx512.ll
6 6 bench/ncnn/optimized/deconvolutiondepthwise_x86_avx.ll
6 6 bench/openblas/optimized/dsytrd_sb2st.ll
40 44 bench/opencc/optimized/louds-trie.ll
33 35 bench/opencc/optimized/tail.ll
6 8 bench/opencv/optimized/persistence.ll
16 12 bench/pbrt-v4/optimized/paramdict.ll
15 24 bench/qemu/optimized/audio_audio.ll
19 19 bench/velox/optimized/Sequence.ll
5 5 bench/wasmtime-rs/optimized/16qf4j2oevjc61uc.ll
8 18 bench/wireshark/optimized/packet-isakmp.ll
5 5 bench/xgboost/optimized/charconv.ll

github-actions bot (Contributor) commented Feb 9, 2025

The LLVM IR diffs across these files fall into five main categories:

1. Introduction of exact Division Instructions

  • In many places, division instructions (udiv, sdiv) now carry the exact flag. The flag asserts that the dividend is a multiple of the divisor, so the division leaves no remainder; if that assertion does not hold, the result is a poison value. Later passes can rely on this, for example to treat the quotient multiplied by the divisor as equal to the original dividend.
  • Example:
    %114 = udiv exact i32 %.046.off.i.i, 10
    This change is present in numerous files such as boost/optimized/to_chars.ll, assimp/optimized/FBXParser.ll, and llama.cpp/optimized/ggml.ll.
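
To make the meaning of the exact flag concrete, here is a small Python model of `udiv exact`. It is only a sketch: the function name `udiv_exact` is hypothetical, `None` stands in for LLVM's poison value, and the 32-bit default width is an assumption matching the i32 example above.

```python
from typing import Optional


def udiv_exact(a: int, b: int, bits: int = 32) -> Optional[int]:
    """Model `udiv exact`: return the quotient, or None to stand in for poison."""
    mask = (1 << bits) - 1
    a &= mask  # interpret both operands as unsigned `bits`-wide integers
    b &= mask
    if b == 0:
        # Division by zero is undefined behavior in LLVM IR, with or without `exact`.
        raise ZeroDivisionError("udiv by zero is undefined behavior")
    if a % b != 0:
        return None  # a nonzero remainder makes the result poison
    return a // b


print(udiv_exact(120, 10))  # exact: 120 is a multiple of 10
print(udiv_exact(125, 10))  # not a multiple of 10, so the model returns None
```

Whenever the model returns a number q, q * b equals a, which is the property an optimizer may assume for a non-poison result.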

2. Condition Checks Modified for Division By Zero

  • Several condition checks have been introduced or modified to explicitly handle division by zero scenarios before performing division operations. This prevents undefined behavior and ensures robustness.
  • Example:
    %.not128.i.i.i.i.i.i = icmp eq i64 %67, 0
    br i1 %.not128.i.i.i.i.i.i, label %._crit_edge.i.i.i.i.i.i, label %.lr.ph.i.i.i.i.i.i
    This pattern appears in both image-rs/optimized/ptscn4jakoj4p9m.ll and llvm/optimized/Legalizer.ll.

3. Use of llvm.umax and llvm.umin Intrinsics

  • The patch introduces the use of llvm.umax and llvm.umin intrinsics to compute the unsigned maximum and minimum of two integers. A single intrinsic call expresses what would otherwise be an explicit compare followed by a select or a conditional branch.
  • Example:
    %spec.store.select = tail call i32 @llvm.umax.i32(i32 %1, i32 1)
    This change is visible in boost/optimized/to_chars.ll and clamav/optimized/hwp.ll.
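
The example call clamps a value to at least 1 under unsigned comparison. The Python sketch below models that behavior and checks it against a compare-and-select formulation. The helper names are hypothetical, and the select form is an illustrative equivalent, not a claim about what the IR looked like before the patch.

```python
def umax(a: int, b: int, bits: int = 32) -> int:
    """Model llvm.umax: maximum of two values compared as unsigned integers."""
    mask = (1 << bits) - 1
    return max(a & mask, b & mask)


def clamp_with_select(x: int, bits: int = 32) -> int:
    """Equivalent compare-and-select form of umax(x, 1): x == 0 ? 1 : x."""
    x &= (1 << bits) - 1
    return 1 if x == 0 else x


# -1 is 0xFFFFFFFF as an unsigned 32-bit value, so it is the largest input here.
for x in (0, 1, 2, 7, -1):
    assert umax(x, 1) == clamp_with_select(x)
print(umax(0, 1), umax(7, 1), hex(umax(-1, 1)))
```

Because the comparison is unsigned, a bit pattern that would be negative as a signed integer is treated as a large positive value.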

4. Reorganization of Phi Nodes and Control Flow

  • The control flow has been reorganized in several functions to reduce redundant computations and improve clarity. Specifically, phi nodes have been adjusted to eliminate unnecessary intermediate values and simplify branching logic.
  • Example:
    %.060106 = phi ptr [ %.pre119, %.lr.ph107 ], [ %140, %_ZN4llvm23SmallVectorTemplateBaseINS_8RegisterELb1EE9push_backES1_.exit ]
    This reorganization is evident in llvm/optimized/CombinerHelper.ll and llvm/optimized/LegalizerHelper.ll.

5. TBAA (Type-Based Alias Analysis) Metadata Adjustments

  • TBAA metadata tags have been updated or replaced to reflect changes in data types or memory access patterns. This helps the optimizer make better decisions about aliasing and memory layout.
  • Example:
    store i8 %780, ptr %.32045, align 1, !tbaa !22
    TBAA updates are scattered throughout various files like boost/optimized/to_chars.ll and cpython/optimized/arraymodule.ll.

High-Level Overview:

This patch primarily focuses on improving the efficiency and correctness of integer division operations by introducing exact divisions, handling edge cases like division by zero more explicitly, and leveraging intrinsic functions like llvm.umax and llvm.umin. Additionally, it simplifies and optimizes control flow structures, particularly loops and conditional branches, by reorganizing phi nodes and reducing redundant computations. Lastly, adjustments to TBAA metadata ensure that the memory access patterns are accurately represented, aiding further optimizations.

These changes collectively enhance performance by reducing unnecessary calculations, improving branch prediction, and ensuring safer arithmetic operations.

model: qwen-plus-latest
CompletionUsage(completion_tokens=847, prompt_tokens=114175, total_tokens=115022, completion_tokens_details=None, prompt_tokens_details=None)

dtcxzyw closed this Feb 19, 2025
dtcxzyw deleted the test-run13230064145 branch February 24, 2025 06:20