@dtcxzyw dtcxzyw commented Jun 14, 2025

Link: llvm/llvm-project#143683
Requested by: @dtcxzyw

@github-actions github-actions bot mentioned this pull request Jun 14, 2025
dtcxzyw commented Jun 14, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@d4c7d0b
patch: llvm/llvm-project#143683
sha256: 54eaa56178578da2a2dc2694fa4498e0b9da77f5d6d9f32dd7c57f2b658f56c1
commit: c314743

3 files changed, 633 insertions(+), 669 deletions(-)

Improvements:
  gvn.NumGVNEqProp 456516 -> 456518 +0.00%
  correlated-value-propagation.NumPhis 1345916 -> 1345918 +0.00%
  scalar-evolution.NumExitCountsNotComputed 12663534 -> 12663544 +0.00%
  memdep.NumCacheCompleteNonLocalPtr 5686109 -> 5686113 +0.00%
  jump-threading.NumThreads 2951034 -> 2951036 +0.00%
  instcombine.NumDeadInst 45401734 -> 45401744 +0.00%
  instcombine.NumCombined 131973745 -> 131973753 +0.00%
Regressions:
  reassociate.NumChanged 5221578 -> 5221574 -0.00%
  memdep.NumCacheNonLocalPtr 284002745 -> 284002709 -0.00%
  memdep.NumUncacheNonLocalPtr 269980422 -> 269980392 -0.00%
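
Every entry above prints as +0.00% or -0.00% because the deltas are tiny relative to the counters. A minimal C sketch of that arithmetic follows; the function name `relative_change_percent` is hypothetical and is not taken from the tooling that produced this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: relative change of a statistic, in percent.
 * This only illustrates the arithmetic; it is not the report's own code. */
static double relative_change_percent(uint64_t before, uint64_t after)
{
    assert(before != 0); /* the ratio is undefined for a zero baseline */
    return ((double)after - (double)before) / (double)before * 100.0;
}
```

For gvn.NumGVNEqProp, for example, a delta of 2 on a baseline of 456516 is well below the 0.005% needed to show up at two decimal places.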

127 145 bench/miniaudio/optimized/unity.ll
172 190 bench/raylib/optimized/raudio.ll

@github-actions
Contributor

The provided LLVM IR diff modifies several functions related to memory allocation and synchronization in the miniaudio and raudio benchmarks. Below is a summary of up to 5 major changes, focusing only on meaningful transformations:


1. Simplification of Slot Allocator Allocation Logic

In ma_slot_allocator_alloc and similar functions:

  • The original logic involved computing bit indices using shifts and masks (lshr, and, etc.) to determine if a slot group needs expansion.
  • This has been simplified by replacing that sequence with a direct check: icmp eq i32 %value, 0.
  • This change removes intermediate computations and may expose further simplification opportunities.
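
The diff itself is not reproduced in this thread, so the following C sketch only illustrates the kind of equivalence that lets a shift-and-mask test collapse into a compare against zero. All function names are hypothetical, and the predicate is an assumed example, not the actual code from miniaudio or raudio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: split a slot index into a 32-slot group and a bit position. */
static uint32_t group_index(uint32_t slot) { return slot >> 5; }  /* lshr */
static uint32_t bit_index(uint32_t slot)   { return slot & 31u; } /* and  */

/* Shift-and-mask form: true only for bit 0 of group 0. */
static int is_first_slot_masked(uint32_t slot)
{
    return group_index(slot) == 0 && bit_index(slot) == 0;
}

/* Direct form: the same predicate as a single compare against zero. */
static int is_first_slot_direct(uint32_t slot)
{
    return slot == 0;
}

/* Check that both forms agree for every slot below `limit`. */
static void check_equivalence(uint32_t limit)
{
    for (uint32_t slot = 0; slot < limit; ++slot)
        assert(is_first_slot_masked(slot) == is_first_slot_direct(slot));
}
```

Both forms agree because the high bits and the low five bits of `slot` are all zero exactly when `slot` itself is zero.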

2. Loop Structure and PHI Node Reorganization

  • Several loops such as those involving .preheader, .preheader58, and ma_ffs_32.exit have been restructured.
  • PHI nodes were updated accordingly (e.g., %indvars.iv and %08.i), reflecting changes in control flow.
  • These changes point to simplified induction variables or a different unrolling decision, which may in turn help vectorization or register allocation.

3. Improved Atomic Compare-and-Swap Logic

  • In ma_slot_allocator_alloc, the cmpxchg instruction now uses the correct value loaded from memory before the operation:
    • Previously used outdated values like %16 or %14.
    • Now correctly uses the value read from the same atomic load (%12, %14, etc.).
  • This ensures correctness in concurrent access scenarios and aligns better with memory model expectations.
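
The property described in this item, that the expected operand of a compare-and-swap comes from a load of the same atomic location, can be sketched with C11 atomics. The function `claim_slot` is hypothetical and assumes a 32-bit bitfield with one bit per slot; it is not miniaudio's implementation of ma_slot_allocator_alloc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: atomically claim the lowest clear bit of a 32-slot bitfield.
 * Returns the claimed bit index, or -1 if every slot is taken. */
static int claim_slot(_Atomic uint32_t *bitfield)
{
    /* The expected value comes from a load of the same atomic object. */
    uint32_t expected = atomic_load(bitfield);
    for (;;) {
        if (expected == UINT32_MAX)
            return -1; /* no free slot */

        /* Find the lowest clear bit; one exists because expected != all ones. */
        int bit = 0;
        while ((expected >> bit) & 1u)
            ++bit;
        assert(bit < 32);

        uint32_t desired = expected | (1u << bit);
        /* On failure, `expected` is refreshed with the current value, so the
         * next iteration never retries with a stale snapshot. */
        if (atomic_compare_exchange_weak(bitfield, &expected, desired))
            return bit;
    }
}
```

The weak variant may fail spuriously, which is acceptable here because the call already sits in a retry loop.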

4. Cleanup of Exit Thread Selection in PHIs

  • In exit blocks like .thread54 and ma_slot_allocator_alloc.exit.thread, new incoming PHI edges were added pointing to .preheader59.
  • This reflects new control flow paths introduced by earlier restructuring, ensuring all possible predecessors are accounted for.
  • This keeps the PHI nodes consistent after the control-flow graph (CFG) modifications.

5. Reduction of Redundant Loads and Stores

  • Some redundant loads from memory (e.g., %35, %36, %47) were removed or reordered.
  • Memory operations inside ma_job_queue_post were streamlined, reducing unnecessary pointer arithmetic and store/load pairs.
  • These changes can improve performance by reducing memory traffic and helping with alias analysis.

Summary

These changes primarily aim to simplify and optimize allocation and synchronization routines:

  • Simplified zero checks replace complex bitmasking.
  • Loop structures and PHIs are reorganized for clarity and optimization.
  • Atomic operations use more accurate values.
  • Memory usage is optimized with fewer redundant accesses.
  • CFG updates ensure correct handling of all code paths.

Overall, these are targeted optimizations that could improve the runtime performance of the generated code.

model: qwen-plus-latest
CompletionUsage(completion_tokens=594, prompt_tokens=18253, total_tokens=18847, completion_tokens_details=None, prompt_tokens_details=None)

@dtcxzyw dtcxzyw closed this Jun 14, 2025
@dtcxzyw dtcxzyw deleted the test-run15649014781 branch June 16, 2025 05:02