Skip to content

Conversation

@jinhuang1102
Copy link
Contributor

@jinhuang1102 jinhuang1102 commented Nov 4, 2025

As a follow-up to PR#165841, this change addresses prof_md metadata loss in AtomicExpandPass when lowering atomicrmw xchg to a Load-Linked/Store-Exclusive (LL/SC) loop.

This path is distinct from the LSE path addressed previously:

PR #165841 (and its tests) used -mtriple=aarch64-linux-gnu, which targets a modern ARMv8.1+ architecture. This architecture supports Large System Extensions (LSE), allowing atomicrmw to be lowered directly to a more efficient hardware instruction.

This PR (and its tests) uses -mtriple=aarch64-- or -mtriple=armv8-linux-gnueabihf. This indicates an ARMv8.0 or lower architecture that does not support LSE. On these targets, the pass must fall back to synthesizing a manual LL/SC loop using the ldaxr/stxr instruction pair.

Similar to previous issue, the new conditional branch was failin to inherit the prof_md metadata. Theis PR correctly fix the branch weights to the newly created branch within the LL/SC loop, ensuring profile information is preserved.

This PR includes fixes for the following tests:

Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
Transforms/AtomicExpand/ARM/atomic-expansion-v8.ll
Transforms/AtomicExpand/Hexagon/atomicrmw-fp.ll

@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Jin Huang (jinhuang1102)

Changes

As the followup PR of #165841, the AtomicExpandPass is lowering high-level atomic operations (atmoicrmw xchg) to a icmp retry loop.


Full diff: https://github.com/llvm/llvm-project/pull/166273.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/AtomicExpandPass.cpp (+9-3)
  • (modified) llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll (+21-7)
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index 53f1cfe24a68d..c22f5ff9e444b 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -38,6 +38,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/ProfDataUtils.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/User.h"
 #include "llvm/IR/Value.h"
@@ -1259,8 +1260,7 @@ Value *AtomicExpandImpl::insertRMWLLSCLoop(
   BasicBlock *BB = Builder.GetInsertBlock();
   Function *F = BB->getParent();
 
-  assert(AddrAlign >=
-             F->getDataLayout().getTypeStoreSize(ResultTy) &&
+  assert(AddrAlign >= F->getDataLayout().getTypeStoreSize(ResultTy) &&
          "Expected at least natural alignment at this point.");
 
   // Given: atomicrmw some_op iN* %addr, iN %incr ordering
@@ -1295,7 +1295,13 @@ Value *AtomicExpandImpl::insertRMWLLSCLoop(
       TLI->emitStoreConditional(Builder, NewVal, Addr, MemOpOrder);
   Value *TryAgain = Builder.CreateICmpNE(
       StoreSuccess, ConstantInt::get(IntegerType::get(Ctx, 32), 0), "tryagain");
-  Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);
+
+  Instruction *CondBr = Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);
+
+  // Atomic RMW expands to a icmp loop, because it is hard to predict precise
+  // branch weigths we mark the branch as "unknown" (50/50) to prevent
+  // misleading optimizations.
+  setExplicitlyUnknownBranchWeightsIfProfiled(*CondBr, *F, DEBUG_TYPE);
 
   Builder.SetInsertPoint(ExitBB, ExitBB->begin());
   return Loaded;
diff --git a/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll b/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
index 95a52aa0f7f52..b509b2469cfdc 100644
--- a/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
+++ b/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
@@ -1,8 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
 ; RUN: opt -codegen-opt-level=1 -S -mtriple=aarch64-- -passes=atomic-expand %s | FileCheck %s
 ; RUN: opt -codegen-opt-level=1 -S -mtriple=aarch64-- -mattr=+outline-atomics -passes=atomic-expand %s | FileCheck %s --check-prefix=OUTLINE-ATOMICS
 
-define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
+define void @atomic_swap_f16(ptr %ptr, half %val) !prof !0 {
 ; CHECK-LABEL: @atomic_swap_f16(
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast half [[VAL:%.*]] to i16
 ; CHECK-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -12,7 +12,7 @@ define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i16 [[TMP1]] to i64
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP4]], ptr elementtype(i16) [[PTR]])
 ; CHECK-NEXT:    [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP5]], 0
-; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1:![0-9]+]]
 ; CHECK:       atomicrmw.end:
 ; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i16 [[TMP3]] to half
 ; CHECK-NEXT:    ret void
@@ -27,7 +27,7 @@ define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
   ret void
 }
 
-define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
+define void @atomic_swap_f32(ptr %ptr, float %val) nounwind !prof !0 {
 ; CHECK-LABEL: @atomic_swap_f32(
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast float [[VAL:%.*]] to i32
 ; CHECK-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -37,7 +37,7 @@ define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i32 [[TMP1]] to i64
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP4]], ptr elementtype(i32) [[PTR]])
 ; CHECK-NEXT:    [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP5]], 0
-; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1]]
 ; CHECK:       atomicrmw.end:
 ; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32 [[TMP3]] to float
 ; CHECK-NEXT:    ret void
@@ -52,7 +52,7 @@ define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
   ret void
 }
 
-define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
+define void @atomic_swap_f64(ptr %ptr, double %val) nounwind !prof !0 {
 ; CHECK-LABEL: @atomic_swap_f64(
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast double [[VAL:%.*]] to i64
 ; CHECK-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -60,7 +60,7 @@ define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.aarch64.ldaxr.p0(ptr elementtype(i64) [[PTR:%.*]])
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP1]], ptr elementtype(i64) [[PTR]])
 ; CHECK-NEXT:    [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP3]], 0
-; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT:    br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1]]
 ; CHECK:       atomicrmw.end:
 ; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i64 [[TMP2]] to double
 ; CHECK-NEXT:    ret void
@@ -74,3 +74,17 @@ define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
   %t1 = atomicrmw xchg ptr %ptr, double %val acquire
   ret void
 }
+
+!0 = !{!"function_entry_count", i64 1000}
+;.
+; CHECK: attributes #[[ATTR0:[0-9]+]] = { nounwind }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nofree nounwind willreturn }
+;.
+; OUTLINE-ATOMICS: attributes #[[ATTR0:[0-9]+]] = { "target-features"="+outline-atomics" }
+; OUTLINE-ATOMICS: attributes #[[ATTR1:[0-9]+]] = { nounwind "target-features"="+outline-atomics" }
+;.
+; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1000}
+; CHECK: [[PROF1]] = !{!"unknown", !"atomic-expand"}
+;.
+; OUTLINE-ATOMICS: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1000}
+;.

Copy link
Member

@mtrofin mtrofin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title is wrong "icmp loop"?

@jinhuang1102 jinhuang1102 changed the title [profcheck] Add unknown branch weights to expand icmp loop. [profcheck] Add unknown branch weights to expand LL/SR loop. Nov 4, 2025
@jinhuang1102
Copy link
Contributor Author

Title is wrong "icmp loop"?

Rewrite the title and description. Thx!


Instruction *CondBr = Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);

// Atomic RMW expands to a icmp loop, because it is hard to predict precise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be updated.

@jinhuang1102 jinhuang1102 force-pushed the fixup/prof-missing-icmp-atomicop branch from c1bbb4e to 03aa99e Compare November 4, 2025 23:02
@jinhuang1102 jinhuang1102 merged commit fa5cd27 into llvm:main Nov 5, 2025
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants