[profcheck] Add unknown branch weights to the expanded LL/SC loop. #166273
Conversation
@llvm/pr-subscribers-llvm-transforms
Author: Jin Huang (jinhuang1102)
Changes: As a follow-up to PR #165841, AtomicExpandPass lowers high-level atomic operations (such as atomicrmw xchg) into LL/SC retry loops on targets without LSE; the conditional branch created for the store-conditional retry was emitted without branch-weight metadata. This patch marks that branch with explicitly unknown weights.
Full diff: https://github.com/llvm/llvm-project/pull/166273.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index 53f1cfe24a68d..c22f5ff9e444b 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -38,6 +38,7 @@
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Module.h"
+#include "llvm/IR/ProfDataUtils.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"
@@ -1259,8 +1260,7 @@ Value *AtomicExpandImpl::insertRMWLLSCLoop(
BasicBlock *BB = Builder.GetInsertBlock();
Function *F = BB->getParent();
- assert(AddrAlign >=
- F->getDataLayout().getTypeStoreSize(ResultTy) &&
+ assert(AddrAlign >= F->getDataLayout().getTypeStoreSize(ResultTy) &&
"Expected at least natural alignment at this point.");
// Given: atomicrmw some_op iN* %addr, iN %incr ordering
@@ -1295,7 +1295,13 @@ Value *AtomicExpandImpl::insertRMWLLSCLoop(
TLI->emitStoreConditional(Builder, NewVal, Addr, MemOpOrder);
Value *TryAgain = Builder.CreateICmpNE(
StoreSuccess, ConstantInt::get(IntegerType::get(Ctx, 32), 0), "tryagain");
- Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);
+
+ Instruction *CondBr = Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);
+
+  // Atomic RMW expands to an icmp loop; because it is hard to predict precise
+  // branch weights, we mark the branch as "unknown" (50/50) to prevent
+  // misleading optimizations.
+ setExplicitlyUnknownBranchWeightsIfProfiled(*CondBr, *F, DEBUG_TYPE);
Builder.SetInsertPoint(ExitBB, ExitBB->begin());
return Loaded;
diff --git a/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll b/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
index 95a52aa0f7f52..b509b2469cfdc 100644
--- a/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
+++ b/llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll
@@ -1,8 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
; RUN: opt -codegen-opt-level=1 -S -mtriple=aarch64-- -passes=atomic-expand %s | FileCheck %s
; RUN: opt -codegen-opt-level=1 -S -mtriple=aarch64-- -mattr=+outline-atomics -passes=atomic-expand %s | FileCheck %s --check-prefix=OUTLINE-ATOMICS
-define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
+define void @atomic_swap_f16(ptr %ptr, half %val) !prof !0 {
; CHECK-LABEL: @atomic_swap_f16(
; CHECK-NEXT: [[TMP1:%.*]] = bitcast half [[VAL:%.*]] to i16
; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
@@ -12,7 +12,7 @@ define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
; CHECK-NEXT: [[TMP4:%.*]] = zext i16 [[TMP1]] to i64
; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP4]], ptr elementtype(i16) [[PTR]])
; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP5]], 0
-; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK: atomicrmw.end:
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16 [[TMP3]] to half
; CHECK-NEXT: ret void
@@ -27,7 +27,7 @@ define void @atomic_swap_f16(ptr %ptr, half %val) nounwind {
ret void
}
-define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
+define void @atomic_swap_f32(ptr %ptr, float %val) nounwind !prof !0 {
; CHECK-LABEL: @atomic_swap_f32(
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float [[VAL:%.*]] to i32
; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
@@ -37,7 +37,7 @@ define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[TMP1]] to i64
; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP4]], ptr elementtype(i32) [[PTR]])
; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP5]], 0
-; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1]]
; CHECK: atomicrmw.end:
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32 [[TMP3]] to float
; CHECK-NEXT: ret void
@@ -52,7 +52,7 @@ define void @atomic_swap_f32(ptr %ptr, float %val) nounwind {
ret void
}
-define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
+define void @atomic_swap_f64(ptr %ptr, double %val) nounwind !prof !0 {
; CHECK-LABEL: @atomic_swap_f64(
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double [[VAL:%.*]] to i64
; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
@@ -60,7 +60,7 @@ define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.aarch64.ldaxr.p0(ptr elementtype(i64) [[PTR:%.*]])
; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.aarch64.stxr.p0(i64 [[TMP1]], ptr elementtype(i64) [[PTR]])
; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP3]], 0
-; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
+; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]], !prof [[PROF1]]
; CHECK: atomicrmw.end:
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i64 [[TMP2]] to double
; CHECK-NEXT: ret void
@@ -74,3 +74,17 @@ define void @atomic_swap_f64(ptr %ptr, double %val) nounwind {
%t1 = atomicrmw xchg ptr %ptr, double %val acquire
ret void
}
+
+!0 = !{!"function_entry_count", i64 1000}
+;.
+; CHECK: attributes #[[ATTR0:[0-9]+]] = { nounwind }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nofree nounwind willreturn }
+;.
+; OUTLINE-ATOMICS: attributes #[[ATTR0:[0-9]+]] = { "target-features"="+outline-atomics" }
+; OUTLINE-ATOMICS: attributes #[[ATTR1:[0-9]+]] = { nounwind "target-features"="+outline-atomics" }
+;.
+; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1000}
+; CHECK: [[PROF1]] = !{!"unknown", !"atomic-expand"}
+;.
+; OUTLINE-ATOMICS: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1000}
+;.
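Distilled from the test diff above, the before/after this patch pins down (the expanded branch and metadata lines are quoted from the CHECK lines; the exact numbering is illustrative):

; Input: a profiled function containing an atomicrmw.
define void @atomic_swap_f16(ptr %ptr, half %val) !prof !0 {
  %t1 = atomicrmw xchg ptr %ptr, half %val acquire
  ret void
}

!0 = !{!"function_entry_count", i64 1000}

; After atomic-expand, the ldaxr/stxr retry branch carries explicit
; "unknown" weights rather than no !prof at all:
;   br i1 %tryagain, label %atomicrmw.start, label %atomicrmw.end, !prof !1
;   !1 = !{!"unknown", !"atomic-expand"}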
mtrofin left a comment:
LGTM
arsenm left a comment:
Title is wrong "icmp loop"?
Rewrite the title and description. Thx!
Review comment on the new lines:
    Instruction *CondBr = Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);
    // Atomic RMW expands to an icmp loop; because it is hard to predict precise
This still needs to be updated.
Force-pushed from c1bbb4e to 03aa99e.
As a follow-up to PR #165841, this change addresses prof_md metadata loss in AtomicExpandPass when lowering atomicrmw xchg to a Load-Linked/Store-Conditional (LL/SC) loop. This path is distinct from the LSE path addressed previously:
- PR #165841 (and its tests) used -mtriple=aarch64-linux-gnu, which targets a modern ARMv8.1+ architecture. That architecture supports Large System Extensions (LSE), allowing atomicrmw to be lowered directly to a more efficient hardware instruction.
- This PR (and its tests) uses -mtriple=aarch64-- or -mtriple=armv8-linux-gnueabihf, which indicates an ARMv8.0-or-lower architecture that does not support LSE. On these targets, the pass must fall back to synthesizing a manual LL/SC loop using the ldaxr/stxr instruction pair (see the sketch after this list).
-mtriple=aarch64--or-mtriple=armv8-linux-gnueabihf. This indicates anARMv8.0 or lower architecture that does not support LSE. On these targets, the pass must fall back to synthesizing a manual LL/SC loop using theldaxr/stxrinstruction pair.Similar to previous issue, the new conditional branch was failin to inherit the
prof_mdmetadata. Theis PR correctly fix the branch weights to the newly created branch within the LL/SC loop, ensuring profile information is preserved.This PR includes fixes for the following tests:
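On the test side, the assertions were autogenerated; judging from the UTC_ARGS header in the diff, regeneration looks roughly like this (command and paths are assumptions about an llvm-project checkout, not quoted from the PR):

; Assumed regeneration command:
;   llvm/utils/update_test_checks.py --check-globals \
;     llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll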