Conversation

@NickGuy-Arm (Contributor) commented Jan 7, 2025

If the streaming mode of the calling function is known, emitting @llvm.assume before calls to __arm_streaming_compatible functions lets other passes, such as the partial inliner, make more informed decisions. In the case of the partial inliner, only the live code paths need to be inlined, leaving the other code paths to be removed as part of dead code elimination.
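
For context, a hedged sketch of the pattern this targets (the function names below are illustrative, not taken from the patch): a streaming-compatible callee that branches on the current mode, called from a function whose mode is fixed.

#include <arm_sme.h>

// Streaming-compatible: behaviour depends on the caller's PSTATE.SM,
// so the function must handle both modes.
int which_path(void) __arm_streaming_compatible {
  if (__arm_in_streaming_mode())
    return 1; // streaming-only path
  return 0;   // non-streaming path
}

// The caller's mode is fixed (streaming), so with llvm.assume emitted
// before the call, the partial inliner can inline just the "return 1"
// path and leave the rest to dead code elimination.
int caller(void) __arm_streaming { return which_path(); }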

@llvmbot added the clang (Clang issues not falling into any other category) and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels Jan 7, 2025
@llvmbot (Member) commented Jan 7, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang-codegen

Author: Nicholas Guy (NickGuy-Arm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/121917.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+6)
  • (modified) clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c (+22-13)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index dcea32969fb990..3765285c58f6ca 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -11335,6 +11335,12 @@ Value *CodeGenFunction::EmitAArch64SMEBuiltinExpr(unsigned BuiltinID,
       unsigned SMEAttrs = FPT->getAArch64SMEAttributes();
       if (!(SMEAttrs & FunctionType::SME_PStateSMCompatibleMask)) {
         bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
+        // Emit the llvm.assume intrinsic so that called functions can use the
+        // streaming mode information discerned here
+        Value* call = Builder.CreateCall(CGM.getIntrinsic(Builtin->LLVMIntrinsic));
+        if (!IsStreaming)
+          call = Builder.CreateNot(call);
+        Builder.CreateIntrinsic(Intrinsic::assume, {}, {call});
         return ConstantInt::getBool(Builder.getContext(), IsStreaming);
       }
     }
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c b/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
index 72f2d17fc6dc11..1e630e196fcb66 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
@@ -22,23 +22,32 @@ bool test_in_streaming_mode_streaming_compatible(void) __arm_streaming_compatibl
 
 // CHECK-LABEL: @test_in_streaming_mode_streaming(
 // CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP0]])
 // CHECK-NEXT:    ret i1 true
 //
 // CPP-CHECK-LABEL: @_Z32test_in_streaming_mode_streamingv(
 // CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CPP-CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP0]])
 // CPP-CHECK-NEXT:    ret i1 true
 //
 bool test_in_streaming_mode_streaming(void) __arm_streaming {
-//
   return __arm_in_streaming_mode();
 }
 
 // CHECK-LABEL: @test_in_streaming_mode_non_streaming(
 // CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CHECK-NEXT:    [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP1]])
 // CHECK-NEXT:    ret i1 false
 //
 // CPP-CHECK-LABEL: @_Z36test_in_streaming_mode_non_streamingv(
 // CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CPP-CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP1]])
 // CPP-CHECK-NEXT:    ret i1 false
 //
 bool test_in_streaming_mode_non_streaming(void) {
@@ -47,12 +56,12 @@ bool test_in_streaming_mode_non_streaming(void) {
 
 // CHECK-LABEL: @test_za_disable(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR7:[0-9]+]]
+// CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR8:[0-9]+]]
 // CHECK-NEXT:    ret void
 //
 // CPP-CHECK-LABEL: @_Z15test_za_disablev(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR7:[0-9]+]]
+// CPP-CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR8:[0-9]+]]
 // CPP-CHECK-NEXT:    ret void
 //
 void test_za_disable(void) __arm_streaming_compatible {
@@ -61,14 +70,14 @@ void test_za_disable(void) __arm_streaming_compatible {
 
 // CHECK-LABEL: @test_has_sme(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR7]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR8]]
 // CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
 // CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
 // CHECK-NEXT:    ret i1 [[TOBOOL_I]]
 //
 // CPP-CHECK-LABEL: @_Z12test_has_smev(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR7]]
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR8]]
 // CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
 // CPP-CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
 // CPP-CHECK-NEXT:    ret i1 [[TOBOOL_I]]
@@ -91,12 +100,12 @@ void test_svundef_za(void) __arm_streaming_compatible __arm_out("za") {
 
 // CHECK-LABEL: @test_sc_memcpy(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CHECK-NEXT:    ret ptr [[CALL]]
 //
 // CPP-CHECK-LABEL: @_Z14test_sc_memcpyPvPKvm(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CPP-CHECK-NEXT:    ret ptr [[CALL]]
 //
 void *test_sc_memcpy(void *dest, const void *src, size_t n) __arm_streaming_compatible {
@@ -105,12 +114,12 @@ void *test_sc_memcpy(void *dest, const void *src, size_t n) __arm_streaming_comp
 
 // CHECK-LABEL: @test_sc_memmove(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CHECK-NEXT:    ret ptr [[CALL]]
 //
 // CPP-CHECK-LABEL: @_Z15test_sc_memmovePvPKvm(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CPP-CHECK-NEXT:    ret ptr [[CALL]]
 //
 void *test_sc_memmove(void *dest, const void *src, size_t n) __arm_streaming_compatible {
@@ -119,12 +128,12 @@ void *test_sc_memmove(void *dest, const void *src, size_t n) __arm_streaming_com
 
 // CHECK-LABEL: @test_sc_memset(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CHECK-NEXT:    ret ptr [[CALL]]
 //
 // CPP-CHECK-LABEL: @_Z14test_sc_memsetPvim(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CPP-CHECK-NEXT:    ret ptr [[CALL]]
 //
 void *test_sc_memset(void *s, int c, size_t n) __arm_streaming_compatible {
@@ -133,12 +142,12 @@ void *test_sc_memset(void *s, int c, size_t n) __arm_streaming_compatible {
 
 // CHECK-LABEL: @test_sc_memchr(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CHECK-NEXT:    ret ptr [[CALL]]
 //
 // CPP-CHECK-LABEL: @_Z14test_sc_memchrPvim(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
 // CPP-CHECK-NEXT:    ret ptr [[CALL]]
 //
 void *test_sc_memchr(void *s, int c, size_t n) __arm_streaming_compatible {

@llvmbot (Member) commented Jan 7, 2025

@llvm/pr-subscribers-clang

Author: Nicholas Guy (NickGuy-Arm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/121917.diff

@github-actions bot commented Jan 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

A Collaborator commented on this hunk:

Value* call = Builder.CreateCall(CGM.getIntrinsic(Builtin->LLVMIntrinsic));
if (!IsStreaming)
  call = Builder.CreateNot(call);
Builder.CreateIntrinsic(Intrinsic::assume, {}, {call});
Emitting the llvm.assume only for calls to __arm_in_streaming_mode() has an effect only if it is followed by a call to a streaming-compatible function.

To make this more useful generically, I think it needs to be emitted before any call from a streaming or non-streaming function to a streaming-compatible function. Then the partial inliner, IPSCCP pass, or function specialization pass can use the information to specialize or inline the callee.
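
A minimal sketch of what that suggestion could look like, reusing the emission from this patch's diff; the helper's name, placement, and signature are assumed, and CallerIsStreaming would come from the SME_PStateSMEnabledMask check quoted later in this review:

// Assumed hook, invoked just before CodeGen emits a call to a
// streaming-compatible callee from a function whose mode is known.
static void emitStreamingModeAssume(clang::CodeGen::CodeGenFunction &CGF,
                                    bool CallerIsStreaming) {
  llvm::Value *InSM = CGF.Builder.CreateCall(
      CGF.CGM.getIntrinsic(llvm::Intrinsic::aarch64_sme_in_streaming_mode));
  if (!CallerIsStreaming)
    InSM = CGF.Builder.CreateNot(InSM); // assume we are NOT in streaming mode
  CGF.Builder.CreateIntrinsic(llvm::Intrinsic::assume, {}, {InSM});
}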

@efriedma-quic (Collaborator)

I really don't see why we'd want to implement the optimization this way; can't we just add an instcombine for calls to llvm.aarch64.sme.in.streaming.mode?
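
For reference, a sketch of that alternative, assuming it would live in the AArch64 instCombineIntrinsic hook with LLVM's SMEAttrs helper in scope (the exact placement and spelling here are assumptions):

// Fold llvm.aarch64.sme.in.streaming.mode when the containing function's
// streaming mode is known from its SME attributes.
static std::optional<Instruction *>
foldInStreamingMode(InstCombiner &IC, IntrinsicInst &II) {
  SMEAttrs FnAttrs(*II.getFunction());
  // Known-streaming: streaming interface or locally-streaming body.
  if (FnAttrs.hasStreamingInterface() || FnAttrs.hasStreamingBody())
    return IC.replaceInstUsesWith(II, ConstantInt::getTrue(II.getType()));
  // Known-non-streaming: neither streaming nor streaming-compatible.
  if (!FnAttrs.hasStreamingCompatibleInterface())
    return IC.replaceInstUsesWith(II, ConstantInt::getFalse(II.getType()));
  return std::nullopt; // streaming-compatible: unknown at compile time
}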

@NickGuy-Arm (Contributor, Author)

> I really don't see why we'd want to implement the optimization this way; can't we just add an instcombine for calls to llvm.aarch64.sme.in.streaming.mode?

If what this patch initially did was our actual goal, then that would be an option. However, what we want to achieve with this is to make the state of the streaming mode known to potentially inlineable called functions, allowing for further inlining and identification/removal of dead code. I've fixed the implementation to actually achieve this, and I'll update the description with a bit more context.

@NickGuy-Arm changed the title from "[clang] Emit @llvm.assume when we know the streaming mode of the function" to "[clang] Emit @llvm.assume before streaming_compatible functions when the streaming mode is known" Jan 8, 2025
@sdesmalen-arm (Collaborator)

@efriedma-quic a motivating example is https://godbolt.org/z/aW9zTrdf9

With @NickGuy-Arm's patch, this would compile to:

_Z1xv:
        rdvl    x0, #1
        ret

_Z1yv:
        rdsvl   x0, #1
        ret
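
(The godbolt source itself isn't reproduced in the thread; given the mangled names and the rdvl/rdsvl pair, a plausible shape for it is the one below — a guess, not the actual example — where a streaming-compatible vector-length query is inlined into one non-streaming and one streaming caller:)

#include <arm_sve.h>

// Streaming-compatible: svcntb() reads the vector length in effect at
// the call site, so its result depends on PSTATE.SM.
uint64_t veclen() __arm_streaming_compatible { return svcntb(); }

uint64_t x() { return veclen(); }                 // folds to: rdvl  x0, #1
uint64_t y() __arm_streaming { return veclen(); } // folds to: rdsvl x0, #1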

@sdesmalen-arm (Collaborator) left a comment

@NickGuy-Arm this PR is missing some test-coverage.

@efriedma-quic do you have any thoughts on emitting the llvm.assume as a prologue to the actual call? I guess the alternative would be to analyse the function to see if there's any calls to streaming-compatible functions, and then emit the llvm.assume in the function's prologue or something, but that would be more work (and I'm not sure if that's worth it).

Comment on lines +1285 to +1290
const AArch64ABIInfo &ABIInfo = getABIInfo<AArch64ABIInfo>();
const TargetInfo &TI = ABIInfo.getContext().getTargetInfo();

if (!TI.hasFeature("sme"))
  return;

Is this check really necessary?

if (!TI.hasFeature("sme"))
  return;

if (!Callee || !isStreamingCompatible(Callee))

Can Caller be nullptr too?

if (const auto *FPT = Caller->getType()->getAs<FunctionProtoType>()) {
  unsigned SMEAttrs = FPT->getAArch64SMEAttributes();
  if (!(SMEAttrs & FunctionType::SME_PStateSMCompatibleMask)) {
    bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;

nit:

Suggested change:
-bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
+bool CallerIsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;

@efriedma-quic (Collaborator)

> However, what we want to achieve with this is to make the state of the streaming mode known to potentially inlineable called functions, allowing for further inlining and identification/removal of dead code

It seems like we're in agreement that the llvm.assume doesn't actually add any information to the IR that the optimizer can't figure out itself. It sounds like the issue here is that you don't just need the call to be folded; you need the InlineCost evaluator to constant-fold the call? That makes things a little more complicated, but it still seems solvable...

llvm.assume handling is relatively expensive, so we don't want to scatter assume calls across emitted code when it isn't necessary.

@efriedma-quic (Collaborator)

Also, I'm not sure it's correct to mark int_aarch64_sme_in_streaming_mode IntrNoMem. That implies the result doesn't depend on any modifiable state, which isn't true because the result depends on whether the caller is in streaming mode. (I think inlining is suppressed for calls between streaming and non-streaming functions, but other interprocedural optimizations could get confused.)

@aemerson (Contributor) commented Jan 8, 2025

> > However, what we want to achieve with this is to make the state of the streaming mode known to potentially inlineable called functions, allowing for further inlining and identification/removal of dead code
>
> It seems like we're in agreement that the llvm.assume doesn't actually add any information to the IR that the optimizer can't figure out itself. It sounds like the issue here is that you don't just need the call to be folded; you need the InlineCost evaluator to constant-fold the call? That makes things a little more complicated, but it still seems solvable...
>
> llvm.assume handling is relatively expensive, so we don't want to scatter assume calls across emitted code when it isn't necessary.

This. Assume calls and emitting code before call sites seem like an awfully big hammer for this kind of problem. We should try to use existing mechanisms first, if we can.

@sdesmalen-arm (Collaborator)

The assumption cache mechanism is used by a number of passes, such as [partial] inlining, function specialization and IPSCCP (interprocedural sparse conditional constant propagation).

The idea behind doing this is to let optimizations iteratively apply knowledge about the streaming mode of the caller when analyzing or optimizing a callee. If we moved this functionality to, e.g., constant folding, we could only apply this knowledge to functions where the streaming mode is already known. If we combined the intrinsic in the instcombine pass, it would be combined only once, instead of iteratively by a pass that needs the information while e.g. inlining.

The idea of using llvm.assume is to give the compiler more knowledge when trying to inline or specialize a function. For example, if the callee has an if (__arm_in_streaming_mode()) { ... } branch, the cost model could infer that this is always/never executed, depending on the mode of the call site, which changes the cost/decision on whether or not to inline. IIRC, when it inlines, the IR cloner also simplifies the code based on the assumption cache.
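
To make the iterative point concrete, a hedged example (the names are invented for illustration): after outer is inlined, the same assume lets the branch inside inner fold as well — something a one-shot fold of the original call would not give you.

#include <arm_sme.h>

int inner(void) __arm_streaming_compatible {
  return __arm_in_streaming_mode() ? 1 : 0;
}

int outer(void) __arm_streaming_compatible {
  return inner() + 1; // mode still unknown here; nothing folds yet
}

// Only at a call site with a known mode does the chain collapse: with
// the assume emitted here, inlining outer (and then inner) lets the
// whole thing fold to "return 2".
int use(void) __arm_streaming { return outer(); }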

I understand that emitting this on a per-call basis is a bit of a hammer. I had expected/hoped that LLVM passes would hoist/CSE the multiple llvm.assume calls to reduce the number of these intrinsics (because the value of the condition is constant inside the function). Am I correct in thinking that emitting an llvm.assume once in the entry block of a function is not a problem?

@NickGuy-Arm (Contributor, Author)

> The assumption cache mechanism is used by a number of passes...

@efriedma-quic @aemerson ping

> I'm not sure it's correct to mark int_aarch64_sme_in_streaming_mode IntrNoMem...

That's fair; I had mistakenly assumed IntrNoMem to mean "I don't touch memory". I'll put up a separate PR to change it to use IntrReadMem shortly.

@efriedma-quic (Collaborator)

The two issues are sort of tied together: if the intrinsic isn't IntrNoMem, I suspect the assumption cache stops cooperating.

But anyway, I guess the question is really whether we want these constructs represented explicitly in IR, to try to leverage the AssumptionCache, or if we want to instead add a TTI hook for transform passes to query for the information in question. My default is not to use assumptions; llvm.assume calls still cause issues for optimizations, and they're not really necessary here. (I think the situation around llvm.assume has improved to some extent, but it's still a bit problematic.)

Even if we do decide we want assumptions, we likely want to add them in an IR pass, instead of in clang, so that other frontends don't need to duplicate this code.
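
Purely for illustration, the TTI hook floated here might look something like the following — no such interface exists in-tree, and the name and signature are invented:

// Hypothetical TargetTransformInfo query: the value of PSTATE.SM inside
// F if it is known at compile time, or std::nullopt when F is
// streaming-compatible and the mode can vary per call site.
std::optional<bool> getKnownStreamingMode(const Function &F) const;

// A transform pass could then fold the intrinsic without IR markers:
if (std::optional<bool> SM = TTI.getKnownStreamingMode(*II.getFunction()))
  II.replaceAllUsesWith(ConstantInt::getBool(II.getType(), *SM));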


Labels

backend:AArch64, clang:codegen (IR generation bugs: mangling, exceptions, etc.), clang (Clang issues not falling into any other category)


5 participants