[clang] Emit @llvm.assume before streaming_compatible functions when the streaming mode is known #121917
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang

Author: Nicholas Guy (NickGuy-Arm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/121917.diff

2 Files Affected:
- clang/lib/CodeGen/CGBuiltin.cpp
- clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index dcea32969fb990..3765285c58f6ca 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -11335,6 +11335,12 @@ Value *CodeGenFunction::EmitAArch64SMEBuiltinExpr(unsigned BuiltinID,
unsigned SMEAttrs = FPT->getAArch64SMEAttributes();
if (!(SMEAttrs & FunctionType::SME_PStateSMCompatibleMask)) {
bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
+ // Emit the llvm.assume intrinsic so that called functions can use the
+ // streaming mode information discerned here
+ Value* call = Builder.CreateCall(CGM.getIntrinsic(Builtin->LLVMIntrinsic));
+ if (!IsStreaming)
+ call = Builder.CreateNot(call);
+ Builder.CreateIntrinsic(Intrinsic::assume, {}, {call});
return ConstantInt::getBool(Builder.getContext(), IsStreaming);
}
}
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c b/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
index 72f2d17fc6dc11..1e630e196fcb66 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_state_funs.c
@@ -22,23 +22,32 @@ bool test_in_streaming_mode_streaming_compatible(void) __arm_streaming_compatibl
// CHECK-LABEL: @test_in_streaming_mode_streaming(
// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP0]])
// CHECK-NEXT: ret i1 true
//
// CPP-CHECK-LABEL: @_Z32test_in_streaming_mode_streamingv(
// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CPP-CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP0]])
// CPP-CHECK-NEXT: ret i1 true
//
bool test_in_streaming_mode_streaming(void) __arm_streaming {
-//
return __arm_in_streaming_mode();
}
// CHECK-LABEL: @test_in_streaming_mode_non_streaming(
// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CHECK-NEXT: [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP1]])
// CHECK-NEXT: ret i1 false
//
// CPP-CHECK-LABEL: @_Z36test_in_streaming_mode_non_streamingv(
// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call i1 @llvm.aarch64.sme.in.streaming.mode()
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CPP-CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP1]])
// CPP-CHECK-NEXT: ret i1 false
//
bool test_in_streaming_mode_non_streaming(void) {
@@ -47,12 +56,12 @@ bool test_in_streaming_mode_non_streaming(void) {
// CHECK-LABEL: @test_za_disable(
// CHECK-NEXT: entry:
-// CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR7:[0-9]+]]
+// CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR8:[0-9]+]]
// CHECK-NEXT: ret void
//
// CPP-CHECK-LABEL: @_Z15test_za_disablev(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR7:[0-9]+]]
+// CPP-CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR8:[0-9]+]]
// CPP-CHECK-NEXT: ret void
//
void test_za_disable(void) __arm_streaming_compatible {
@@ -61,14 +70,14 @@ void test_za_disable(void) __arm_streaming_compatible {
// CHECK-LABEL: @test_has_sme(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR7]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR8]]
// CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
// CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
// CHECK-NEXT: ret i1 [[TOBOOL_I]]
//
// CPP-CHECK-LABEL: @_Z12test_has_smev(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR7]]
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR8]]
// CPP-CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
// CPP-CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
// CPP-CHECK-NEXT: ret i1 [[TOBOOL_I]]
@@ -91,12 +100,12 @@ void test_svundef_za(void) __arm_streaming_compatible __arm_out("za") {
// CHECK-LABEL: @test_sc_memcpy(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CHECK-NEXT: ret ptr [[CALL]]
//
// CPP-CHECK-LABEL: @_Z14test_sc_memcpyPvPKvm(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memcpy(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CPP-CHECK-NEXT: ret ptr [[CALL]]
//
void *test_sc_memcpy(void *dest, const void *src, size_t n) __arm_streaming_compatible {
@@ -105,12 +114,12 @@ void *test_sc_memcpy(void *dest, const void *src, size_t n) __arm_streaming_comp
// CHECK-LABEL: @test_sc_memmove(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CHECK-NEXT: ret ptr [[CALL]]
//
// CPP-CHECK-LABEL: @_Z15test_sc_memmovePvPKvm(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memmove(ptr noundef [[DEST:%.*]], ptr noundef [[SRC:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CPP-CHECK-NEXT: ret ptr [[CALL]]
//
void *test_sc_memmove(void *dest, const void *src, size_t n) __arm_streaming_compatible {
@@ -119,12 +128,12 @@ void *test_sc_memmove(void *dest, const void *src, size_t n) __arm_streaming_com
// CHECK-LABEL: @test_sc_memset(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CHECK-NEXT: ret ptr [[CALL]]
//
// CPP-CHECK-LABEL: @_Z14test_sc_memsetPvim(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memset(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CPP-CHECK-NEXT: ret ptr [[CALL]]
//
void *test_sc_memset(void *s, int c, size_t n) __arm_streaming_compatible {
@@ -133,12 +142,12 @@ void *test_sc_memset(void *s, int c, size_t n) __arm_streaming_compatible {
// CHECK-LABEL: @test_sc_memchr(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CHECK-NEXT: ret ptr [[CALL]]
//
// CPP-CHECK-LABEL: @_Z14test_sc_memchrPvim(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR7]]
+// CPP-CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__arm_sc_memchr(ptr noundef [[S:%.*]], i32 noundef [[C:%.*]], i64 noundef [[N:%.*]]) #[[ATTR8]]
// CPP-CHECK-NEXT: ret ptr [[CALL]]
//
void *test_sc_memchr(void *s, int c, size_t n) __arm_streaming_compatible {
✅ With the latest revision this PR passed the C/C++ code formatter.
clang/lib/CodeGen/CGBuiltin.cpp (outdated):

    Value* call = Builder.CreateCall(CGM.getIntrinsic(Builtin->LLVMIntrinsic));
    if (!IsStreaming)
      call = Builder.CreateNot(call);
    Builder.CreateIntrinsic(Intrinsic::assume, {}, {call});
Emitting the llvm.assume only for calls to __arm_in_streaming_mode() has an effect only if it is followed by a call to a streaming-compatible function.
To make this more useful generically, I think it needs to be emitted before any call from a streaming- or non-streaming function, to a streaming-compatible function. Then the partial-inliner, ipsccp pass, or function specialization pass can use the information to specialize or inline the callee.
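A minimal sketch of what that could look like on the Clang side, assuming a small helper invoked from whichever hook sees both the caller's SME attributes and the streaming-compatible callee (the helper name and the hook are assumptions on my part; the builder calls mirror the diff above):

    // Sketch only: emitStreamingModeAssumption is a hypothetical helper, not
    // part of the patch; the IRBuilder calls follow the pattern shown above.
    #include "CodeGenFunction.h"
    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/IntrinsicsAArch64.h"

    using namespace clang::CodeGen;

    static void emitStreamingModeAssumption(CodeGenFunction &CGF,
                                            bool CallerIsStreaming) {
      // Query the current streaming mode...
      llvm::Value *Mode = CGF.Builder.CreateCall(CGF.CGM.getIntrinsic(
          llvm::Intrinsic::aarch64_sme_in_streaming_mode));
      // ...and assert what is statically known about it in this caller.
      if (!CallerIsStreaming)
        Mode = CGF.Builder.CreateNot(Mode);
      CGF.Builder.CreateIntrinsic(llvm::Intrinsic::assume, {}, {Mode});
    }

Called immediately before the call to a streaming-compatible function is emitted, this places the assumption right at the call site, where interprocedural passes can pick it up.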
I really don't see why we'd want to implement the optimization this way; can't we just add an instcombine for calls to the llvm.aarch64.sme.in.streaming.mode intrinsic?
If what this patch initially did was our actual goal, then that would be an option. However, what we want to achieve with this is to make the state of the streaming mode known to potentially inlineable called functions, allowing for further inlining and identification/removal of dead code. I've fixed the implementation to actually achieve this, and I'll update the description with a bit more context.
@efriedma-quic a motivating example is https://godbolt.org/z/aW9zTrdf9. With @NickGuy-Arm's patch, this would compile to:
sdesmalen-arm left a comment:
@NickGuy-Arm this PR is missing some test-coverage.
@efriedma-quic do you have any thoughts on emitting the llvm.assume as a prologue to the actual call? I guess the alternative would be to analyse the function to see if there are any calls to streaming-compatible functions, and then emit the llvm.assume in the function's prologue or something, but that would be more work (and I'm not sure if that's worth it).
    const AArch64ABIInfo &ABIInfo = getABIInfo<AArch64ABIInfo>();
    const TargetInfo &TI = ABIInfo.getContext().getTargetInfo();

    if (!TI.hasFeature("sme"))
      return;
Is this check really necessary?
    if (!TI.hasFeature("sme"))
      return;

    if (!Callee || !isStreamingCompatible(Callee))
Can Caller be nullptr too?
    if (const auto *FPT = Caller->getType()->getAs<FunctionProtoType>()) {
      unsigned SMEAttrs = FPT->getAArch64SMEAttributes();
      if (!(SMEAttrs & FunctionType::SME_PStateSMCompatibleMask)) {
        bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
nit:

    - bool IsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
    + bool CallerIsStreaming = SMEAttrs & FunctionType::SME_PStateSMEnabledMask;
It seems like we're in agreement that the llvm.assume doesn't actually add any information to the IR that the optimizer can't figure out itself. It sounds like the issue here is that you don't just need the call to be folded; you need the InlineCost evaluator to constant-fold the call? That makes things a little more complicated, but it still seems solvable... llvm.assume handling is relatively expensive, so we don't want to scatter assume calls across emitted code when it isn't necessary.
Also, I'm not sure it's correct to mark int_aarch64_sme_in_streaming_mode as IntrNoMem. That implies the result doesn't depend on any modifiable state, which isn't true, because the result depends on whether the caller is in streaming mode. (I think inlining is suppressed for calls between streaming and non-streaming functions, but other interprocedural optimizations could get confused.)
This. Assume calls and emitting code before call sites seem an awfully big hammer for this kind of problem. We should try to use existing mechanisms first if we can.
The assumption cache mechanism is used by a number of passes, such as [partial] inlining, function specialization and IPSCCP (interprocedural sparse conditional constant propagation). The idea behind doing this is to let optimizations iteratively apply knowledge about the streaming mode of the caller when analyzing or optimizing a callee. If we'd move this functionality to e.g. constant-folding, then we could only apply this knowledge to functions where the streaming mode is already known. If we'd combine the intrinsic in the instcombine pass, then it is combined only once, instead of iteratively by a pass that needs the information while e.g. inlining. The idea of using llvm.assume is to give the compiler more knowledge when trying to inline or specialize a function. For example, if the callee has a branch on __arm_in_streaming_mode(), the assumption lets those passes treat one side of that branch as dead when deciding whether to inline or specialize the callee.

I understand that emitting this on a per-call basis is a bit of a hammer. I had expected/hoped that LLVM passes would hoist/CSE the multiple llvm.assume calls to reduce the number of these intrinsics (because the value of the condition is constant inside the function). Am I correct in thinking that emitting an llvm.assume once in the entry block of a function is not a problem?
@efriedma-quic @aemerson ping
That's fair, I had mistakenly assumed
The two issues are sort of tied together: if the intrinsic isn't IntrNoMem, I suspect the assumption cache stops cooperating.

But anyway, I guess the question is really whether we want these constructs represented explicitly in IR, to try to leverage the AssumptionCache, or if we want to instead add a TTI hook for transform passes to query for the information in question. My default is not to use assumptions; llvm.assume calls still cause issues for optimizations, and they're not really necessary here. (I think the situation around llvm.assume has improved to some extent, but it's still a bit problematic.)

Even if we do decide we want assumptions, we likely want to add them in an IR pass, instead of in clang, so that other frontends don't need to duplicate this code.
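For comparison, a rough sketch of what such an IR pass could look like; everything here is an assumption on my part (the pass name, and that the caller's mode can be read from the aarch64_pstate_sm_enabled / aarch64_pstate_sm_compatible function attributes that clang attaches):

    // Hypothetical new-PM function pass that inserts the assumes in IR rather
    // than in clang, so other frontends would get the same behaviour for free.
    #include "llvm/IR/Function.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/InstrTypes.h"
    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/IntrinsicsAArch64.h"
    #include "llvm/IR/PassManager.h"

    using namespace llvm;

    struct SMEAssumeStreamingModePass
        : PassInfoMixin<SMEAssumeStreamingModePass> {
      PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
        // A streaming-compatible caller's own mode is unknown; nothing to do.
        if (F.hasFnAttribute("aarch64_pstate_sm_compatible"))
          return PreservedAnalyses::all();
        bool CallerIsStreaming = F.hasFnAttribute("aarch64_pstate_sm_enabled");

        bool Changed = false;
        for (BasicBlock &BB : F)
          for (Instruction &I : BB) {
            auto *CB = dyn_cast<CallBase>(&I);
            if (!CB)
              continue;
            Function *Callee = CB->getCalledFunction();
            if (!Callee ||
                !Callee->hasFnAttribute("aarch64_pstate_sm_compatible"))
              continue;
            // Emit the mode query and the assume immediately before the call.
            IRBuilder<> B(CB);
            Value *Mode = B.CreateIntrinsic(
                Intrinsic::aarch64_sme_in_streaming_mode, {}, {});
            if (!CallerIsStreaming)
              Mode = B.CreateNot(Mode);
            B.CreateIntrinsic(Intrinsic::assume, {}, {Mode});
            Changed = true;
          }
        return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
      }
    };

Running such a pass early in the pipeline would also make it easy to gate how aggressively the assumes are emitted, for example only when the callee has internal linkage and is a plausible inlining candidate.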
If the streaming mode of the calling function is known, emitting @llvm.assume before calls to __arm_streaming_compatible functions allows other passes, such as the partial inliner, to make more informed decisions on what to do. In the case of the partial inliner, this allows only the live code paths to be inlined and leaves the other code paths to be removed as part of dead code elimination.
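A hypothetical example of the kind of code this targets (this is not the code behind the godbolt link above; all names are invented, compiled with something like -O2 -march=armv9-a+sme):

    #include <arm_sme.h>

    void fast_path(float *dst, const float *src, int n) __arm_streaming_compatible;
    void slow_path(float *dst, const float *src, int n) __arm_streaming_compatible;

    // Streaming-compatible: must cope with either mode at run time.
    void dispatch(float *dst, const float *src, int n) __arm_streaming_compatible {
      if (__arm_in_streaming_mode())
        fast_path(dst, src, n);
      else
        slow_path(dst, src, n);
    }

    // The caller's mode is statically known (streaming), so the llvm.assume
    // emitted before the call lets inlining plus dead code elimination keep
    // only the fast_path arm of dispatch().
    void run(float *dst, const float *src, int n) __arm_streaming {
      dispatch(dst, src, n);
    }

Without the assumption, dispatch() keeps both arms when inlined into run(), because nothing tells the optimizer which mode run() executes in.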