[PreISelIntrinsicLowering] Look through ptrtoint for a constant (e.g. global ptr) when trying to produce memset_pattern16 #129220
Conversation
… memset_pattern16 for loaded constant val
…ading from constant global

This is motivated by llvm#126736, and catches a case that would have resulted in memset_pattern16 being produced by LoopIdiomRecognize previously, but is missed after moving to the intrinsic in llvm#126736 and relying on PreISelIntrinsicLowering to produce the libcall when available. The logic for handling load instructions that access constant globals could be made more extensive, but it's not clear it would be worthwhile. For now we prioritise the patterns that could be produced by LoopIdiomRecognize.
@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)

Changes

This is motivated by #126736, and catches a case that would have resulted in memset_pattern16 being produced by LoopIdiomRecognize previously, but is missed after moving to the intrinsic in #126736 and relying on PreISelIntrinsicLowering to produce the libcall when available.

The logic for handling load instructions that access constant globals could be made more extensive, but it's not clear it would be worthwhile. For now we prioritise the patterns that could be produced by LoopIdiomRecognize.

Full diff: https://github.com/llvm/llvm-project/pull/129220.diff

2 Files Affected:
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 27fa0b43d74f6..458e52bcb7c68 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -254,10 +254,23 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
if (!isLibFuncEmittable(M, &TLI, LibFunc_memset_pattern16))
return nullptr;
+ // If V is a load instruction that loads from a constant global then attempt
+ // to use that constant to produce the pattern.
+ Constant *C = nullptr;
+ if (auto *LI = dyn_cast<LoadInst>(V)) {
+ if (auto *GV = dyn_cast<GlobalVariable>(LI->getPointerOperand())) {
+ if (GV->isConstant() && GV->hasInitializer()) {
+ C = GV->getInitializer();
+ }
+ }
+ }
+
+ if (!C)
+ C = dyn_cast<Constant>(V);
+
// If the value isn't a constant, we can't promote it to being in a constant
// array. We could theoretically do a store to an alloca or something, but
// that doesn't seem worthwhile.
- Constant *C = dyn_cast<Constant>(V);
if (!C || isa<ConstantExpr>(C))
return nullptr;
@@ -284,7 +297,7 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
// Otherwise, we'll use an array of the constants.
uint64_t ArraySize = 16 / Size;
- ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
+ ArrayType *AT = ArrayType::get(C->getType(), ArraySize);
return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
}
diff --git a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
index 7cfdcb8578809..e44007f736370 100644
--- a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
+++ b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
@@ -2,13 +2,18 @@
; RUN: opt -mtriple=x86_64-apple-darwin10.0.0 -passes=pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s
;.
+; CHECK: @G = global i32 5
+; CHECK: @ptr_pat = private unnamed_addr constant ptr @G, align 8
+; CHECK: @nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
; CHECK: @.memset_pattern = private unnamed_addr constant [2 x i64] [i64 -6148895925951734307, i64 -6148895925951734307], align 16
; CHECK: @.memset_pattern.1 = private unnamed_addr constant [2 x i64] [i64 4614256656552045848, i64 4614256656552045848], align 16
-; CHECK: @.memset_pattern.2 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
-; CHECK: @.memset_pattern.3 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
-; CHECK: @.memset_pattern.4 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.2 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.3 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.4 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
; CHECK: @.memset_pattern.5 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
; CHECK: @.memset_pattern.6 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.7 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.8 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
;.
define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1_dynvalue(
@@ -31,7 +36,7 @@ define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
define void @memset_pattern_i128_1(ptr %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1(
; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 16)
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 16)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -59,7 +64,7 @@ define void @memset_pattern_i128_1_nz_as(ptr addrspace(1) %a, i128 %value) nounw
define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1_align_attr(
; CHECK-SAME: ptr align 16 [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.4, i64 16)
+; CHECK-NEXT: call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.6, i64 16)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr align(16) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -69,7 +74,7 @@ define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nou
define void @memset_pattern_i128_16(ptr %a) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_16(
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 256)
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.7, i64 256)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 16, i1 false)
@@ -80,7 +85,7 @@ define void @memset_pattern_i128_x(ptr %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_x(
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 16, [[X]]
-; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.6, i64 [[TMP1]])
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.8, i64 [[TMP1]])
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false)
@@ -110,7 +115,7 @@ define void @memset_pattern_i16_x(ptr %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i16_x(
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 2, [[X]]
-; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 [[TMP1]])
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.4, i64 [[TMP1]])
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i16 u0xabcd, i64 %x, i1 false)
@@ -144,6 +149,60 @@ define void @memset_pattern_i64_128_tbaa(ptr %a) nounwind {
!7 = !{!"omnipotent char", !8, i64 0}
!8 = !{!"Simple C++ TBAA"}
+@G = global i32 5
+@ptr_pat = private unnamed_addr constant ptr @G, align 8
+
+define void @memset_pattern_i64_16_fromptr(ptr %a) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_16_fromptr(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 128)
+; CHECK-NEXT: ret void
+;
+ %1 = load i64, ptr @ptr_pat, align 8
+ tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 16, i1 false)
+ ret void
+}
+
+define void @memset_pattern_i64_x_fromptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT: [[TMP2:%.*]] = mul i64 8, [[X]]
+; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 [[TMP2]])
+; CHECK-NEXT: ret void
+;
+ %1 = load i64, ptr @ptr_pat, align 8
+ tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+ ret void
+}
+
+@nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
+
+; memset_pattern16 shouldn't be used for this example (at least not by just
+; creating a ConstantArray global at compile time), as the global isn't
+; constant.
+define void @memset_pattern_i64_x_fromnonconstptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromnonconstptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr @nonconst_ptr_pat, align 8
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 0, [[X]]
+; CHECK-NEXT: br i1 [[TMP2]], label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
+; CHECK: [[LOADSTORELOOP]]:
+; CHECK-NEXT: [[TMP3:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP5:%.*]], %[[LOADSTORELOOP]] ]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
+; CHECK-NEXT: store i64 [[TMP1]], ptr [[TMP4]], align 1
+; CHECK-NEXT: [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT: [[TMP6:%.*]] = icmp ult i64 [[TMP5]], [[X]]
+; CHECK-NEXT: br i1 [[TMP6]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
+; CHECK: [[SPLIT]]:
+; CHECK-NEXT: ret void
+;
+ %1 = load i64, ptr @nonconst_ptr_pat, align 8
+ tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+ ret void
+}
+
;.
; CHECK: attributes #[[ATTR0]] = { nounwind }
; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) }
I'm not sure I understand why we need to do this. That is, shouldn't LIR be passing the pattern directly as an argument to experimental.memset.pattern and leave it to intrinsic lowering to create the global, instead of creating the global itself?
You're absolutely right, the process of LoopIdiomRecognize creating a global value and loading from it is unnecessary and should have been removed for the ptr case as it had been for int and fp. I've updated #126736 to reflect this and reworked this patch to just look through a ptrtoint for a constant arg. Thank you for calling out this weirdness. This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.
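To make the reworked approach concrete, here is a minimal C++ sketch of the cast look-through; the helper name and exact shape are illustrative assumptions, not the patch itself. A ptrtoint of a global's address is folded to a ConstantExpr before it reaches the intrinsic, so peeling one operand recovers the underlying pointer constant (e.g. @G).

#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Illustrative helper (name assumed): recover a candidate constant for the
// memset_pattern16 pattern, looking through a constant ptrtoint cast.
static Constant *peekThroughPtrToInt(Value *V) {
  // A ptrtoint of a constant (e.g. a global's address) is a ConstantExpr.
  if (auto *CE = dyn_cast<ConstantExpr>(V))
    if (CE->getOpcode() == Instruction::PtrToInt)
      return cast<Constant>(CE->getOperand(0));
  // Otherwise fall back to treating V itself as the constant, if it is one.
  return dyn_cast<Constant>(V);
}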
Ping.
nikic left a comment:
> This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.
I feel like it would make more sense to lift that restriction right away. It seems like both this PR and #126736 have to do unnecessary work to force the use of integers, while (I think?) non-integer types should "just work" if we allow them.
// Look through a ptrtoint cast for a candidate constant. This could be
// extended to look through other casts, but the assumption is earlier
// passes that introduced memset.pattern intrinsic would have just emitted
// the integer argument directly for CosntantFP or ConstantInt cases.
Suggested change:
- // the integer argument directly for CosntantFP or ConstantInt cases.
+ // the integer argument directly for ConstantFP or ConstantInt cases.
%1 = ptrtoint ptr %p to i64
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
ret void
(finding the new behavior of not requiring a mangling suffix confusing)
Could instcombine fold this to pull the cast in, and adjust the call signature?
After looking through this and the dependent patch, I strongly agree with @nikic's point here. The right thing to do here is simply to allow non-integer types for the pattern argument in the memset.pattern intrinsic. In principle, the pattern value could be any (sized) type.
… the pattern argument (#132026)

I initially thought starting with a narrower definition and later expanding would make more sense. But as pointed out in review for PR #129220, this restriction is generating additional unnecessary work. This patch alters the intrinsic to accept patterns of any type. Future patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to take advantage of this. A verifier check ensures that only sized types are used for the pattern.

I've additionally taken the opportunity to remove a comment from the LangRef about some bit widths potentially not being supported by the target. I don't think this is any more true than it is for arbitrary-width loads/stores, which don't carry a similar warning that I can see.
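For reference, a standalone sketch of the sized-type rule this commit message describes; the function name is hypothetical, and in-tree this would be a check inside the IR verifier rather than a free function.

#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Hypothetical helper: the pattern operand (argument 1) of
// llvm.experimental.memset.pattern must have a sized type, since the
// lowering needs to know how many bytes the pattern occupies.
static bool hasSizedPatternArg(const IntrinsicInst &II) {
  return II.getArgOperand(1)->getType()->isSized();
}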
This is motivated by #126736, and catches a case that would have resulted in memset_pattern16 being produced by LoopIdiomRecognize previously, but is missed after moving to the intrinsic in #126736 and relying on PreISelIntrinsicLowering to produce the libcall when available.
The logic for looking through casts could be made more extensive, but the assumption is that the cast will only be present (in this case, ptrtoint) if a constant integer couldn't have been produced directly. This is true for a global ptr, but not for constant ints or constant fp values.
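To round out the picture, a minimal sketch of what happens once a constant has been recovered, mirroring the array-building tail of getMemSetPattern16Value from the diff above; the function name is assumed, and the real code's early return for a full 16-byte element is omitted here.

#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include <vector>
using namespace llvm;

// Illustrative only: replicate element C to fill the 16-byte pattern that
// memset_pattern16 expects, e.g. an 8-byte ptr becomes [2 x ptr].
static Constant *buildPattern16(Constant *C, const DataLayout &DL) {
  uint64_t Size = DL.getTypeStoreSize(C->getType()).getFixedValue();
  if (Size == 0 || 16 % Size != 0)
    return nullptr; // Element size must evenly divide 16 bytes.
  uint64_t N = 16 / Size;
  ArrayType *AT = ArrayType::get(C->getType(), N);
  return ConstantArray::get(AT, std::vector<Constant *>(N, C));
}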