
Conversation

asb (Contributor) commented Feb 28, 2025

This is motivated by #126736: it catches a case where LoopIdiomRecognize
previously produced memset_pattern16 directly, but which is missed after
moving to the intrinsic in #126736 and relying on PreISelIntrinsicLowering
to produce the libcall when available.

The logic for looking through casts could be made more extensive, but the assumption is that the cast (in this case, ptrtoint) will only be present if a constant integer couldn't have been produced directly. This is true for a global ptr, but not for constant ints or constant fp values.
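
To illustrate, here is a minimal IR sketch of the shape this patch targets (the function and value names are hypothetical): the pattern is the address of a global, which can only reach the intrinsic's integer pattern operand through a ptrtoint.

  @G = global i32 5

  define void @fill(ptr %a, i64 %n) nounwind {
    ; The pattern is @G's address; no plain constant integer can express it.
    %pat = ptrtoint ptr @G to i64
    tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %pat, i64 %n, i1 false)
    ret void
  }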

asb added 2 commits February 28, 2025 10:14
…ading from constant global

This is motivated by llvm#126736, and catches a case that would have
resulted in memset_pattern16 being produced by LoopIdiomRecognize
previously but is missed after moving to the intrinsic in llvm#126736 and
relying on PreISelIntrinsicLowering to produce the libcall when
available.

The logic for handling load instructions that access constant globals
could be made more extensive, but it's not clear it would be worthwhile.
For now we prioritise the patterns that could be produced by
LoopIdiomRecognize.
llvmbot (Member) commented Feb 28, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)



Full diff: https://github.com/llvm/llvm-project/pull/129220.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp (+15-2)
  • (modified) llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll (+67-8)
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 27fa0b43d74f6..458e52bcb7c68 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -254,10 +254,23 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
   if (!isLibFuncEmittable(M, &TLI, LibFunc_memset_pattern16))
     return nullptr;
 
+  // If V is a load instruction that loads from a constant global then attempt
+  // to use that constant to produce the pattern.
+  Constant *C = nullptr;
+  if (auto *LI = dyn_cast<LoadInst>(V)) {
+    if (auto *GV = dyn_cast<GlobalVariable>(LI->getPointerOperand())) {
+      if (GV->isConstant() && GV->hasInitializer()) {
+        C = GV->getInitializer();
+      }
+    }
+  }
+
+  if (!C)
+    C = dyn_cast<Constant>(V);
+
   // If the value isn't a constant, we can't promote it to being in a constant
   // array.  We could theoretically do a store to an alloca or something, but
   // that doesn't seem worthwhile.
-  Constant *C = dyn_cast<Constant>(V);
   if (!C || isa<ConstantExpr>(C))
     return nullptr;
 
@@ -284,7 +297,7 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
 
   // Otherwise, we'll use an array of the constants.
   uint64_t ArraySize = 16 / Size;
-  ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
+  ArrayType *AT = ArrayType::get(C->getType(), ArraySize);
   return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
 }
 
diff --git a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
index 7cfdcb8578809..e44007f736370 100644
--- a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
+++ b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
@@ -2,13 +2,18 @@
 ; RUN: opt -mtriple=x86_64-apple-darwin10.0.0 -passes=pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s
 
 ;.
+; CHECK: @G = global i32 5
+; CHECK: @ptr_pat = private unnamed_addr constant ptr @G, align 8
+; CHECK: @nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
 ; CHECK: @.memset_pattern = private unnamed_addr constant [2 x i64] [i64 -6148895925951734307, i64 -6148895925951734307], align 16
 ; CHECK: @.memset_pattern.1 = private unnamed_addr constant [2 x i64] [i64 4614256656552045848, i64 4614256656552045848], align 16
-; CHECK: @.memset_pattern.2 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
-; CHECK: @.memset_pattern.3 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
-; CHECK: @.memset_pattern.4 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.2 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.3 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.4 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
 ; CHECK: @.memset_pattern.5 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
 ; CHECK: @.memset_pattern.6 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.7 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.8 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
 ;.
 define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1_dynvalue(
@@ -31,7 +36,7 @@ define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
 define void @memset_pattern_i128_1(ptr %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1(
 ; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 16)
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 16)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -59,7 +64,7 @@ define void @memset_pattern_i128_1_nz_as(ptr addrspace(1) %a, i128 %value) nounw
 define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1_align_attr(
 ; CHECK-SAME: ptr align 16 [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.4, i64 16)
+; CHECK-NEXT:    call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.6, i64 16)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr align(16) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -69,7 +74,7 @@ define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nou
 define void @memset_pattern_i128_16(ptr %a) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_16(
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 256)
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.7, i64 256)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 16, i1 false)
@@ -80,7 +85,7 @@ define void @memset_pattern_i128_x(ptr %a, i64 %x) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_x(
 ; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul i64 16, [[X]]
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.6, i64 [[TMP1]])
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.8, i64 [[TMP1]])
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false)
@@ -110,7 +115,7 @@ define void @memset_pattern_i16_x(ptr %a, i64 %x) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i16_x(
 ; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul i64 2, [[X]]
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 [[TMP1]])
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.4, i64 [[TMP1]])
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i16 u0xabcd, i64 %x, i1 false)
@@ -144,6 +149,60 @@ define void @memset_pattern_i64_128_tbaa(ptr %a) nounwind {
 !7 = !{!"omnipotent char", !8, i64 0}
 !8 = !{!"Simple C++ TBAA"}
 
+@G = global i32 5
+@ptr_pat = private unnamed_addr constant ptr @G, align 8
+
+define void @memset_pattern_i64_16_fromptr(ptr %a) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_16_fromptr(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 128)
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 16, i1 false)
+  ret void
+}
+
+define void @memset_pattern_i64_x_fromptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 8, [[X]]
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 [[TMP2]])
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+  ret void
+}
+
+@nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
+
+; memset_pattern16 shouldn't be used for this example (at least not by just
; creating a ConstantArray global at compile time), as the global isn't
+; constant.
+define void @memset_pattern_i64_x_fromnonconstptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromnonconstptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @nonconst_ptr_pat, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 0, [[X]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
+; CHECK:       [[LOADSTORELOOP]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP5:%.*]], %[[LOADSTORELOOP]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
+; CHECK-NEXT:    store i64 [[TMP1]], ptr [[TMP4]], align 1
+; CHECK-NEXT:    [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp ult i64 [[TMP5]], [[X]]
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
+; CHECK:       [[SPLIT]]:
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @nonconst_ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+  ret void
+}
+
 ;.
 ; CHECK: attributes #[[ATTR0]] = { nounwind }
 ; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) }

nikic (Contributor) commented Feb 28, 2025

I'm not sure I understand why we need to do this. That is, shouldn't LIR be passing the pattern directly as an argument to experimental.memset.pattern and leave it to intrinsic lowering to create the global, instead of creating the global itself?
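
For context, a sketch of the two shapes in question, adapted from the test cases in the diff (@G and @ptr_pat appear in the patch; the direct constant is illustrative):

  ; Shape this patch handles: LIR materialised a constant global holding the
  ; pattern and loaded from it before calling the intrinsic.
  @G = global i32 5
  @ptr_pat = private unnamed_addr constant ptr @G, align 8
  ...
  %pat = load i64, ptr @ptr_pat, align 8
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %pat, i64 16, i1 false)

  ; Shape suggested above: pass the pattern operand directly and let
  ; PreISelIntrinsicLowering create the global if it emits memset_pattern16.
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 1234, i64 16, i1 false)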

asb changed the title from "[PreISelIntrinsicLowering] Support producing memset_pattern16 when loading from constant global" to "[PreISelIntrinsicLowering] Look through ptrtoint for a constant (e.g. global ptr) when trying to produce memset_pattern16" on Mar 5, 2025
asb (Contributor, Author) commented Mar 5, 2025

I'm not sure I understand why we need to do this. That is, shouldn't LIR be passing the pattern directly as an argument to experimental.memset.pattern and leave it to intrinsic lowering to create the global, instead of creating the global itself?

You're absolutely right: having LoopIdiomRecognize create a global value and load from it is unnecessary, and should have been removed for the ptr case as it was for int and fp. I've updated #126736 to reflect this and reworked this patch to just look through a ptrtoint for a constant arg. Thank you for calling out this weirdness.

This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.

asb (Contributor, Author) commented Mar 12, 2025

Ping.

nikic (Contributor) left a comment

This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.

I feel like it would make more sense to lift that restriction right away. It seems like both this PR and #126736 have to do unnecessary work to force the use of integers, while (I think?) non-integer types should "just work" if we allow them.

// Look through a ptrtoint cast for a candidate constant. This could be
// extended to look through other casts, but the assumption is earlier
// passes that introduced memset.pattern intrinsic would have just emitted
// the integer argument directly for CosntantFP or ConstantInt cases.
Suggested change:
-  // the integer argument directly for CosntantFP or ConstantInt cases.
+  // the integer argument directly for ConstantFP or ConstantInt cases.

Comment on lines +195 to +197
%1 = ptrtoint ptr %p to i64
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
ret void
(I find the new no-required-mangling-suffix behavior confusing.)

Could instcombine fold this to pull the cast in, and adjust the call signature?
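
A sketch of the fold being asked about (hypothetical; whether instcombine should do this is the open question):

  ; Before: the pattern reaches the intrinsic through a ptrtoint.
  %1 = ptrtoint ptr %p to i64
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)

  ; After the hypothetical fold: the cast is absorbed and the pattern
  ; operand becomes pointer-typed.
  tail call void @llvm.experimental.memset.pattern(ptr %a, ptr %p, i64 %x, i1 false)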

preames (Collaborator) commented Mar 18, 2025

I feel like it would make more sense to lift that restriction right away. It seems like both this PR and #126736 have to do unnecessary work to force the use of integers, while (I think?) non-integer types should "just work" if we allow them.

After looking through this and the dependent patch, I strongly agree with @nikic's point here. The right thing to do is simply to allow non-integer types for the pattern argument of the memset.pattern intrinsic. In principle, the pattern value could be any (sized) type.
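
Concretely, lifting the restriction would let callers pass the pattern without any cast; a minimal sketch, assuming the proposed overloading (these calls are illustrative, not from the patch):

  ; A global address as the pattern, with no ptrtoint required.
  tail call void @llvm.experimental.memset.pattern(ptr %a, ptr @G, i64 %n, i1 false)

  ; Any other sized type works the same way, e.g. floating point.
  tail call void @llvm.experimental.memset.pattern(ptr %a, double 1.0, i64 %n, i1 false)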

asb added a commit to asb/llvm-project that referenced this pull request Mar 19, 2025
… the pattern argument

I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for
PR llvm#129220, this restriction is generating additional unnecessary work.

This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.
asb added a commit that referenced this pull request Mar 19, 2025
… the pattern argument (#132026)

I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for PR
#129220, this restriction is generating additional unnecessary work.

This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.

A verifier check ensures that only sized types are used for the pattern.
asb added a commit that referenced this pull request Mar 27, 2025
…memset.pattern with ptr pattern argument

These tests all show the desired output because after #132026, the
intrinsic can take any sized type as an argument. The tests were adapted
from those I proposed adding in #129220.
asb (Contributor, Author) commented Mar 27, 2025

With #132026 this is no longer necessary (thank you again for pointing out I was making this unnecessarily fiddly). In c676eb7 I've directly committed test cases that give coverage for memset.pattern with ptr arguments.

Closing this PR as it has been rendered redundant.

asb closed this Mar 27, 2025
