
Conversation

asb (Contributor) commented Feb 28, 2025

This is motivated by #126736: it catches a case where LoopIdiomRecognize
previously produced memset_pattern16 directly, but which is missed after
moving to the intrinsic in #126736 and relying on PreISelIntrinsicLowering
to produce the libcall when available.

The logic for looking through casts could be made more extensive, but the assumption is that the cast (in this case, ptrtoint) will only be present if a constant integer couldn't have been produced directly. This is true for a global ptr, but not for constant ints or constant fp values.
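
To illustrate, here is a minimal IR sketch of the shape this patch targets (the function and value names are hypothetical): the pattern is the address of a global, which can only reach the intrinsic's integer pattern operand through a ptrtoint.

  @G = global i32 5

  define void @fill(ptr %a, i64 %n) nounwind {
    ; The pattern is @G's address; no plain constant integer can express it.
    %pat = ptrtoint ptr @G to i64
    tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %pat, i64 %n, i1 false)
    ret void
  }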

asb added 2 commits February 28, 2025 10:14
…ading from constant global

This is motivated by llvm#126736, and catches a case that would have
resulted in memset_pattern16 being produced by LoopIdiomRecognize
previously but is missed after moving to the intrinsic in llvm#126736 and
relying on PreISelIntrinsicLowering to produce the libcall when
available.

The logic for handling load instructions that access constant globals
could be made more extensive, but it's not clear it would be worthwhile.
For now we prioritise the patterns that could be produced by
LoopIdiomRecognize.
llvmbot (Member) commented Feb 28, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)



Full diff: https://github.com/llvm/llvm-project/pull/129220.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp (+15-2)
  • (modified) llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll (+67-8)
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 27fa0b43d74f6..458e52bcb7c68 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -254,10 +254,23 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
   if (!isLibFuncEmittable(M, &TLI, LibFunc_memset_pattern16))
     return nullptr;
 
+  // If V is a load instruction that loads from a constant global then attempt
+  // to use that constant to produce the pattern.
+  Constant *C = nullptr;
+  if (auto *LI = dyn_cast<LoadInst>(V)) {
+    if (auto *GV = dyn_cast<GlobalVariable>(LI->getPointerOperand())) {
+      if (GV->isConstant() && GV->hasInitializer()) {
+        C = GV->getInitializer();
+      }
+    }
+  }
+
+  if (!C)
+    C = dyn_cast<Constant>(V);
+
   // If the value isn't a constant, we can't promote it to being in a constant
   // array.  We could theoretically do a store to an alloca or something, but
   // that doesn't seem worthwhile.
-  Constant *C = dyn_cast<Constant>(V);
   if (!C || isa<ConstantExpr>(C))
     return nullptr;
 
@@ -284,7 +297,7 @@ static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
 
   // Otherwise, we'll use an array of the constants.
   uint64_t ArraySize = 16 / Size;
-  ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
+  ArrayType *AT = ArrayType::get(C->getType(), ArraySize);
   return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
 }
 
diff --git a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
index 7cfdcb8578809..e44007f736370 100644
--- a/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
+++ b/llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
@@ -2,13 +2,18 @@
 ; RUN: opt -mtriple=x86_64-apple-darwin10.0.0 -passes=pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s
 
 ;.
+; CHECK: @G = global i32 5
+; CHECK: @ptr_pat = private unnamed_addr constant ptr @G, align 8
+; CHECK: @nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
 ; CHECK: @.memset_pattern = private unnamed_addr constant [2 x i64] [i64 -6148895925951734307, i64 -6148895925951734307], align 16
 ; CHECK: @.memset_pattern.1 = private unnamed_addr constant [2 x i64] [i64 4614256656552045848, i64 4614256656552045848], align 16
-; CHECK: @.memset_pattern.2 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
-; CHECK: @.memset_pattern.3 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
-; CHECK: @.memset_pattern.4 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.2 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.3 = private unnamed_addr constant [2 x ptr] [ptr @G, ptr @G], align 16
+; CHECK: @.memset_pattern.4 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
 ; CHECK: @.memset_pattern.5 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
 ; CHECK: @.memset_pattern.6 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.7 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
+; CHECK: @.memset_pattern.8 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
 ;.
 define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1_dynvalue(
@@ -31,7 +36,7 @@ define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
 define void @memset_pattern_i128_1(ptr %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1(
 ; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 16)
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 16)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -59,7 +64,7 @@ define void @memset_pattern_i128_1_nz_as(ptr addrspace(1) %a, i128 %value) nounw
 define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_1_align_attr(
 ; CHECK-SAME: ptr align 16 [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.4, i64 16)
+; CHECK-NEXT:    call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.6, i64 16)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr align(16) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
@@ -69,7 +74,7 @@ define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nou
 define void @memset_pattern_i128_16(ptr %a) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_16(
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 256)
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.7, i64 256)
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 16, i1 false)
@@ -80,7 +85,7 @@ define void @memset_pattern_i128_x(ptr %a, i64 %x) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i128_x(
 ; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul i64 16, [[X]]
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.6, i64 [[TMP1]])
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.8, i64 [[TMP1]])
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false)
@@ -110,7 +115,7 @@ define void @memset_pattern_i16_x(ptr %a, i64 %x) nounwind {
 ; CHECK-LABEL: define void @memset_pattern_i16_x(
 ; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul i64 2, [[X]]
-; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 [[TMP1]])
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.4, i64 [[TMP1]])
 ; CHECK-NEXT:    ret void
 ;
   tail call void @llvm.experimental.memset.pattern(ptr %a, i16 u0xabcd, i64 %x, i1 false)
@@ -144,6 +149,60 @@ define void @memset_pattern_i64_128_tbaa(ptr %a) nounwind {
 !7 = !{!"omnipotent char", !8, i64 0}
 !8 = !{!"Simple C++ TBAA"}
 
+@G = global i32 5
+@ptr_pat = private unnamed_addr constant ptr @G, align 8
+
+define void @memset_pattern_i64_16_fromptr(ptr %a) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_16_fromptr(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 128)
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 16, i1 false)
+  ret void
+}
+
+define void @memset_pattern_i64_x_fromptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @ptr_pat, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 8, [[X]]
+; CHECK-NEXT:    call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 [[TMP2]])
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+  ret void
+}
+
+@nonconst_ptr_pat = private unnamed_addr global ptr @G, align 8
+
+; memset_pattern16 shouldn't be used for this example (at least not by just
; creating a ConstantArray global at compile time), as the global isn't
+; constant.
+define void @memset_pattern_i64_x_fromnonconstptr(ptr %a, i64 %x) nounwind {
+; CHECK-LABEL: define void @memset_pattern_i64_x_fromnonconstptr(
+; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i64, ptr @nonconst_ptr_pat, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 0, [[X]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
+; CHECK:       [[LOADSTORELOOP]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP5:%.*]], %[[LOADSTORELOOP]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
+; CHECK-NEXT:    store i64 [[TMP1]], ptr [[TMP4]], align 1
+; CHECK-NEXT:    [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp ult i64 [[TMP5]], [[X]]
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
+; CHECK:       [[SPLIT]]:
+; CHECK-NEXT:    ret void
+;
+  %1 = load i64, ptr @nonconst_ptr_pat, align 8
+  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
+  ret void
+}
+
 ;.
 ; CHECK: attributes #[[ATTR0]] = { nounwind }
 ; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) }

nikic (Contributor) commented Feb 28, 2025

I'm not sure I understand why we need to do this. That is, shouldn't LIR be passing the pattern directly as an argument to experimental.memset.pattern and leave it to intrinsic lowering to create the global, instead of creating the global itself?
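
For context, a sketch of the two shapes in question, adapted from the test cases in the diff (@G and @ptr_pat appear in the patch; the direct constant is illustrative):

  ; Shape this patch handles: LIR materialised a constant global holding the
  ; pattern and loaded from it before calling the intrinsic.
  @G = global i32 5
  @ptr_pat = private unnamed_addr constant ptr @G, align 8
  ...
  %pat = load i64, ptr @ptr_pat, align 8
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %pat, i64 16, i1 false)

  ; Shape suggested above: pass the pattern operand directly and let
  ; PreISelIntrinsicLowering create the global if it emits memset_pattern16.
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 1234, i64 16, i1 false)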

asb changed the title from "[PreISelIntrinsicLowering] Support producing memset_pattern16 when loading from constant global" to "[PreISelIntrinsicLowering] Look through ptrtoint for a constant (e.g. global ptr) when trying to produce memset_pattern16" on Mar 5, 2025
asb (Contributor, Author) commented Mar 5, 2025

I'm not sure I understand why we need to do this. That is, shouldn't LIR be passing the pattern directly as an argument to experimental.memset.pattern and leave it to intrinsic lowering to create the global, instead of creating the global itself?

You're absolutely right: having LoopIdiomRecognize create a global value and load from it is unnecessary, and should have been removed for the ptr case as it was for int and fp. I've updated #126736 to reflect this and reworked this patch to just look through a ptrtoint for a constant arg. Thank you for calling out this weirdness.

This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.

asb (Contributor, Author) commented Mar 12, 2025

Ping.

nikic (Contributor) left a comment

This continues along the path of leaving memset.pattern as taking an integer type for now, and potentially allowing overloads of other types in future work.

I feel like it would make more sense to lift that restriction right away. It seems like both this PR and #126736 have to do unnecessary work to force the use of integers, while (I think?) non-integer types should "just work" if we allow them.

// Look through a ptrtoint cast for a candidate constant. This could be
// extended to look through other casts, but the assumption is earlier
// passes that introduced memset.pattern intrinsic would have just emitted
// the integer argument directly for CosntantFP or ConstantInt cases.
Suggested change:
-  // the integer argument directly for CosntantFP or ConstantInt cases.
+  // the integer argument directly for ConstantFP or ConstantInt cases.

Comment on lines +195 to +197
%1 = ptrtoint ptr %p to i64
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)
ret void
(I find the new no-required-mangling-suffix behavior confusing.)

Could instcombine fold this to pull the cast in, and adjust the call signature?
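
A sketch of the fold being asked about (hypothetical; whether instcombine should do this is the open question):

  ; Before: the pattern reaches the intrinsic through a ptrtoint.
  %1 = ptrtoint ptr %p to i64
  tail call void @llvm.experimental.memset.pattern(ptr %a, i64 %1, i64 %x, i1 false)

  ; After the hypothetical fold: the cast is absorbed and the pattern
  ; operand becomes pointer-typed.
  tail call void @llvm.experimental.memset.pattern(ptr %a, ptr %p, i64 %x, i1 false)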

preames (Collaborator) commented Mar 18, 2025

I feel like it would make more sense to lift that restriction right away. It seems like both this PR and #126736 have to do unnecessary work to force the use of integers, while (I think?) non-integer types should "just work" if we allow them.

After looking through this and the dependent patch, I strongly agree with @nikic's point here. The right thing to do is simply to allow non-integer types for the pattern argument of the memset.pattern intrinsic. In principle, the pattern value could be any (sized) type.
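
Concretely, lifting the restriction would let callers pass the pattern without any cast; a minimal sketch, assuming the proposed overloading (these calls are illustrative, not from the patch):

  ; A global address as the pattern, with no ptrtoint required.
  tail call void @llvm.experimental.memset.pattern(ptr %a, ptr @G, i64 %n, i1 false)

  ; Any other sized type works the same way, e.g. floating point.
  tail call void @llvm.experimental.memset.pattern(ptr %a, double 1.0, i64 %n, i1 false)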

asb added a commit to asb/llvm-project that referenced this pull request Mar 19, 2025
… the pattern argument

I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for
PR llvm#129220, this restriction is generating additional unnecessary work.

This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.
asb added a commit that referenced this pull request Mar 19, 2025
… the pattern argument (#132026)

I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for PR
#129220, this restriction is generating additional unnecessary work.

This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.

A verifier check ensures that only sized types are used for the pattern.
asb added a commit that referenced this pull request Mar 27, 2025
…memset.pattern with ptr pattern argument

These tests all show the desired output because after #132026, the
intrinsic can take any sized type as an argument. The tests were adapted
from those I proposed adding in #129220.
asb (Contributor, Author) commented Mar 27, 2025

With #132026 this is no longer necessary (thank you again for pointing out I was making this unnecessarily fiddly). In c676eb7 I've directly committed test cases that give coverage for memset.pattern with ptr arguments.

Closing this PR as it has been rendered redundant.

asb closed this Mar 27, 2025
