[AArch64][InstCombine] Canonicalize whilelo intrinsic #151553
Conversation
InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic.

@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64

Author: Matthew Devereau (MDevereau)

Changes: InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic.

Full diff: https://github.com/llvm/llvm-project/pull/151553.diff (2 files affected)
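In effect, the patch rewrites the target-specific predicate computation into its generic equivalent. A minimal sketch of the transformation (value names are illustrative):

; before: target-specific SVE intrinsic
%mask = call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 %base, i32 %n)

; after: generic intrinsic, visible to target-independent passes
%mask = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 %base, i32 %n)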
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 18ca22fc9f211..1220a0fc8ee82 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2696,6 +2696,15 @@ static std::optional<Instruction *> instCombineDMB(InstCombiner &IC,
return std::nullopt;
}
+static std::optional<Instruction *> instCombineWhilelo(InstCombiner &IC,
+ IntrinsicInst &II) {
+ return IC.replaceInstUsesWith(
+ II,
+ IC.Builder.CreateIntrinsic(Intrinsic::get_active_lane_mask,
+ {II.getType(), II.getOperand(0)->getType()},
+ {II.getOperand(0), II.getOperand(1)}));
+}
+
static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
IntrinsicInst &II) {
if (match(II.getOperand(0), m_ConstantInt<AArch64SVEPredPattern::all>()))
@@ -2830,6 +2839,8 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
return instCombineSVEDupqLane(IC, II);
case Intrinsic::aarch64_sve_insr:
return instCombineSVEInsr(IC, II);
+ case Intrinsic::aarch64_sve_whilelo:
+ return instCombineWhilelo(IC, II);
case Intrinsic::aarch64_sve_ptrue:
return instCombinePTrue(IC, II);
case Intrinsic::aarch64_sve_uxtb:
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
new file mode 100644
index 0000000000000..9dde171217432
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+
+define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
+; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
+; CHECK-NEXT: ret <vscale x 4 x float> [[LOAD]]
+;
+ %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
+ %load = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull %0, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x float> zeroinitializer)
+ ret <vscale x 4 x float> %load
+}
+
+define <vscale x 8 x float> @const_whilelo_nxv8f32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x float> @const_whilelo_nxv8f32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x float> zeroinitializer)
+; CHECK-NEXT: ret <vscale x 8 x float> [[LOAD]]
+;
+ %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i32(i32 0, i32 8)
+ %load = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x float> zeroinitializer)
+ ret <vscale x 8 x float> %load
+}
+
+define <vscale x 8 x i16> @const_whilelo_nxv8i16(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @const_whilelo_nxv8i16(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x i16> zeroinitializer)
+; CHECK-NEXT: ret <vscale x 8 x i16> [[LOAD]]
+;
+ %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i16(i32 0, i32 8)
+ %load = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i16> zeroinitializer)
+ ret <vscale x 8 x i16> %load
+}
+
+define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @const_whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 16)
+; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT: ret <vscale x 16 x i8> [[LOAD]]
+;
+ %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 0, i32 16)
+ %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+ ret <vscale x 16 x i8> %load
+}
+
+
+define <vscale x 16 x i8> @whilelo_nxv16i8(ptr %0, i32 %a, i32 %b) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]], i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT: ret <vscale x 16 x i8> [[LOAD]]
+;
+ %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 %a, i32 %b)
+ %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+ ret <vscale x 16 x i8> %load
+}
david-arm left a comment:
Seems sensible to me. Just have a few minor comments about tests.
define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
; CHECK-SAME: ptr [[TMP0:%.*]]) {
; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
It might be worth adding at least one test for the i64 variant too?
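The i64 overload takes 64-bit counters; a minimal test for it might look like this (an illustrative sketch, not necessarily the exact test added to the PR):

define <vscale x 4 x i1> @whilelo_nxv4i1_i64(i64 %a, i64 %b) {
; expected to canonicalize to:
;   call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %a, i64 %b)
  %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i64(i64 %a, i64 %b)
  ret <vscale x 4 x i1> %mask
}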
Thanks. I've addressed all of your comments.
  ret <vscale x 8 x i16> %load
}

define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
nit: Can you name the tests according to the predicate variant, i.e. nxv16i1, nxv8i1, etc, since you're actually testing these variants of the whilelo intrinsic. Thanks!
; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
; CHECK-NEXT: ret <vscale x 4 x float> [[LOAD]]
;
  %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
Can you add a test for the nxv2i1 variant as well?
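A test for that variant would take this shape (an illustrative sketch; the test actually committed may differ):

define <vscale x 2 x i1> @whilelo_nxv2i1(i32 %a, i32 %b) {
; expected to become @llvm.get.active.lane.mask.nxv2i1.i32(i32 %a, i32 %b)
  %mask = tail call <vscale x 2 x i1> @llvm.aarch64.sve.whilelo.nxv2i1.i32(i32 %a, i32 %b)
  ret <vscale x 2 x i1> %mask
}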
With #152140 landed it should be OK to InstCombine whilelo(0,0) to get_active_lane_mask(0,0). I've added the test.
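For context, the zero-count case yields an all-false predicate under the semantics that #152140 established (a sketch of the expected fold):

  %mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 0)
  ; no lane satisfies 0 + i < 0, so %mask is all-false (zeroinitializer),
  ; matching what whilelo(0, 0) produces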
david-arm left a comment:
LGTM with comment addressed!
; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 0, i16 16)
; CHECK-NEXT: ret <vscale x 16 x i1> [[MASK]]
;
  %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i16(i16 0, i16 16)
I think there are only 32 and 64 bit versions of whilelo builtins so you can drop this function and the ones below.
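For reference, the intrinsic pairs each predicate type with either a 32-bit or a 64-bit counter, so the valid declarations look like:

declare <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i32(i32, i32)
declare <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i64(i64, i64)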