
Conversation

@MDevereau
Contributor

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic
@llvmbot added the backend:AArch64, llvm:instcombine, and llvm:transforms labels on Jul 31, 2025
@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Matthew Devereau (MDevereau)

Changes

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic
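
In IR terms, the combine is a one-to-one rewrite of the target-specific predicate intrinsic into its generic equivalent with the same operands. A sketch, matching the nxv4i1.i32 test in the diff below:

  ; before
  %mask = call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 %a, i32 %b)
  ; after
  %mask = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 %a, i32 %b)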


Full diff: https://github.com/llvm/llvm-project/pull/151553.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+11)
  • (added) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll (+66)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 18ca22fc9f211..1220a0fc8ee82 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2696,6 +2696,15 @@ static std::optional<Instruction *> instCombineDMB(InstCombiner &IC,
   return std::nullopt;
 }
 
+static std::optional<Instruction *> instCombineWhilelo(InstCombiner &IC,
+                                                       IntrinsicInst &II) {
+  return IC.replaceInstUsesWith(
+      II,
+      IC.Builder.CreateIntrinsic(Intrinsic::get_active_lane_mask,
+                                 {II.getType(), II.getOperand(0)->getType()},
+                                 {II.getOperand(0), II.getOperand(1)}));
+}
+
 static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
                                                      IntrinsicInst &II) {
   if (match(II.getOperand(0), m_ConstantInt<AArch64SVEPredPattern::all>()))
@@ -2830,6 +2839,8 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
     return instCombineSVEDupqLane(IC, II);
   case Intrinsic::aarch64_sve_insr:
     return instCombineSVEInsr(IC, II);
+  case Intrinsic::aarch64_sve_whilelo:
+    return instCombineWhilelo(IC, II);
   case Intrinsic::aarch64_sve_ptrue:
     return instCombinePTrue(IC, II);
   case Intrinsic::aarch64_sve_uxtb:
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
new file mode 100644
index 0000000000000..9dde171217432
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+
+define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 4 x float> [[LOAD]]
+;
+  %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
+  %load = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull %0, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x float> zeroinitializer)
+  ret <vscale x 4 x float> %load
+}
+
+define <vscale x 8 x float> @const_whilelo_nxv8f32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x float> @const_whilelo_nxv8f32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x float> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 8 x float> [[LOAD]]
+;
+  %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i32(i32 0, i32 8)
+  %load = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x float> zeroinitializer)
+  ret <vscale x 8 x float> %load
+}
+
+define <vscale x 8 x i16> @const_whilelo_nxv8i16(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @const_whilelo_nxv8i16(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x i16> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[LOAD]]
+;
+  %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i16(i32 0, i32 8)
+  %load = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i16> zeroinitializer)
+  ret <vscale x 8 x i16> %load
+}
+
+define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @const_whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 16)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 16 x i8> [[LOAD]]
+;
+  %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 0, i32 16)
+  %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+  ret <vscale x 16 x i8> %load
+}
+
+
+define <vscale x 16 x i8> @whilelo_nxv16i8(ptr %0, i32 %a, i32 %b) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]], i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 16 x i8> [[LOAD]]
+;
+  %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 %a, i32 %b)
+  %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+  ret <vscale x 16 x i8> %load
+}

Contributor

@david-arm david-arm left a comment

Seems sensible to me. Just have a few minor comments about tests.

define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
; CHECK-SAME: ptr [[TMP0:%.*]]) {
; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
Contributor

It might be worth adding at least one test for the i64 variant too?
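For reference, an i64-operand test could look something like the hypothetical sketch below (the exact test added to the PR is not shown in this thread):

  ; Hypothetical sketch only; the actual test may differ.
  define <vscale x 4 x i1> @whilelo_nxv4i1_i64(i64 %a, i64 %b) {
    %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i64(i64 %a, i64 %b)
    ret <vscale x 4 x i1> %mask
  }
  ; Expected to be rewritten to a call to
  ; @llvm.get.active.lane.mask.nxv4i1.i64(i64 %a, i64 %b).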

Contributor Author

Thanks. I've addressed all of your comments

ret <vscale x 8 x i16> %load
}

define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
Contributor

nit: Can you name the tests according to the predicate variant, i.e. nxv16i1, nxv8i1, etc, since you're actually testing these variants of the whilelo intrinsic. Thanks!

; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
; CHECK-NEXT: ret <vscale x 4 x float> [[LOAD]]
;
%mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
Contributor

Can you add a test for the nxv2i1 variant as well?
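Similarly, a hypothetical nxv2i1 test (not necessarily the one added to the PR) might be:

  ; Hypothetical sketch only; the actual test may differ.
  define <vscale x 2 x i1> @whilelo_nxv2i1(i32 %a, i32 %b) {
    %mask = tail call <vscale x 2 x i1> @llvm.aarch64.sve.whilelo.nxv2i1.i32(i32 %a, i32 %b)
    ret <vscale x 2 x i1> %mask
  }
  ; Expected to become a call to @llvm.get.active.lane.mask.nxv2i1.i32(i32 %a, i32 %b).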

@MDevereau
Contributor Author

With #152140 landed it should be OK to InstCombine whilelo(0,0) to get_active_lane_mask(0,0). I've added the test whilelo_nxv16i1_0_0.
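The contents of that test are not shown in this thread, but a minimal sketch of what whilelo_nxv16i1_0_0 could look like is:

  ; Hypothetical sketch only; the test actually added in the PR may differ.
  define <vscale x 16 x i1> @whilelo_nxv16i1_0_0() {
    %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i32(i32 0, i32 0)
    ret <vscale x 16 x i1> %mask
  }
  ; Expected to be rewritten to @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 0),
  ; assuming no further simplification of the zero-length mask applies.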

@MDevereau MDevereau requested a review from david-arm September 10, 2025 14:57
Contributor

@david-arm david-arm left a comment

LGTM with comment addressed!

; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 0, i16 16)
; CHECK-NEXT: ret <vscale x 16 x i1> [[MASK]]
;
%mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i16(i16 0, i16 16)
Contributor

I think there are only 32 and 64 bit versions of whilelo builtins so you can drop this function and the ones below.

@MDevereau MDevereau merged commit 1e10b78 into llvm:main Sep 12, 2025
9 checks passed