
Conversation

@MDevereau
Contributor

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic
@llvmbot added the backend:AArch64, llvm:instcombine, and llvm:transforms labels on Jul 31, 2025
@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Matthew Devereau (MDevereau)

Changes

InstCombine llvm.aarch64.sve.whilelo to the generic LLVM llvm.get.active.lane.mask intrinsic
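
In IR terms, the combine is a one-to-one rewrite of the target-specific predicate intrinsic into its generic equivalent with the same operands. A sketch, matching the nxv4i1.i32 test in the diff below:

  ; before
  %mask = call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 %a, i32 %b)
  ; after
  %mask = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 %a, i32 %b)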


Full diff: https://github.com/llvm/llvm-project/pull/151553.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+11)
  • (added) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll (+66)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 18ca22fc9f211..1220a0fc8ee82 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2696,6 +2696,15 @@ static std::optional<Instruction *> instCombineDMB(InstCombiner &IC,
   return std::nullopt;
 }
 
+static std::optional<Instruction *> instCombineWhilelo(InstCombiner &IC,
+                                                       IntrinsicInst &II) {
+  return IC.replaceInstUsesWith(
+      II,
+      IC.Builder.CreateIntrinsic(Intrinsic::get_active_lane_mask,
+                                 {II.getType(), II.getOperand(0)->getType()},
+                                 {II.getOperand(0), II.getOperand(1)}));
+}
+
 static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
                                                      IntrinsicInst &II) {
   if (match(II.getOperand(0), m_ConstantInt<AArch64SVEPredPattern::all>()))
@@ -2830,6 +2839,8 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
     return instCombineSVEDupqLane(IC, II);
   case Intrinsic::aarch64_sve_insr:
     return instCombineSVEInsr(IC, II);
+  case Intrinsic::aarch64_sve_whilelo:
+    return instCombineWhilelo(IC, II);
   case Intrinsic::aarch64_sve_ptrue:
     return instCombinePTrue(IC, II);
   case Intrinsic::aarch64_sve_uxtb:
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
new file mode 100644
index 0000000000000..9dde171217432
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-whilelo.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+
+define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 4 x float> [[LOAD]]
+;
+  %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
+  %load = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull %0, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x float> zeroinitializer)
+  ret <vscale x 4 x float> %load
+}
+
+define <vscale x 8 x float> @const_whilelo_nxv8f32(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x float> @const_whilelo_nxv8f32(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x float> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 8 x float> [[LOAD]]
+;
+  %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i32(i32 0, i32 8)
+  %load = tail call <vscale x 8 x float> @llvm.masked.load.nxv8f32.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x float> zeroinitializer)
+  ret <vscale x 8 x float> %load
+}
+
+define <vscale x 8 x i16> @const_whilelo_nxv8i16(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @const_whilelo_nxv8i16(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 0, i32 8)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 8 x i1> [[MASK]], <vscale x 8 x i16> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[LOAD]]
+;
+  %mask = tail call <vscale x 8 x i1> @llvm.aarch64.sve.whilelo.nxv8i1.i16(i32 0, i32 8)
+  %load = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr nonnull %0, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i16> zeroinitializer)
+  ret <vscale x 8 x i16> %load
+}
+
+define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @const_whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 16)
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 16 x i8> [[LOAD]]
+;
+  %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 0, i32 16)
+  %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+  ret <vscale x 16 x i8> %load
+}
+
+
+define <vscale x 16 x i8> @whilelo_nxv16i8(ptr %0, i32 %a, i32 %b) #0 {
+; CHECK-LABEL: define <vscale x 16 x i8> @whilelo_nxv16i8(
+; CHECK-SAME: ptr [[TMP0:%.*]], i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    [[LOAD:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 16 x i1> [[MASK]], <vscale x 16 x i8> zeroinitializer)
+; CHECK-NEXT:    ret <vscale x 16 x i8> [[LOAD]]
+;
+  %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i8(i32 %a, i32 %b)
+  %load = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr nonnull %0, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
+  ret <vscale x 16 x i8> %load
+}

Contributor

@david-arm david-arm left a comment

Seems sensible to me. Just have a few minor comments about tests.

define <vscale x 4 x float> @const_whilelo_nxv4i32(ptr %0) #0 {
; CHECK-LABEL: define <vscale x 4 x float> @const_whilelo_nxv4i32(
; CHECK-SAME: ptr [[TMP0:%.*]]) {
; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 4)
Contributor

It might be worth adding at least one test for the i64 variant too?
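For reference, an i64-operand test could look something like the hypothetical sketch below (the exact test added to the PR is not shown in this thread):

  ; Hypothetical sketch only; the actual test may differ.
  define <vscale x 4 x i1> @whilelo_nxv4i1_i64(i64 %a, i64 %b) {
    %mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i64(i64 %a, i64 %b)
    ret <vscale x 4 x i1> %mask
  }
  ; Expected to be rewritten to a call to
  ; @llvm.get.active.lane.mask.nxv4i1.i64(i64 %a, i64 %b).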

Contributor Author

Thanks. I've addressed all of your comments

ret <vscale x 8 x i16> %load
}

define <vscale x 16 x i8> @const_whilelo_nxv16i8(ptr %0) #0 {
Contributor

nit: Can you name the tests according to the predicate variant, i.e. nxv16i1, nxv8i1, etc, since you're actually testing these variants of the whilelo intrinsic. Thanks!

; CHECK-NEXT: [[LOAD:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr nonnull [[TMP0]], i32 1, <vscale x 4 x i1> [[MASK]], <vscale x 4 x float> zeroinitializer)
; CHECK-NEXT: ret <vscale x 4 x float> [[LOAD]]
;
%mask = tail call <vscale x 4 x i1> @llvm.aarch64.sve.whilelo.nxv4i1.i32(i32 0, i32 4)
Contributor

Can you add a test for the nxv2i1 variant as well?
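Similarly, a hypothetical nxv2i1 test (not necessarily the one added to the PR) might be:

  ; Hypothetical sketch only; the actual test may differ.
  define <vscale x 2 x i1> @whilelo_nxv2i1(i32 %a, i32 %b) {
    %mask = tail call <vscale x 2 x i1> @llvm.aarch64.sve.whilelo.nxv2i1.i32(i32 %a, i32 %b)
    ret <vscale x 2 x i1> %mask
  }
  ; Expected to become a call to @llvm.get.active.lane.mask.nxv2i1.i32(i32 %a, i32 %b).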

@MDevereau
Contributor Author

With #152140 landed it should be OK to InstCombine whilelo(0,0) to get_active_lane_mask(0,0). I've added the test whilelo_nxv16i1_0_0.
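The contents of that test are not shown in this thread, but a minimal sketch of what whilelo_nxv16i1_0_0 could look like is:

  ; Hypothetical sketch only; the test actually added in the PR may differ.
  define <vscale x 16 x i1> @whilelo_nxv16i1_0_0() {
    %mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i32(i32 0, i32 0)
    ret <vscale x 16 x i1> %mask
  }
  ; Expected to be rewritten to @llvm.get.active.lane.mask.nxv16i1.i32(i32 0, i32 0),
  ; assuming no further simplification of the zero-length mask applies.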

@MDevereau MDevereau requested a review from david-arm September 10, 2025 14:57
Contributor

@david-arm david-arm left a comment

LGTM with comment addressed!

; CHECK-NEXT: [[MASK:%.*]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 0, i16 16)
; CHECK-NEXT: ret <vscale x 16 x i1> [[MASK]]
;
%mask = tail call <vscale x 16 x i1> @llvm.aarch64.sve.whilelo.nxv16i1.i16(i16 0, i16 16)
Contributor

I think there are only 32 and 64 bit versions of whilelo builtins so you can drop this function and the ones below.

@MDevereau MDevereau merged commit 1e10b78 into llvm:main Sep 12, 2025
9 checks passed