[RISCV][VLOPT] Support segmented store instructions #155467
@llvm/pr-subscribers-backend-risc-v
Author: Min-Yih Hsu (mshockwave)
Changes
Add RISCVVLOptimizer support for unit-stride, strided, and indexed segmented stores. The biggest change is adding the capability to look through INSERT_SUBREG, which is used for composing segmented register class values.
Fix #149350
I was going to roll out segmented load support first -- reducing a segmented load's VL from its users -- but Craig pointed out that the vectorizer is less likely to generate a sub-optimal VL on loads, since memory operations should generally be the source of truth when it comes to VL. That being said, if anyone sees cases like that for segmented loads, please let me know.
Patch is 21.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155467.diff
3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 53557049ea33c..e429960af72cb 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -178,6 +178,19 @@ static unsigned getIntegerExtensionOperandEEW(unsigned Factor,
return Log2EEW;
}
+#define VSEG_CASES(Prefix, EEW) \
+ RISCV::Prefix##SEG2E##EEW##_V: \
+ case RISCV::Prefix##SEG3E##EEW##_V: \
+ case RISCV::Prefix##SEG4E##EEW##_V: \
+ case RISCV::Prefix##SEG5E##EEW##_V: \
+ case RISCV::Prefix##SEG6E##EEW##_V: \
+ case RISCV::Prefix##SEG7E##EEW##_V: \
+ case RISCV::Prefix##SEG8E##EEW##_V
+#define VSSEG_CASES(EEW) VSEG_CASES(VS, EEW)
+#define VSSSEG_CASES(EEW) VSEG_CASES(VSS, EEW)
+#define VSUXSEG_CASES(EEW) VSEG_CASES(VSUX, I##EEW)
+#define VSOXSEG_CASES(EEW) VSEG_CASES(VSOX, I##EEW)
+
static std::optional<unsigned>
getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
const MachineInstr &MI = *MO.getParent();
@@ -225,21 +238,29 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
case RISCV::VSE8_V:
case RISCV::VLSE8_V:
case RISCV::VSSE8_V:
+ case VSSEG_CASES(8):
+ case VSSSEG_CASES(8):
return 3;
case RISCV::VLE16_V:
case RISCV::VSE16_V:
case RISCV::VLSE16_V:
case RISCV::VSSE16_V:
+ case VSSEG_CASES(16):
+ case VSSSEG_CASES(16):
return 4;
case RISCV::VLE32_V:
case RISCV::VSE32_V:
case RISCV::VLSE32_V:
case RISCV::VSSE32_V:
+ case VSSEG_CASES(32):
+ case VSSSEG_CASES(32):
return 5;
case RISCV::VLE64_V:
case RISCV::VSE64_V:
case RISCV::VLSE64_V:
case RISCV::VSSE64_V:
+ case VSSEG_CASES(64):
+ case VSSSEG_CASES(64):
return 6;
// Vector Indexed Instructions
@@ -248,7 +269,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
case RISCV::VLUXEI8_V:
case RISCV::VLOXEI8_V:
case RISCV::VSUXEI8_V:
- case RISCV::VSOXEI8_V: {
+ case RISCV::VSOXEI8_V:
+ case VSUXSEG_CASES(8):
+ case VSOXSEG_CASES(8): {
if (MO.getOperandNo() == 0)
return MILog2SEW;
return 3;
@@ -256,7 +279,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
case RISCV::VLUXEI16_V:
case RISCV::VLOXEI16_V:
case RISCV::VSUXEI16_V:
- case RISCV::VSOXEI16_V: {
+ case RISCV::VSOXEI16_V:
+ case VSUXSEG_CASES(16):
+ case VSOXSEG_CASES(16): {
if (MO.getOperandNo() == 0)
return MILog2SEW;
return 4;
@@ -264,7 +289,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
case RISCV::VLUXEI32_V:
case RISCV::VLOXEI32_V:
case RISCV::VSUXEI32_V:
- case RISCV::VSOXEI32_V: {
+ case RISCV::VSOXEI32_V:
+ case VSUXSEG_CASES(32):
+ case VSOXSEG_CASES(32): {
if (MO.getOperandNo() == 0)
return MILog2SEW;
return 5;
@@ -272,7 +299,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
case RISCV::VLUXEI64_V:
case RISCV::VLOXEI64_V:
case RISCV::VSUXEI64_V:
- case RISCV::VSOXEI64_V: {
+ case RISCV::VSOXEI64_V:
+ case VSUXSEG_CASES(64):
+ case VSOXSEG_CASES(64): {
if (MO.getOperandNo() == 0)
return MILog2SEW;
return 6;
@@ -1376,6 +1405,55 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
return VLOp;
}
+/// Return true if MI is an instruction used for assembling registers
+/// for segmented store instructions, namely, RISCVISD::TUPLE_INSERT.
+/// Currently it's lowered to INSERT_SUBREG.
+static bool isTupleInsertInstr(const MachineInstr &MI,
+ const MachineRegisterInfo &MRI) {
+ if (MI.getOpcode() != RISCV::INSERT_SUBREG)
+ return false;
+
+ const TargetRegisterClass *DstRC = MRI.getRegClass(MI.getOperand(0).getReg());
+ // Check whether it was lowered with the correct subreg index.
+ [[maybe_unused]] const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
+ [[maybe_unused]] unsigned SubRegIdx = MI.getOperand(3).getImm();
+ switch (DstRC->getID()) {
+ case RISCV::VRN2M1RegClassID:
+ case RISCV::VRN2M1NoV0RegClassID:
+ case RISCV::VRN3M1RegClassID:
+ case RISCV::VRN3M1NoV0RegClassID:
+ case RISCV::VRN4M1RegClassID:
+ case RISCV::VRN4M1NoV0RegClassID:
+ case RISCV::VRN5M1RegClassID:
+ case RISCV::VRN5M1NoV0RegClassID:
+ case RISCV::VRN6M1RegClassID:
+ case RISCV::VRN6M1NoV0RegClassID:
+ case RISCV::VRN7M1RegClassID:
+ case RISCV::VRN7M1NoV0RegClassID:
+ case RISCV::VRN8M1RegClassID:
+ case RISCV::VRN8M1NoV0RegClassID:
+ assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock &&
+ "unexpected subreg index for VRM1 sub-register");
+ return true;
+ case RISCV::VRN2M2RegClassID:
+ case RISCV::VRN2M2NoV0RegClassID:
+ case RISCV::VRN3M2RegClassID:
+ case RISCV::VRN3M2NoV0RegClassID:
+ case RISCV::VRN4M2RegClassID:
+ case RISCV::VRN4M2NoV0RegClassID:
+ assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock * 2 &&
+ "unexpected subreg index for VRM2 sub-register");
+ return true;
+ case RISCV::VRN2M4RegClassID:
+ case RISCV::VRN2M4NoV0RegClassID:
+ assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock * 4 &&
+ "unexpected subreg index for VRM4 sub-register");
+ return true;
+ default:
+ return false;
+ }
+}
+
std::optional<MachineOperand>
RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
std::optional<MachineOperand> CommonVL;
@@ -1396,6 +1474,13 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
continue;
}
+ if (isTupleInsertInstr(UserMI, *MRI)) {
+ LLVM_DEBUG(dbgs().indent(4) << "Peeking through uses of INSERT_SUBREG\n");
+ Worklist.insert_range(llvm::make_pointer_range(
+ MRI->use_operands(UserMI.getOperand(0).getReg())));
+ continue;
+ }
+
if (UserMI.isPHI()) {
// Don't follow PHI cycles
if (!PHISeen.insert(&UserMI).second)
diff --git a/llvm/test/CodeGen/RISCV/rvv/pr141907.ll b/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
index f93f88a5bc06c..1f485ea348396 100644
--- a/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
@@ -12,7 +12,7 @@ define void @pr141907(ptr %0) nounwind {
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: vmclr.m v0
; CHECK-NEXT: li a1, 0
-; CHECK-NEXT: vsetvli a5, zero, e16, mf2, ta, ma
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v10, 0
; CHECK-NEXT: addi a2, sp, 16
; CHECK-NEXT: addi a3, sp, 20
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
index bfa4067394aa7..e6d2f133ed7fd 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
@@ -2128,6 +2128,7 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv
; CHECK: early-clobber %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
$v8 = COPY %y
@@ -2139,6 +2140,7 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv_incompatible_data_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
%y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
$v8 = COPY %y
@@ -2150,6 +2152,7 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv_incompatible_index_eew
; CHECK: %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
%y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0
$v8 = COPY %y
@@ -2161,6 +2164,7 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv_incompatible_dest_emul
; CHECK: early-clobber %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
$v8 = COPY %y
@@ -2172,6 +2176,7 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv_incompatible_source_emul
; CHECK: %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
%y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
$v8 = COPY %y
@@ -2183,6 +2188,223 @@ body: |
; CHECK-LABEL: name: vrgatherei16_vv_incompatible_index_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
%y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0
$v8 = COPY %y
+...
+---
+name: vsseg3e32_v
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsseg3e32_v
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 5 /* e32 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSSEG3E32_V_M1 killed %8, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsseg3e64_v_incompatible_eew
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsseg3e64_v_incompatible_eew
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSSEG3E64_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 6 /* e64 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSSEG3E64_V_M1 killed %8, $noreg, 1, 6 /* e64 */
+...
+---
+name: vsseg3e32_v_incompatible_emul
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsseg3e32_v_incompatible_emul
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 6 /* e64 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSSEG3E32_V_M1 killed %8, $noreg, 1, 6 /* e64 */
+...
+---
+name: vssseg3e32_v
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vssseg3e32_v
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSSSEG3E32_V_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei64_v
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsuxseg3ei64_v
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSUXSEG3EI64_V_M2_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSUXSEG3EI64_V_M2_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei64_v_incompatible_data_eew
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsuxseg3ei64_v_incompatible_data_eew
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 6 /* e64 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+ ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+ ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+ ; CHECK-NEXT: PseudoVSUXSEG3EI64_V_M2_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+ %0:vr = COPY $v8
+ %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 6 /* e64 */, 3 /* ta, ma */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+ %6:vrn3m1 = IMPLICIT_DEF
+ %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+ %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+ %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+ PseudoVSUXSEG3EI64_V_M2_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei32_v_index
+body: |
+ bb.0:
+
+ ; CHECK-LABEL: name: vsuxseg3ei32_v_index
+ ; CHECK: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, [[PseudoVADD_VV_M1_]], 1, 6 /* e64 */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+ PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, %2, 1, 6 /* e64 */
+...
+---
+name: vsuxseg3ei32_v_incompatible_index_eew
+body: |
+ bb.0:
+
+ ; CHECK-LABEL: name: vsuxseg3ei32_v_incompatible_index_eew
+ ; CHECK: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+ ; CHECK-NEXT: PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, [[PseudoVADD_VV_M1_]], 1, 6 /* e64 */
+ %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+ PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, %2, 1, 6 /* e64 */
+...
+---
+name: vsoxseg3ei64_v
+body: |
+ bb.0:
+ liveins: $v8
+
+ ; CHECK-LABEL: name: vsoxseg3ei64_v
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+ ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $no...
[truncated]
You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 4d4f1db21..e477b9774 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -179,17 +179,16 @@ static unsigned getIntegerExtensionOperandEEW(unsigned Factor,
}
#define VSEG_CASES(Prefix, EEW) \
- RISCV::Prefix##SEG2E##EEW##_V: \
- case RISCV::Prefix##SEG3E##EEW##_V: \
+ RISCV::Prefix##SEG2E##EEW##_V : case RISCV::Prefix##SEG3E##EEW##_V: \
case RISCV::Prefix##SEG4E##EEW##_V: \
case RISCV::Prefix##SEG5E##EEW##_V: \
case RISCV::Prefix##SEG6E##EEW##_V: \
case RISCV::Prefix##SEG7E##EEW##_V: \
case RISCV::Prefix##SEG8E##EEW##_V
-#define VSSEG_CASES(EEW) VSEG_CASES(VS, EEW)
-#define VSSSEG_CASES(EEW) VSEG_CASES(VSS, EEW)
-#define VSUXSEG_CASES(EEW) VSEG_CASES(VSUX, I##EEW)
-#define VSOXSEG_CASES(EEW) VSEG_CASES(VSOX, I##EEW)
+#define VSSEG_CASES(EEW) VSEG_CASES(VS, EEW)
+#define VSSSEG_CASES(EEW) VSEG_CASES(VSS, EEW)
+#define VSUXSEG_CASES(EEW) VSEG_CASES(VSUX, I##EEW)
+#define VSOXSEG_CASES(EEW) VSEG_CASES(VSOX, I##EEW)
static std::optional<unsigned>
getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
Instead of using macros, we can also use the search tables we have for segmented loads/stores here. I'm open to either way.
The macros are fine by me; I think the search tables are on the pseudos anyway, so it would be weird to mix non-pseudo and pseudo opcodes.
Are we always guaranteed that every user of an insert_subreg into a vrnXm1 register is a segmented store?
E.g. is it possible via bitcasts that we might ever get something like
%8:vrn4m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
%9:vrm4 = COPY %8
PseudoVSE32_V_M4 %9:vrm4, %ptr, 1, 5 /* e32 */
In this case I don't think we can propagate the VL from the vse, because it's not "per-segment". Is this something we can assert at the start of the pass?
Good point. I haven't seen this in the wild, but I agree we should be a little defensive here. I've added the check and a negative test case for it.
Is this something we can assert at the start of the pass?
Are we always guaranteed that every user of an insert_subreg into a vrnXm1 register is a segmented store?
IIUC, RISCV::INSERT_SUBREG is currently only generated from tuple_insert. Though we're not generating this tuple_insert + bitcast pattern, technically users can write it in LLVM IR. That's why I'm adding a check instead of an assertion at this point. Also, just thinking out loud, but the vectorizer might generate this pattern in the (far) future if it wants to express strided segmented stores using vp.strided.store.
I think RISCV::INSERT_SUBREG can be generated to insert an LMUL1 register into an LMUL2 register. This does not require segment loads/stores.
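For illustration, a non-segment INSERT_SUBREG of that kind might look like the following hypothetical MIR sketch (not taken from this patch; the register names are made up):
%0:vr = COPY $v8
%1:vrm2 = IMPLICIT_DEF
%2:vrm2 = INSERT_SUBREG %1, %0, %subreg.sub_vrm1_0
PseudoVSE32_V_M2 %2, $noreg, 1, 5 /* e32 */
With the check as posted, isTupleInsertInstr returns false here, since vrm2 is not one of the VRNxMy tuple register classes, so the peek-through only applies to tuple-class INSERT_SUBREGs.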
RISCV::INSERT_SUBREG can be generated to insert an LMUL1 register into an LMUL2 register
Ah thanks for pointing that out. In that case I guess it's still not safe to propagate VL from LMUL2 to LMUL1?
Should this check isTupleInsertInstr instead of RISCV::INSERT_SUBREG?
CandidateMI will be checked by isTupleInsertInstr when it gets popped off the worklist, but I guess we can check it earlier. It's updated now.
Maybe we can just add this info to TSFlags like those in RISCVII?
Yeah, we probably should; there is still plenty of space in TSFlags. I'll do that in a follow-up patch.
Or a predicate in RISCVInstrPredicates?
Do we not need to return std::nullopt if any of the users aren't insert_subregs or segmented stores? IIUC we're currently bailing in the insert_subreg_bitcast_no_peekthru test case because the VADD only has one user, so CommonVL stays nullopt. But if there were another regular non-tuple user of the VADD, I think it would still propagate that VL?
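To sketch the concern (hypothetical MIR, not one of this patch's test cases; the second user is assumed to have a VL demand the pass cannot determine):
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
; user 1: reaches a segment store that only demands VL=1
%t0:vrn2m1 = IMPLICIT_DEF
%t1:vrn2m1 = INSERT_SUBREG %t0, %x, %subreg.sub_vrm1_0
PseudoVSSEG2E32_V_M1 killed %t1, $noreg, 1, 5 /* e32 */
; user 2: a live-out use whose VL demand is unknown here, so shrinking the
; VADD's VL down to 1 based only on the segment store would be unsound
$v8 = COPY %x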
I thought we'd continue the propagation on users with known VL and use a "safe approximation" VL for users with unknown VL. But reading this again, it seems like this safe-approximation VL is the original VL of the current MI itself.
I've fixed that and updated the test case too.
I think #151285 should make this less confusing. Currently checkUsers does two things: it checks whether the EEW/EMUL are compatible, and it also computes the maximum VL of the users.
#151285 splits that out so checkUsers only does the EEW/EMUL check and leaves the maximum VL computation to the transfer function.
lukel97 left a comment:
Thanks, I think this looks correct to me now; I just have some minor comments.
lukel97 left a comment:
LGTM