
Conversation

@mshockwave
Member

Add RISCVVLOptimizer support for unit-stride, strided, and indexed segmented stores. The biggest change was adding the capability to look through INSERT_SUBREG, which is used to compose the segment register class (tuple) values.

Fix #149350


I was going to roll out segmented load support first -- reducing a segmented load's VL based on its users. But Craig pointed out that it's less likely for the vectorizer to generate a sub-optimal VL on loads, since memory operations should generally be the source of truth when it comes to VL. That being said, if anyone sees cases like that for segmented loads, please let me know.

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Add RISCVVLOptimizer support for unit-stride, strided, and indexed segmented stores. The biggest change was adding the capability to look through INSERT_SUBREG, which is used to compose the segment register class (tuple) values.

Fix #149350


I was going to roll out segmented load support first -- reducing a segmented load's VL based on its users. But Craig pointed out that it's less likely for the vectorizer to generate a sub-optimal VL on loads, since memory operations should generally be the source of truth when it comes to VL. That being said, if anyone sees cases like that for segmented loads, please let me know.


Patch is 21.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155467.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (+89-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/pr141907.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir (+222)
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 53557049ea33c..e429960af72cb 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -178,6 +178,19 @@ static unsigned getIntegerExtensionOperandEEW(unsigned Factor,
   return Log2EEW;
 }
 
+#define VSEG_CASES(Prefix, EEW)                                                \
+  RISCV::Prefix##SEG2E##EEW##_V:                                               \
+  case RISCV::Prefix##SEG3E##EEW##_V:                                          \
+  case RISCV::Prefix##SEG4E##EEW##_V:                                          \
+  case RISCV::Prefix##SEG5E##EEW##_V:                                          \
+  case RISCV::Prefix##SEG6E##EEW##_V:                                          \
+  case RISCV::Prefix##SEG7E##EEW##_V:                                          \
+  case RISCV::Prefix##SEG8E##EEW##_V
+#define VSSEG_CASES(EEW)    VSEG_CASES(VS, EEW)
+#define VSSSEG_CASES(EEW)   VSEG_CASES(VSS, EEW)
+#define VSUXSEG_CASES(EEW)  VSEG_CASES(VSUX, I##EEW)
+#define VSOXSEG_CASES(EEW)  VSEG_CASES(VSOX, I##EEW)
+
 static std::optional<unsigned>
 getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   const MachineInstr &MI = *MO.getParent();
@@ -225,21 +238,29 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   case RISCV::VSE8_V:
   case RISCV::VLSE8_V:
   case RISCV::VSSE8_V:
+  case VSSEG_CASES(8):
+  case VSSSEG_CASES(8):
     return 3;
   case RISCV::VLE16_V:
   case RISCV::VSE16_V:
   case RISCV::VLSE16_V:
   case RISCV::VSSE16_V:
+  case VSSEG_CASES(16):
+  case VSSSEG_CASES(16):
     return 4;
   case RISCV::VLE32_V:
   case RISCV::VSE32_V:
   case RISCV::VLSE32_V:
   case RISCV::VSSE32_V:
+  case VSSEG_CASES(32):
+  case VSSSEG_CASES(32):
     return 5;
   case RISCV::VLE64_V:
   case RISCV::VSE64_V:
   case RISCV::VLSE64_V:
   case RISCV::VSSE64_V:
+  case VSSEG_CASES(64):
+  case VSSSEG_CASES(64):
     return 6;
 
   // Vector Indexed Instructions
@@ -248,7 +269,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   case RISCV::VLUXEI8_V:
   case RISCV::VLOXEI8_V:
   case RISCV::VSUXEI8_V:
-  case RISCV::VSOXEI8_V: {
+  case RISCV::VSOXEI8_V:
+  case VSUXSEG_CASES(8):
+  case VSOXSEG_CASES(8): {
     if (MO.getOperandNo() == 0)
       return MILog2SEW;
     return 3;
@@ -256,7 +279,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   case RISCV::VLUXEI16_V:
   case RISCV::VLOXEI16_V:
   case RISCV::VSUXEI16_V:
-  case RISCV::VSOXEI16_V: {
+  case RISCV::VSOXEI16_V:
+  case VSUXSEG_CASES(16):
+  case VSOXSEG_CASES(16): {
     if (MO.getOperandNo() == 0)
       return MILog2SEW;
     return 4;
@@ -264,7 +289,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   case RISCV::VLUXEI32_V:
   case RISCV::VLOXEI32_V:
   case RISCV::VSUXEI32_V:
-  case RISCV::VSOXEI32_V: {
+  case RISCV::VSOXEI32_V:
+  case VSUXSEG_CASES(32):
+  case VSOXSEG_CASES(32): {
     if (MO.getOperandNo() == 0)
       return MILog2SEW;
     return 5;
@@ -272,7 +299,9 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
   case RISCV::VLUXEI64_V:
   case RISCV::VLOXEI64_V:
   case RISCV::VSUXEI64_V:
-  case RISCV::VSOXEI64_V: {
+  case RISCV::VSOXEI64_V:
+  case VSUXSEG_CASES(64):
+  case VSOXSEG_CASES(64): {
     if (MO.getOperandNo() == 0)
       return MILog2SEW;
     return 6;
@@ -1376,6 +1405,55 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
   return VLOp;
 }
 
+/// Return true if MI is an instruction used for assembling registers
+/// for segmented store instructions, namely, RISCVISD::TUPLE_INSERT.
+/// Currently it's lowered to INSERT_SUBREG.
+static bool isTupleInsertInstr(const MachineInstr &MI,
+                               const MachineRegisterInfo &MRI) {
+  if (MI.getOpcode() != RISCV::INSERT_SUBREG)
+    return false;
+
+  const TargetRegisterClass *DstRC = MRI.getRegClass(MI.getOperand(0).getReg());
+  // Check whether it was lowered with the correct subreg index.
+  [[maybe_unused]] const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
+  [[maybe_unused]] unsigned SubRegIdx = MI.getOperand(3).getImm();
+  switch (DstRC->getID()) {
+  case RISCV::VRN2M1RegClassID:
+  case RISCV::VRN2M1NoV0RegClassID:
+  case RISCV::VRN3M1RegClassID:
+  case RISCV::VRN3M1NoV0RegClassID:
+  case RISCV::VRN4M1RegClassID:
+  case RISCV::VRN4M1NoV0RegClassID:
+  case RISCV::VRN5M1RegClassID:
+  case RISCV::VRN5M1NoV0RegClassID:
+  case RISCV::VRN6M1RegClassID:
+  case RISCV::VRN6M1NoV0RegClassID:
+  case RISCV::VRN7M1RegClassID:
+  case RISCV::VRN7M1NoV0RegClassID:
+  case RISCV::VRN8M1RegClassID:
+  case RISCV::VRN8M1NoV0RegClassID:
+    assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock &&
+           "unexpected subreg index for VRM1 sub-register");
+    return true;
+  case RISCV::VRN2M2RegClassID:
+  case RISCV::VRN2M2NoV0RegClassID:
+  case RISCV::VRN3M2RegClassID:
+  case RISCV::VRN3M2NoV0RegClassID:
+  case RISCV::VRN4M2RegClassID:
+  case RISCV::VRN4M2NoV0RegClassID:
+    assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock * 2 &&
+           "unexpected subreg index for VRM2 sub-register");
+    return true;
+  case RISCV::VRN2M4RegClassID:
+  case RISCV::VRN2M4NoV0RegClassID:
+    assert(TRI->getSubRegIdxSize(SubRegIdx) == RISCV::RVVBitsPerBlock * 4 &&
+           "unexpected subreg index for VRM4 sub-register");
+    return true;
+  default:
+    return false;
+  }
+}
+
 std::optional<MachineOperand>
 RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
   std::optional<MachineOperand> CommonVL;
@@ -1396,6 +1474,13 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
       continue;
     }
 
+    if (isTupleInsertInstr(UserMI, *MRI)) {
+      LLVM_DEBUG(dbgs().indent(4) << "Peeking through uses of INSERT_SUBREG\n");
+      Worklist.insert_range(llvm::make_pointer_range(
+          MRI->use_operands(UserMI.getOperand(0).getReg())));
+      continue;
+    }
+
     if (UserMI.isPHI()) {
       // Don't follow PHI cycles
       if (!PHISeen.insert(&UserMI).second)
diff --git a/llvm/test/CodeGen/RISCV/rvv/pr141907.ll b/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
index f93f88a5bc06c..1f485ea348396 100644
--- a/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/pr141907.ll
@@ -12,7 +12,7 @@ define void @pr141907(ptr %0) nounwind {
 ; CHECK-NEXT:    vmv.v.i v8, 0
 ; CHECK-NEXT:    vmclr.m v0
 ; CHECK-NEXT:    li a1, 0
-; CHECK-NEXT:    vsetvli a5, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v10, 0
 ; CHECK-NEXT:    addi a2, sp, 16
 ; CHECK-NEXT:    addi a3, sp, 20
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
index bfa4067394aa7..e6d2f133ed7fd 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
@@ -2128,6 +2128,7 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv
     ; CHECK: early-clobber %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
     $v8 = COPY %y
@@ -2139,6 +2140,7 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv_incompatible_data_eew
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
     %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
     $v8 = COPY %y
@@ -2150,6 +2152,7 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv_incompatible_index_eew
     ; CHECK: %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
     %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0
     $v8 = COPY %y
@@ -2161,6 +2164,7 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv_incompatible_dest_emul
     ; CHECK: early-clobber %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
     %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
     $v8 = COPY %y
@@ -2172,6 +2176,7 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv_incompatible_source_emul
     ; CHECK: %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_MF2 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0
     %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, %x, $noreg, 1, 5 /* e32 */, 0
     $v8 = COPY %y
@@ -2183,6 +2188,223 @@ body: |
     ; CHECK-LABEL: name: vrgatherei16_vv_incompatible_index_emul
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0
     %y:vr = PseudoVRGATHEREI16_VV_M1_E32_MF2 $noreg, $noreg, %x, 1, 5 /* e32 */, 0
     $v8 = COPY %y
+...
+---
+name: vsseg3e32_v
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsseg3e32_v
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 5 /* e32 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSSEG3E32_V_M1 killed %8, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsseg3e64_v_incompatible_eew
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsseg3e64_v_incompatible_eew
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSSEG3E64_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 6 /* e64 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSSEG3E64_V_M1 killed %8, $noreg, 1, 6 /* e64 */
+...
+---
+name: vsseg3e32_v_incompatible_emul
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsseg3e32_v_incompatible_emul
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, 1, 6 /* e64 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSSEG3E32_V_M1 killed %8, $noreg, 1, 6 /* e64 */
+...
+---
+name: vssseg3e32_v
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vssseg3e32_v
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSSSEG3E32_V_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSSSEG3E32_V_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei64_v
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsuxseg3ei64_v
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSUXSEG3EI64_V_M2_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 5 /* e32 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSUXSEG3EI64_V_M2_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei64_v_incompatible_data_eew
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsuxseg3ei64_v_incompatible_data_eew
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 6 /* e64 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vrn3m1 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[INSERT_SUBREG:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[DEF]], [[COPY]], %subreg.sub_vrm1_0
+    ; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG]], [[PseudoVADD_VV_M1_]], %subreg.sub_vrm1_1
+    ; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:vrn3m1 = INSERT_SUBREG [[INSERT_SUBREG1]], [[PseudoVADD_VV_M1_1]], %subreg.sub_vrm1_2
+    ; CHECK-NEXT: PseudoVSUXSEG3EI64_V_M2_M1 killed [[INSERT_SUBREG2]], $noreg, $noreg, 1, 5 /* e32 */
+    %0:vr = COPY $v8
+    %1:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 10, 6 /* e64 */, 3 /* ta, ma */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+    %6:vrn3m1 = IMPLICIT_DEF
+    %5:vrn3m1 = INSERT_SUBREG %6, %0, %subreg.sub_vrm1_0
+    %7:vrn3m1 = INSERT_SUBREG %5, %1, %subreg.sub_vrm1_1
+    %8:vrn3m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
+    PseudoVSUXSEG3EI64_V_M2_M1 killed %8, $noreg, $noreg, 1, 5 /* e32 */
+...
+---
+name: vsuxseg3ei32_v_index
+body: |
+  bb.0:
+
+    ; CHECK-LABEL: name: vsuxseg3ei32_v_index
+    ; CHECK: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, [[PseudoVADD_VV_M1_]], 1, 6 /* e64 */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 3 /* ta, ma */
+    PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, %2, 1, 6 /* e64 */
+...
+---
+name: vsuxseg3ei32_v_incompatible_index_eew
+body: |
+  bb.0:
+
+    ; CHECK-LABEL: name: vsuxseg3ei32_v_incompatible_index_eew
+    ; CHECK: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+    ; CHECK-NEXT: PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, [[PseudoVADD_VV_M1_]], 1, 6 /* e64 */
+    %2:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 6 /* e64 */, 3 /* ta, ma */
+    PseudoVSUXSEG3EI32_V_M1_M2 $noreg, $noreg, %2, 1, 6 /* e64 */
+...
+---
+name: vsoxseg3ei64_v
+body: |
+  bb.0:
+    liveins: $v8
+
+    ; CHECK-LABEL: name: vsoxseg3ei64_v
+    ; CHECK: liveins: $v8
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vr = COPY $v8
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 5 /* e32 */, 3 /* ta, ma */
+    ; CHECK-NEXT: [[PseudoVADD_VV_M1_1:%[0-9]+]]:vr = PseudoVADD_VV_M1 $noreg, $noreg, $no...
[truncated]

@github-actions

github-actions bot commented Aug 26, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 4d4f1db21..e477b9774 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -179,17 +179,16 @@ static unsigned getIntegerExtensionOperandEEW(unsigned Factor,
 }
 
 #define VSEG_CASES(Prefix, EEW)                                                \
-  RISCV::Prefix##SEG2E##EEW##_V:                                               \
-  case RISCV::Prefix##SEG3E##EEW##_V:                                          \
+  RISCV::Prefix##SEG2E##EEW##_V : case RISCV::Prefix##SEG3E##EEW##_V:          \
   case RISCV::Prefix##SEG4E##EEW##_V:                                          \
   case RISCV::Prefix##SEG5E##EEW##_V:                                          \
   case RISCV::Prefix##SEG6E##EEW##_V:                                          \
   case RISCV::Prefix##SEG7E##EEW##_V:                                          \
   case RISCV::Prefix##SEG8E##EEW##_V
-#define VSSEG_CASES(EEW)    VSEG_CASES(VS, EEW)
-#define VSSSEG_CASES(EEW)   VSEG_CASES(VSS, EEW)
-#define VSUXSEG_CASES(EEW)  VSEG_CASES(VSUX, I##EEW)
-#define VSOXSEG_CASES(EEW)  VSEG_CASES(VSOX, I##EEW)
+#define VSSEG_CASES(EEW) VSEG_CASES(VS, EEW)
+#define VSSSEG_CASES(EEW) VSEG_CASES(VSS, EEW)
+#define VSUXSEG_CASES(EEW) VSEG_CASES(VSUX, I##EEW)
+#define VSOXSEG_CASES(EEW) VSEG_CASES(VSOX, I##EEW)
 
 static std::optional<unsigned>
 getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {

Member Author

Instead of using macros, we could also use the search tables we have for segmented loads/stores here. I'm open to either way.

Contributor

The macros are fine by me; I think the search tables are on the pseudos anyway, so it would be weird to mix the non-pseudo and pseudo opcodes.

@mshockwave mshockwave force-pushed the patch/rvv/vl-opt-segmented-store branch from 39824d4 to 84546b8 on August 26, 2025 18:47

Contributor

Are we always guaranteed that every user of an insert_subreg into a vrnXm1 register is a segmented store?

E.g. is it possible via bitcasts that we might ever get something like

%8:vrn4m1 = INSERT_SUBREG %7, %2, %subreg.sub_vrm1_2
%9:vrm4 = COPY %8
PseudoVSE32_V_M1 %9:vrm4, %ptr, 1, 5 /* e32 */

In this case I don't think we can propagate the VL from the vse, because it's not "per-segment". Is this something we can assert at the start of the pass?

Member Author

Good point. I haven't seen this in the wild, but I agree we should be a little defensive here. I've added the check and a negative test case for it.

Is this something we can assert at the start of the pass?
Are we always guaranteed that every user of an insert_subreg into a vrnXm1 register is a segmented store?

IIUC, RISCV::INSERT_SUBREG is currently only generated from tuple_insert. Though we're not generating this tuple_insert + bitcast pattern, technically users can write it in LLVM IR. That's why I'm adding a check instead of an assertion at this moment. Also, I'm just thinking out loud, but the vectorizer might generate this pattern in the (far) future if it wants to express strided segmented stores using vp.strided.store.
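
For illustration, here is a rough C++ sketch of the shape of the defensive check discussed in this thread: when peeking through a tuple INSERT_SUBREG, every use of the tuple must still be something the pass understands. This is not the exact code added to RISCVVLOptimizer.cpp; isSupportedSegmentStore is a hypothetical helper name, and TupleInsert stands for the INSERT_SUBREG being peeked through.

// Sketch only, under the assumptions stated above.
for (const MachineInstr &UseMI :
     MRI->use_instructions(TupleInsert.getOperand(0).getReg())) {
  // A use that is neither another tuple insert nor a segmented store --
  // e.g. a COPY that reinterprets the tuple as a plain vrm4 register feeding
  // an ordinary vse -- is not "per-segment", so give up on this candidate
  // instead of shrinking the VL of the instructions that built the tuple.
  if (!isTupleInsertInstr(UseMI, *MRI) && !isSupportedSegmentStore(UseMI))
    return std::nullopt;
}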

Collaborator

I think RISCV::INSERT_SUBREG can be generated to insert an LMUL1 register into an LMUL2 register. This does not require segment loads/stores.

Member Author

RISCV::INSERT_SUBREG can be generated to insert an LMUL1 register into an LMUL2 register

Ah thanks for pointing that out. In that case I guess it's still not safe to propagate VL from LMUL2 to LMUL1?

Collaborator

Should this check isTupleInsertInstr instead of RISCV::INSERT_SUBREG?

Member Author

CandidateMI will be checked by isTupleInsertInstr when it gets popped off the worklist, but I guess we can check it earlier. It's updated now.

Contributor

Maybe we can just add this info to TSFlags like those in RISCVII?

Member Author

Yeah, we probably should; there is still plenty of space in TSFlags. I'll do that in a follow-up patch.
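
For context, a minimal sketch of what exposing such a property through TSFlags could look like, following the existing RISCVII pattern of a dedicated bit plus an inline query helper (in the style of hasVLOp/hasSEWOp). The flag name, bit position, and the exact property encoded here are hypothetical; the thread doesn't show which information the follow-up would actually move into TSFlags.

// Hypothetical sketch only; does not reflect the real RISCVBaseInfo.h layout.
#include <cstdint>

namespace RISCVII {
enum : uint64_t {
  // ...existing TSFlags fields would come first...
  IsRVVSegmentedStoreShift = 25,                      // invented bit position
  IsRVVSegmentedStoreMask = 1ULL << IsRVVSegmentedStoreShift,
};

// Query helper in the same style as the existing TSFlags accessors.
static inline bool isRVVSegmentedStore(uint64_t TSFlags) {
  return TSFlags & IsRVVSegmentedStoreMask;
}
} // namespace RISCVII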

Contributor

Or a predicate in RISCVInstrPredicates?

Contributor

Do we not need to return std::nullopt if any of the users aren't insert_subregs or segmented stores? IIUC we're currently bailing in the insert_subreg_bitcast_no_peekthru test case because the VADD only has one user so CommonVL stays as nullopt. But if there was another regular non-top user of VADD I think it would still propagate that VL?

Member Author

I thought we'd continue the propagation on users with known VL and use a "safe approximation" VL to substitute for users with unknown VL. But reading this again, it seems like this safe-approximation VL is the original VL of the current MI itself.
I've fixed that and updated the test case too.
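
For concreteness, a simplified restatement of the checkUsers contract being settled here: accumulate the largest VL any user demands, and return std::nullopt as soon as a user cannot be analyzed, rather than substituting an approximate VL for it. This is a hand-written sketch under those assumptions, not the actual implementation (which also handles PHIs and the INSERT_SUBREG peek-through above).

// Sketch of the intended contract, not the actual checkUsers body.
std::optional<MachineOperand> CommonVL;
for (const MachineOperand &UserOp :
     MRI->use_operands(MI.getOperand(0).getReg())) {
  std::optional<MachineOperand> VLOp = getMinimumVLForUser(UserOp);
  if (!VLOp)
    return std::nullopt; // a user we cannot reason about: drop this candidate
  // Keep the largest VL demanded so far (isVLKnownLE compares VL operands).
  if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp))
    CommonVL = VLOp;
}
return CommonVL;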

Contributor

@lukel97 lukel97 Sep 4, 2025

I think #151285 should make this less confusing. Currently checkUsers does two things: it checks whether the EEW/EMUL are compatible, and it also computes the maximum VL of its users.

#151285 splits it out so checkUsers only does the EEW/EMUL check and leaves the maximum VL computation to the transfer function.

Contributor

@lukel97 lukel97 left a comment

Thanks, I think this looks correct to me now; I just have some minor comments.

Contributor

@lukel97 lukel97 left a comment

LGTM

@mshockwave mshockwave force-pushed the patch/rvv/vl-opt-segmented-store branch from 7ea2959 to f32e1f4 on September 4, 2025 19:00
@mshockwave mshockwave enabled auto-merge (squash) September 4, 2025 19:02
@mshockwave mshockwave merged commit b45582f into llvm:main Sep 4, 2025
8 of 9 checks passed
@mshockwave mshockwave deleted the patch/rvv/vl-opt-segmented-store branch September 4, 2025 20:07

Successfully merging this pull request may close these issues:

[RISCV] Propagate VL from segment store in VLoptimizer