[RISCV][DAGCombiner] Fix potential missed combine in VL->VW extension #168026
@llvm/pr-subscribers-backend-risc-v

Author: None (OMG-link)

Changes

Problem: The previous implementation of combineOp_VLToVWOp_VL maintained a private worklist to fold multiple nodes, but did not inform the external DAGCombiner. This could lead to missed combines, because the combine logic for some users of the updated nodes lives outside this function.

Changes:
- Simplify combineOp_VLToVWOp_VL.
- Remove the obsolete command-line option from tests: -ext-max-web-size was only used to limit the size of the private worklist, is now unused, and has been deleted. The RUN lines containing it were removed from fixed-vectors-vfw-web-simplification.ll, fixed-vectors-vw-web-simplification.ll, and vscale-vw-web-simplification.ll, keeping the remaining test coverage.
- Add a new test: combine-vl-vw-macc.ll verifies that consecutive vwmacc-like operations are correctly generated.

Behavioral Improvement

A minimal reproducer (combine-vl-vw-macc.ll) is included in the test suite. Before this patch, the DAG produces:
- vwmacc.vx
- vwmul.vx
- vadd.vv

After this patch, both mul+add chains are folded into:
- vwmacc.vx

Verification:
- Ran ninja check-llvm-codegen-riscv and confirmed the new test passes.
- Confirmed the reproducer now emits vwmacc.vx for both chains.
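To make the before/after concrete, here is a condensed sketch of the reproducer pattern (function and value names are shortened for illustration; the committed combine-vl-vw-macc.ll test in the diff below is the authoritative version). One sign-extended vector is shared by two mul+add chains, and both should become vwmacc.vx:

; Condensed sketch of the reproducer: one sext with two mul+add users.
define void @two_macc_chains(<32 x i8> %v, i8 %a, i8 %b, <32 x i16> %acc0, <32 x i16> %acc1, ptr %p0, ptr %p1) {
  %wide = sext <32 x i8> %v to <32 x i16>      ; shared extend, two users
  %a16 = sext i8 %a to i16
  %ins0 = insertelement <32 x i16> poison, i16 %a16, i32 0
  %splat0 = shufflevector <32 x i16> %ins0, <32 x i16> poison, <32 x i32> zeroinitializer
  %mul0 = mul <32 x i16> %splat0, %wide
  %add0 = add <32 x i16> %mul0, %acc0          ; chain 0 -> vwmacc.vx
  %b16 = sext i8 %b to i16
  %ins1 = insertelement <32 x i16> poison, i16 %b16, i32 0
  %splat1 = shufflevector <32 x i16> %ins1, <32 x i16> poison, <32 x i32> zeroinitializer
  %mul1 = mul <32 x i16> %splat1, %wide
  %add1 = add <32 x i16> %mul1, %acc1          ; chain 1: previously vwmul.vx + vadd.vv
  store <32 x i16> %add0, ptr %p0
  store <32 x i16> %add1, ptr %p1
  ret void
}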
Patch is 51.57 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/168026.diff

6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 38cce26e44af4..ae29b39d5b675 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -57,12 +57,6 @@ using namespace llvm;
STATISTIC(NumTailCalls, "Number of tail calls");
-static cl::opt<unsigned> ExtensionMaxWebSize(
- DEBUG_TYPE "-ext-max-web-size", cl::Hidden,
- cl::desc("Give the maximum size (in number of nodes) of the web of "
- "instructions that we will consider for VW expansion"),
- cl::init(18));
-
static cl::opt<bool>
AllowSplatInVW_W(DEBUG_TYPE "-form-vw-w-with-splat", cl::Hidden,
cl::desc("Allow the formation of VW_W operations (e.g., "
@@ -18201,109 +18195,30 @@ static SDValue combineOp_VLToVWOp_VL(SDNode *N,
if (!NodeExtensionHelper::isSupportedRoot(N, Subtarget))
return SDValue();
- SmallVector<SDNode *> Worklist;
- SmallPtrSet<SDNode *, 8> Inserted;
- SmallPtrSet<SDNode *, 8> ExtensionsToRemove;
- Worklist.push_back(N);
- Inserted.insert(N);
- SmallVector<CombineResult> CombinesToApply;
-
- while (!Worklist.empty()) {
- SDNode *Root = Worklist.pop_back_val();
-
- NodeExtensionHelper LHS(Root, 0, DAG, Subtarget);
- NodeExtensionHelper RHS(Root, 1, DAG, Subtarget);
- auto AppendUsersIfNeeded =
- [&Worklist, &Subtarget, &Inserted,
- &ExtensionsToRemove](const NodeExtensionHelper &Op) {
- if (Op.needToPromoteOtherUsers()) {
- // Remember that we're supposed to remove this extension.
- ExtensionsToRemove.insert(Op.OrigOperand.getNode());
- for (SDUse &Use : Op.OrigOperand->uses()) {
- SDNode *TheUser = Use.getUser();
- if (!NodeExtensionHelper::isSupportedRoot(TheUser, Subtarget))
- return false;
- // We only support the first 2 operands of FMA.
- if (Use.getOperandNo() >= 2)
- return false;
- if (Inserted.insert(TheUser).second)
- Worklist.push_back(TheUser);
- }
- }
- return true;
- };
+ NodeExtensionHelper LHS(N, 0, DAG, Subtarget);
+ NodeExtensionHelper RHS(N, 1, DAG, Subtarget);
- // Control the compile time by limiting the number of node we look at in
- // total.
- if (Inserted.size() > ExtensionMaxWebSize)
- return SDValue();
+ SmallVector<NodeExtensionHelper::CombineToTry> FoldingStrategies =
+ NodeExtensionHelper::getSupportedFoldings(N);
- SmallVector<NodeExtensionHelper::CombineToTry> FoldingStrategies =
- NodeExtensionHelper::getSupportedFoldings(Root);
-
- assert(!FoldingStrategies.empty() && "Nothing to be folded");
- bool Matched = false;
- for (int Attempt = 0;
- (Attempt != 1 + NodeExtensionHelper::isCommutative(Root)) && !Matched;
- ++Attempt) {
-
- for (NodeExtensionHelper::CombineToTry FoldingStrategy :
- FoldingStrategies) {
- std::optional<CombineResult> Res =
- FoldingStrategy(Root, LHS, RHS, DAG, Subtarget);
- if (Res) {
- // If this strategy wouldn't remove an extension we're supposed to
- // remove, reject it.
- if (!Res->LHSExt.has_value() &&
- ExtensionsToRemove.contains(LHS.OrigOperand.getNode()))
- continue;
- if (!Res->RHSExt.has_value() &&
- ExtensionsToRemove.contains(RHS.OrigOperand.getNode()))
- continue;
+ if (FoldingStrategies.empty())
+ return SDValue();
- Matched = true;
- CombinesToApply.push_back(*Res);
- // All the inputs that are extended need to be folded, otherwise
- // we would be leaving the old input (since it is may still be used),
- // and the new one.
- if (Res->LHSExt.has_value())
- if (!AppendUsersIfNeeded(LHS))
- return SDValue();
- if (Res->RHSExt.has_value())
- if (!AppendUsersIfNeeded(RHS))
- return SDValue();
- break;
- }
+ bool IsComm = NodeExtensionHelper::isCommutative(N);
+ for (int Attempt = 0; Attempt != 1 + IsComm; ++Attempt) {
+ for (NodeExtensionHelper::CombineToTry FoldingStrategy :
+ FoldingStrategies) {
+ std::optional<CombineResult> Res =
+ FoldingStrategy(N, LHS, RHS, DAG, Subtarget);
+ if (Res) {
+ SDValue NewValue = Res->materialize(DAG, Subtarget);
+ return NewValue;
}
- std::swap(LHS, RHS);
- }
- // Right now we do an all or nothing approach.
- if (!Matched)
- return SDValue();
- }
- // Store the value for the replacement of the input node separately.
- SDValue InputRootReplacement;
- // We do the RAUW after we materialize all the combines, because some replaced
- // nodes may be feeding some of the yet-to-be-replaced nodes. Put differently,
- // some of these nodes may appear in the NodeExtensionHelpers of some of the
- // yet-to-be-visited CombinesToApply roots.
- SmallVector<std::pair<SDValue, SDValue>> ValuesToReplace;
- ValuesToReplace.reserve(CombinesToApply.size());
- for (CombineResult Res : CombinesToApply) {
- SDValue NewValue = Res.materialize(DAG, Subtarget);
- if (!InputRootReplacement) {
- assert(Res.Root == N &&
- "First element is expected to be the current node");
- InputRootReplacement = NewValue;
- } else {
- ValuesToReplace.emplace_back(SDValue(Res.Root, 0), NewValue);
}
+ std::swap(LHS, RHS);
}
- for (std::pair<SDValue, SDValue> OldNewValues : ValuesToReplace) {
- DAG.ReplaceAllUsesOfValueWith(OldNewValues.first, OldNewValues.second);
- DCI.AddToWorklist(OldNewValues.second.getNode());
- }
- return InputRootReplacement;
+
+ return SDValue();
}
// Fold (vwadd(u).wv y, (vmerge cond, x, 0)) -> vwadd(u).wv y, x, y, cond
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll b/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll
new file mode 100644
index 0000000000000..6c179e4f1472c
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64
+
+define void @matmul_min(<32 x i8>* %vptr, i8* %scalars, <32 x i16>* %acc0_ptr, <32 x i16>* %acc1_ptr) {
+; CHECK-LABEL: matmul_min:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a4, 64
+; CHECK-NEXT: li a5, 32
+; CHECK-NEXT: vsetvli zero, a5, e8, m2, ta, ma
+; CHECK-NEXT: vle8.v v16, (a0)
+; CHECK-NEXT: lb a0, 0(a1)
+; CHECK-NEXT: lb a1, 1(a1)
+; CHECK-NEXT: vsetvli zero, a4, e8, m4, ta, ma
+; CHECK-NEXT: vle8.v v8, (a2)
+; CHECK-NEXT: vle8.v v12, (a3)
+; CHECK-NEXT: vsetvli zero, a5, e8, m2, ta, ma
+; CHECK-NEXT: vwmacc.vx v8, a0, v16
+; CHECK-NEXT: vwmacc.vx v12, a1, v16
+; CHECK-NEXT: vsetvli zero, a4, e8, m4, ta, ma
+; CHECK-NEXT: vse8.v v8, (a2)
+; CHECK-NEXT: vse8.v v12, (a3)
+; CHECK-NEXT: ret
+entry:
+ %acc0 = load <32 x i16>, <32 x i16>* %acc0_ptr, align 1
+ %acc1 = load <32 x i16>, <32 x i16>* %acc1_ptr, align 1
+
+ %v8 = load <32 x i8>, <32 x i8>* %vptr, align 1
+ %v16 = sext <32 x i8> %v8 to <32 x i16>
+
+ %s0_ptr = getelementptr i8, i8* %scalars, i32 0
+ %s0_i8 = load i8, i8* %s0_ptr, align 1
+ %s0_i16 = sext i8 %s0_i8 to i16
+ %tmp0 = insertelement <32 x i16> undef, i16 %s0_i16, i32 0
+ %splat0 = shufflevector <32 x i16> %tmp0, <32 x i16> undef, <32 x i32> zeroinitializer
+ %mul0 = mul <32 x i16> %splat0, %v16
+ %add0 = add <32 x i16> %mul0, %acc0
+
+ %s1_ptr = getelementptr i8, i8* %scalars, i32 1
+ %s1_i8 = load i8, i8* %s1_ptr, align 1
+ %s1_i16 = sext i8 %s1_i8 to i16
+ %tmp1 = insertelement <32 x i16> undef, i16 %s1_i16, i32 0
+ %splat1 = shufflevector <32 x i16> %tmp1, <32 x i16> undef, <32 x i32> zeroinitializer
+ %mul1 = mul <32 x i16> %splat1, %v16
+ %add1 = add <32 x i16> %mul1, %acc1
+
+ store <32 x i16> %add0, <32 x i16>* %acc0_ptr, align 1
+ store <32 x i16> %add1, <32 x i16>* %acc1_ptr, align 1
+
+ ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32: {{.*}}
+; RV64: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfw-web-simplification.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfw-web-simplification.ll
deleted file mode 100644
index 0561ee9addc7b..0000000000000
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfw-web-simplification.ll
+++ /dev/null
@@ -1,238 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfh,+f,+d -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=1 | FileCheck %s --check-prefixes=NO_FOLDING,NO_FOLDING1
-; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfh,+f,+d -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=2 | FileCheck %s --check-prefixes=NO_FOLDING,NO_FOLDING2
-; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfh,+f,+d -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=3 | FileCheck %s --check-prefixes=FOLDING,ZVFH
-; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfhmin,+f,+d -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=3 | FileCheck %s --check-prefixes=FOLDING,ZVFHMIN
-; Check that the default value enables the web folding and
-; that it is bigger than 3.
-; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfh,+f,+d -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=FOLDING,ZVFH
-
-define void @vfwmul_v2f116_multiple_users(ptr %x, ptr %y, ptr %z, <2 x half> %a, <2 x half> %b, <2 x half> %b2) {
-; NO_FOLDING1-LABEL: vfwmul_v2f116_multiple_users:
-; NO_FOLDING1: # %bb.0:
-; NO_FOLDING1-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v11, v8
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v9, v10
-; NO_FOLDING1-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; NO_FOLDING1-NEXT: vfmul.vv v10, v11, v8
-; NO_FOLDING1-NEXT: vfadd.vv v11, v11, v9
-; NO_FOLDING1-NEXT: vfsub.vv v8, v8, v9
-; NO_FOLDING1-NEXT: vse32.v v10, (a0)
-; NO_FOLDING1-NEXT: vse32.v v11, (a1)
-; NO_FOLDING1-NEXT: vse32.v v8, (a2)
-; NO_FOLDING1-NEXT: ret
-;
-; NO_FOLDING2-LABEL: vfwmul_v2f116_multiple_users:
-; NO_FOLDING2: # %bb.0:
-; NO_FOLDING2-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; NO_FOLDING2-NEXT: vfwcvt.f.f.v v11, v8
-; NO_FOLDING2-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING2-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; NO_FOLDING2-NEXT: vfmul.vv v9, v11, v8
-; NO_FOLDING2-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
-; NO_FOLDING2-NEXT: vfwadd.wv v11, v11, v10
-; NO_FOLDING2-NEXT: vfwsub.wv v8, v8, v10
-; NO_FOLDING2-NEXT: vse32.v v9, (a0)
-; NO_FOLDING2-NEXT: vse32.v v11, (a1)
-; NO_FOLDING2-NEXT: vse32.v v8, (a2)
-; NO_FOLDING2-NEXT: ret
-;
-; ZVFH-LABEL: vfwmul_v2f116_multiple_users:
-; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; ZVFH-NEXT: vfwmul.vv v11, v8, v9
-; ZVFH-NEXT: vfwadd.vv v12, v8, v10
-; ZVFH-NEXT: vfwsub.vv v8, v9, v10
-; ZVFH-NEXT: vse32.v v11, (a0)
-; ZVFH-NEXT: vse32.v v12, (a1)
-; ZVFH-NEXT: vse32.v v8, (a2)
-; ZVFH-NEXT: ret
-;
-; ZVFHMIN-LABEL: vfwmul_v2f116_multiple_users:
-; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v8, v9
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v9, v10
-; ZVFHMIN-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT: vfmul.vv v10, v11, v8
-; ZVFHMIN-NEXT: vfadd.vv v11, v11, v9
-; ZVFHMIN-NEXT: vfsub.vv v8, v8, v9
-; ZVFHMIN-NEXT: vse32.v v10, (a0)
-; ZVFHMIN-NEXT: vse32.v v11, (a1)
-; ZVFHMIN-NEXT: vse32.v v8, (a2)
-; ZVFHMIN-NEXT: ret
- %c = fpext <2 x half> %a to <2 x float>
- %d = fpext <2 x half> %b to <2 x float>
- %d2 = fpext <2 x half> %b2 to <2 x float>
- %e = fmul <2 x float> %c, %d
- %f = fadd <2 x float> %c, %d2
- %g = fsub <2 x float> %d, %d2
- store <2 x float> %e, ptr %x
- store <2 x float> %f, ptr %y
- store <2 x float> %g, ptr %z
- ret void
-}
-
-define void @vfwmul_v2f32_multiple_users(ptr %x, ptr %y, ptr %z, <2 x float> %a, <2 x float> %b, <2 x float> %b2) {
-; NO_FOLDING1-LABEL: vfwmul_v2f32_multiple_users:
-; NO_FOLDING1: # %bb.0:
-; NO_FOLDING1-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v11, v8
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING1-NEXT: vfwcvt.f.f.v v9, v10
-; NO_FOLDING1-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; NO_FOLDING1-NEXT: vfmul.vv v10, v11, v8
-; NO_FOLDING1-NEXT: vfadd.vv v11, v11, v9
-; NO_FOLDING1-NEXT: vfsub.vv v8, v8, v9
-; NO_FOLDING1-NEXT: vse64.v v10, (a0)
-; NO_FOLDING1-NEXT: vse64.v v11, (a1)
-; NO_FOLDING1-NEXT: vse64.v v8, (a2)
-; NO_FOLDING1-NEXT: ret
-;
-; NO_FOLDING2-LABEL: vfwmul_v2f32_multiple_users:
-; NO_FOLDING2: # %bb.0:
-; NO_FOLDING2-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; NO_FOLDING2-NEXT: vfwcvt.f.f.v v11, v8
-; NO_FOLDING2-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING2-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; NO_FOLDING2-NEXT: vfmul.vv v9, v11, v8
-; NO_FOLDING2-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; NO_FOLDING2-NEXT: vfwadd.wv v11, v11, v10
-; NO_FOLDING2-NEXT: vfwsub.wv v8, v8, v10
-; NO_FOLDING2-NEXT: vse64.v v9, (a0)
-; NO_FOLDING2-NEXT: vse64.v v11, (a1)
-; NO_FOLDING2-NEXT: vse64.v v8, (a2)
-; NO_FOLDING2-NEXT: ret
-;
-; FOLDING-LABEL: vfwmul_v2f32_multiple_users:
-; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; FOLDING-NEXT: vfwmul.vv v11, v8, v9
-; FOLDING-NEXT: vfwadd.vv v12, v8, v10
-; FOLDING-NEXT: vfwsub.vv v8, v9, v10
-; FOLDING-NEXT: vse64.v v11, (a0)
-; FOLDING-NEXT: vse64.v v12, (a1)
-; FOLDING-NEXT: vse64.v v8, (a2)
-; FOLDING-NEXT: ret
- %c = fpext <2 x float> %a to <2 x double>
- %d = fpext <2 x float> %b to <2 x double>
- %d2 = fpext <2 x float> %b2 to <2 x double>
- %e = fmul <2 x double> %c, %d
- %f = fadd <2 x double> %c, %d2
- %g = fsub <2 x double> %d, %d2
- store <2 x double> %e, ptr %x
- store <2 x double> %f, ptr %y
- store <2 x double> %g, ptr %z
- ret void
-}
-
-define void @vfwmacc_v2f32_multiple_users(ptr %x, ptr %y, ptr %z, <2 x float> %a, <2 x float> %b, <2 x float> %b2, <2 x double> %w) {
-; NO_FOLDING-LABEL: vfwmacc_v2f32_multiple_users:
-; NO_FOLDING: # %bb.0:
-; NO_FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v12, v8
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v9, v10
-; NO_FOLDING-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; NO_FOLDING-NEXT: vfmul.vv v10, v12, v8
-; NO_FOLDING-NEXT: vfmadd.vv v12, v9, v11
-; NO_FOLDING-NEXT: vfsub.vv v8, v8, v9
-; NO_FOLDING-NEXT: vse64.v v10, (a0)
-; NO_FOLDING-NEXT: vse64.v v12, (a1)
-; NO_FOLDING-NEXT: vse64.v v8, (a2)
-; NO_FOLDING-NEXT: ret
-;
-; FOLDING-LABEL: vfwmacc_v2f32_multiple_users:
-; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; FOLDING-NEXT: vfwmul.vv v12, v8, v9
-; FOLDING-NEXT: vfwsub.vv v13, v9, v10
-; FOLDING-NEXT: vfwmacc.vv v11, v8, v10
-; FOLDING-NEXT: vse64.v v12, (a0)
-; FOLDING-NEXT: vse64.v v11, (a1)
-; FOLDING-NEXT: vse64.v v13, (a2)
-; FOLDING-NEXT: ret
- %c = fpext <2 x float> %a to <2 x double>
- %d = fpext <2 x float> %b to <2 x double>
- %d2 = fpext <2 x float> %b2 to <2 x double>
- %e = fmul <2 x double> %c, %d
- %f = call <2 x double> @llvm.fma(<2 x double> %c, <2 x double> %d2, <2 x double> %w)
- %g = fsub <2 x double> %d, %d2
- store <2 x double> %e, ptr %x
- store <2 x double> %f, ptr %y
- store <2 x double> %g, ptr %z
- ret void
-}
-
-; Negative test. We can't fold because the FMA addend is a user.
-define void @vfwmacc_v2f32_multiple_users_addend_user(ptr %x, ptr %y, ptr %z, <2 x float> %a, <2 x float> %b, <2 x float> %b2) {
-; NO_FOLDING-LABEL: vfwmacc_v2f32_multiple_users_addend_user:
-; NO_FOLDING: # %bb.0:
-; NO_FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v11, v8
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v9, v10
-; NO_FOLDING-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; NO_FOLDING-NEXT: vfmul.vv v10, v11, v8
-; NO_FOLDING-NEXT: vfmadd.vv v11, v9, v8
-; NO_FOLDING-NEXT: vfsub.vv v8, v8, v9
-; NO_FOLDING-NEXT: vse64.v v10, (a0)
-; NO_FOLDING-NEXT: vse64.v v11, (a1)
-; NO_FOLDING-NEXT: vse64.v v8, (a2)
-; NO_FOLDING-NEXT: ret
-;
-; FOLDING-LABEL: vfwmacc_v2f32_multiple_users_addend_user:
-; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; FOLDING-NEXT: vfwcvt.f.f.v v11, v8
-; FOLDING-NEXT: vfwcvt.f.f.v v8, v9
-; FOLDING-NEXT: vfwcvt.f.f.v v9, v10
-; FOLDING-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; FOLDING-NEXT: vfmul.vv v10, v11, v8
-; FOLDING-NEXT: vfmadd.vv v11, v9, v8
-; FOLDING-NEXT: vfsub.vv v8, v8, v9
-; FOLDING-NEXT: vse64.v v10, (a0)
-; FOLDING-NEXT: vse64.v v11, (a1)
-; FOLDING-NEXT: vse64.v v8, (a2)
-; FOLDING-NEXT: ret
- %c = fpext <2 x float> %a to <2 x double>
- %d = fpext <2 x float> %b to <2 x double>
- %d2 = fpext <2 x float> %b2 to <2 x double>
- %e = fmul <2 x double> %c, %d
- %f = call <2 x double> @llvm.fma(<2 x double> %c, <2 x double> %d2, <2 x double> %d)
- %g = fsub <2 x double> %d, %d2
- store <2 x double> %e, ptr %x
- store <2 x double> %f, ptr %y
- store <2 x double> %g, ptr %z
- ret void
-}
-
-; Negative test. We can't fold because the FMA addend is a user.
-define void @vfwmacc_v2f32_addend_user(ptr %x, <2 x float> %a, <2 x float> %b) {
-; NO_FOLDING-LABEL: vfwmacc_v2f32_addend_user:
-; NO_FOLDING: # %bb.0:
-; NO_FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v10, v8
-; NO_FOLDING-NEXT: vfwcvt.f.f.v v8, v9
-; NO_FOLDING-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; NO_FOLDING-NEXT: vfmadd.vv v8, v10, v8
-; NO_FOLDING-NEXT: vse64.v v8, (a0)
-; NO_FOLDING-NEXT: ret
-;
-; FOLDING-LABEL: vfwmacc_v2f32_addend_user:
-; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; FOLDING-NEXT: vfwcvt.f.f.v v10, v8
-; FOLDING-NEXT: vfwcvt.f.f.v v8, v9
-; FOLDING-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; FOLDING-NEXT: vfmadd.vv v8, v10, v8
-; FOLDING-NEXT: vse64.v v8, (a0)
-; FOLDING-NEXT: ret
- %c = fpext <2 x float> %a to <2 x double>
- %d = fpext <2 x float> %b to <2 x double>
- %f = call <2 x double> @llvm.fma(<2 x double> %c, <2 x double> %d, <2 x double> %d)
- store <2 x double> %f, ptr %x
- ret void
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vw-web-simplification.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vw-web-simplification.ll
deleted file mode 100644
index b9466e9f7cc15..0000000000000
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vw-web-simplification.ll
+++ /dev/null
@@ -1,100 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=1 | FileCheck %s --check-prefixes=NO_FOLDING1
-; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=1 | FileCheck %s --check-prefixes=NO_FOLDING1
-; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=2 | FileCheck %s --check-prefixes=NO_FOLDING2
-; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=2 | FileCheck %s --check-prefixes=NO_FOLDING2
-; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=3 | FileCheck %s --check-prefixes=FOLDING
-; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs %s -o - --riscv-lower-ext-max-web-size=3 | FileCheck %s --check-prefixes=FOLDING
-; Check that the default value enables the web folding and
-; that it is bigger than 3.
-; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs %...
[truncated]
Force-pushed from b501480 to 4a1918d.
This patch fixes the reported issue and still allows the whole web to be handled at once. The important part is that […]
This actually works. But duplicating similar logic in two places often leads to problems. We should remove the internal worklist unless it is necessary.
Force-pushed from 4a1918d to 66af1db.
I looked at the changes @topperc made last week. To stay compatible with those changes, the only approach I can currently think of is, as mentioned, to keep the internal worklist but also add the users of the updated nodes to the outer worklist. In any case, I don't think maintaining another worklist internally is good practice, but for now it seems to be the only option we have.
Adding this worklist was not taken lightly. Originally we did this per node, with a check that the add/sub/mul was the only user of the extend. The worklist system was added so we could remove the one-use check and ensure all users of the extend could be widened. It looks like we did not add any negative tests showing cases where the extend is used by a node that can't be widened; if we had, I think this patch would show a change.
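For illustration, a minimal sketch of such a negative case (a hypothetical test, not part of this PR): the extend %ea is used both by a mul, which could become vwmul.vv, and by a udiv, for which no widening instruction exists, so an all-or-nothing fold must leave the extends in place:

; Hypothetical negative case: %ea has a user (%div) that cannot be widened.
define <2 x i16> @ext_user_not_widenable(<2 x i8> %a, <2 x i8> %b) {
  %ea = sext <2 x i8> %a to <2 x i16>
  %eb = sext <2 x i8> %b to <2 x i16>
  %mul = mul <2 x i16> %ea, %eb    ; widenable: could become vwmul.vv
  %div = udiv <2 x i16> %ea, %eb   ; not widenable: there is no vwdiv
  %or = or <2 x i16> %mul, %div
  ret <2 x i16> %or
}

If the mul were widened anyway, %ea and %eb would still be needed in wide form for the udiv, so the extends would not disappear; this is the same situation as the vwmul_v2i16_multiuse case discussed below.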
@preames and I had a conversation about this last week when I was preparing my change. We think the right long-term direction is to move this entire transform to a MachineIR pass. That would still need some kind of worklist and visited set, but it wouldn't be nested inside DAG combine. It would also allow the optimization to be shared with GlobalISel.
@OMG-link Do you mind if I take your test and create a PR with my change?

I was planning to update my PR tomorrow because I can't connect to my server right now. But of course, you can do it if you want.
Force-pushed from 66af1db to 4b50405.
I have rebuilt the commit. This version keeps the private worklist and, when inserting newly created nodes into the outer worklist, preserves the same insertion logic the outer worklist itself uses. Although this leads to duplicated code, it is the only way to keep the worklist behavior consistent as long as the private worklist exists. In addition, this version of the changes does not affect the results of any existing tests; it only changes the new test case from mul+add to macc.

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 38cce26e44af..75fab219b6c6 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -18300,8 +18300,24 @@ static SDValue combineOp_VLToVWOp_VL(SDNode *N,
}
}
for (std::pair<SDValue, SDValue> OldNewValues : ValuesToReplace) {
- DAG.ReplaceAllUsesOfValueWith(OldNewValues.first, OldNewValues.second);
- DCI.AddToWorklist(OldNewValues.second.getNode());
+ // This helper function does what is done for RV in `DAGCombiner::Run`.
+ [&DAG, &DCI](SDNode *N, SDValue RV) -> void {
+ if (N->getNumValues() == RV->getNumValues())
+ DAG.ReplaceAllUsesWith(N, RV.getNode());
+ else {
+ assert(N->getValueType(0) == RV.getValueType() &&
+ N->getNumValues() == 1 && "Type mismatch");
+ DAG.ReplaceAllUsesWith(N, &RV);
+ }
+
+ if (RV.getOpcode() != ISD::EntryToken) {
+ DCI.AddToWorklist(RV.getNode());
+ for (SDNode *Node : RV.getNode()->users())
+ DCI.AddToWorklist(Node);
+ }
+
+ DCI.recursivelyDeleteUnusedNodes(N);
+ }(OldNewValues.first.getNode(), OldNewValues.second);
}
return InputRootReplacement;
}

However, I noticed that in the initial version of the changes, there was the following transformation:

diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll
}However, I noticed that in the initial version of the changes, there was the following transformation: diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll
index 5ec5ce11f85a..94c3138fd330 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll
@@ -866,16 +866,16 @@ define <2 x i16> @vwmul_v2i16_multiuse(ptr %x, ptr %y, ptr %z, ptr %w) {
; CHECK-NEXT: vle8.v v8, (a1)
; CHECK-NEXT: vle8.v v9, (a2)
; CHECK-NEXT: vsext.vf2 v10, v8
-; CHECK-NEXT: vsext.vf2 v11, v9
-; CHECK-NEXT: vdivu.vv v9, v10, v11
-; CHECK-NEXT: vle8.v v10, (a0)
+; CHECK-NEXT: vsext.vf2 v8, v9
+; CHECK-NEXT: vdivu.vv v8, v10, v8
+; CHECK-NEXT: vle8.v v9, (a0)
; CHECK-NEXT: vle8.v v11, (a3)
-; CHECK-NEXT: vsetvli zero, zero, e8, mf8, ta, ma
-; CHECK-NEXT: vwmul.vv v12, v10, v11
-; CHECK-NEXT: vwmul.vv v10, v8, v11
-; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
-; CHECK-NEXT: vor.vv v8, v12, v10
-; CHECK-NEXT: vor.vv v8, v8, v9
+; CHECK-NEXT: vsext.vf2 v12, v9
+; CHECK-NEXT: vsext.vf2 v9, v11
+; CHECK-NEXT: vmul.vv v11, v12, v9
+; CHECK-NEXT: vmul.vv v9, v10, v9
+; CHECK-NEXT: vor.vv v9, v11, v9
+; CHECK-NEXT: vor.vv v8, v9, v8
; CHECK-NEXT: ret
%a = load <2 x i8>, ptr %x
%b = load <2 x i8>, ptr %y

The new version can no longer discover this transformation (it introduces two additional vsext.vf2 instead of using vwmul.vv).
Looks like it doesn't do the transform because one of the vsext is used by the vdivu and one of the multiplies. So the transform gives up because it can't widen all users of that vsext. Perhaps we should loosen that restriction for multiplies since we don't have a .wv version of multiply. The restriction probably makes sense for add/sub where we can widen a single operand.
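A sketch of that asymmetry (hypothetical functions, grounded only in the fact that RVV provides vwadd.wv/vwsub.wv but no .wv form of vwmul): an add with one already-wide operand can still use a widening instruction, while a mul needs both operands narrow:

; add: one wide operand is fine -> can lower to vwadd.wv %wide, %narrow
define <2 x i16> @add_one_wide(<2 x i16> %wide, <2 x i8> %narrow) {
  %ext = sext <2 x i8> %narrow to <2 x i16>
  %r = add <2 x i16> %wide, %ext
  ret <2 x i16> %r
}

; mul: no vwmul.wv exists, so this stays a plain vmul.vv on wide operands
define <2 x i16> @mul_one_wide(<2 x i16> %wide, <2 x i8> %narrow) {
  %ext = sext <2 x i8> %narrow to <2 x i16>
  %r = mul <2 x i16> %wide, %ext
  ret <2 x i16> %r
}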
Force-pushed from 4b50405 to 5147468.
topperc left a comment:

LGTM
Review comment on the new test:

store <32 x i16> %add0, <32 x i16>* %acc0_ptr, align 1
store <32 x i16> %add1, <32 x i16>* %acc1_ptr, align 1

Typed pointers are deprecated.

Suggested change:
- store <32 x i16> %add0, <32 x i16>* %acc0_ptr, align 1
- store <32 x i16> %add1, <32 x i16>* %acc1_ptr, align 1
+ store <32 x i16> %add0, ptr %acc0_ptr, align 1
+ store <32 x i16> %add1, ptr %acc1_ptr, align 1
Updated. @topperc
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll b/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll
index 2bbaf73e6e0d..f4615df5d310 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-vl-vw-macc.ll
@@ -2,7 +2,7 @@
; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32
; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64
-define void @matmul_min(<32 x i8>* %vptr, i8* %scalars, <32 x i16>* %acc0_ptr, <32 x i16>* %acc1_ptr) {
+define void @matmul_min(ptr %vptr, ptr %scalars, ptr %acc0_ptr, ptr %acc1_ptr) {
; CHECK-LABEL: matmul_min:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: li a4, 64
@@ -25,30 +25,30 @@ define void @matmul_min(<32 x i8>* %vptr, i8* %scalars, <32 x i16>* %acc0_ptr, <
; CHECK-NEXT: vse8.v v12, (a3)
; CHECK-NEXT: ret
entry:
- %acc0 = load <32 x i16>, <32 x i16>* %acc0_ptr, align 1
- %acc1 = load <32 x i16>, <32 x i16>* %acc1_ptr, align 1
+ %acc0 = load <32 x i16>, ptr %acc0_ptr, align 1
+ %acc1 = load <32 x i16>, ptr %acc1_ptr, align 1
- %v8 = load <32 x i8>, <32 x i8>* %vptr, align 1
+ %v8 = load <32 x i8>, ptr %vptr, align 1
%v16 = sext <32 x i8> %v8 to <32 x i16>
- %s0_ptr = getelementptr i8, i8* %scalars, i32 0
- %s0_i8 = load i8, i8* %s0_ptr, align 1
+ %s0_ptr = getelementptr i8, ptr %scalars, i32 0
+ %s0_i8 = load i8, ptr %s0_ptr, align 1
%s0_i16 = sext i8 %s0_i8 to i16
%tmp0 = insertelement <32 x i16> undef, i16 %s0_i16, i32 0
%splat0 = shufflevector <32 x i16> %tmp0, <32 x i16> undef, <32 x i32> zeroinitializer
%mul0 = mul <32 x i16> %splat0, %v16
%add0 = add <32 x i16> %mul0, %acc0
- %s1_ptr = getelementptr i8, i8* %scalars, i32 1
- %s1_i8 = load i8, i8* %s1_ptr, align 1
+ %s1_ptr = getelementptr i8, ptr %scalars, i32 1
+ %s1_i8 = load i8, ptr %s1_ptr, align 1
%s1_i16 = sext i8 %s1_i8 to i16
%tmp1 = insertelement <32 x i16> undef, i16 %s1_i16, i32 0
%splat1 = shufflevector <32 x i16> %tmp1, <32 x i16> undef, <32 x i32> zeroinitializer
%mul1 = mul <32 x i16> %splat1, %v16
%add1 = add <32 x i16> %mul1, %acc1
- store <32 x i16> %add0, <32 x i16>* %acc0_ptr, align 1
- store <32 x i16> %add1, <32 x i16>* %acc1_ptr, align 1
+ store <32 x i16> %add0, ptr %acc0_ptr, align 1
+ store <32 x i16> %add1, ptr %acc1_ptr, align 1
+ ret void

Force-pushed from 5147468 to 590531a.
topperc left a comment:

LGTM
✅ With the latest revision this PR passed the undef deprecator.
Review comment on the new test:

%s0_ptr = getelementptr i8, ptr %scalars, i32 0
%s0_i8 = load i8, ptr %s0_ptr, align 1
%s0_i16 = sext i8 %s0_i8 to i16
%tmp0 = insertelement <32 x i16> undef, i16 %s0_i16, i32 0

Suggested change, to fix the warning about new uses of undef:
- %tmp0 = insertelement <32 x i16> undef, i16 %s0_i16, i32 0
+ %tmp0 = insertelement <32 x i16> poison, i16 %s0_i16, i32 0

Review comment on the new test:

%s1_ptr = getelementptr i8, ptr %scalars, i32 1
%s1_i8 = load i8, ptr %s1_ptr, align 1
%s1_i16 = sext i8 %s1_i8 to i16
%tmp1 = insertelement <32 x i16> undef, i16 %s1_i16, i32 0

Suggested change:
- %tmp1 = insertelement <32 x i16> undef, i16 %s1_i16, i32 0
+ %tmp1 = insertelement <32 x i16> poison, i16 %s1_i16, i32 0
Add a minimal reproducer for consecutive vwmacc-like operations to illustrate that the previous DAG combine logic may miss combining mul+add chains into a single vwmacc.vx instruction.
The previous implementation of combineOp_VLToVWOp_VL manually replaced old nodes with newly created widened nodes, but only added the new node itself to the DAGCombiner worklist. Since the users of the new node were not added, some combine opportunities could be missed when external DAGCombiner passes expected those users to be reconsidered.

This patch replaces the custom replacement logic with a call to DCI.CombineTo(), which performs node replacement in a way consistent with DAGCombiner::Run:
- Replace all uses of the old node.
- Add the new node and its users to the worklist.
- Clean up unused nodes when appropriate.

Using CombineTo ensures that combineOp_VLToVWOp_VL behaves consistently with the standard DAGCombiner update model, avoiding discrepancies between the private worklist inside this routine and the global worklist managed by the combiner. This resolves missed combine cases involving VL -> VW operator widening.
Force-pushed from 590531a to 16d98bc.
Problem:

The previous implementation of combineOp_VLToVWOp_VL maintained a private worklist to fold multiple nodes, but did not inform the external DAGCombiner. This could lead to missed combines, because the combine logic for some users of the updated nodes is outside this function.

Changes:
- Simplify combineOp_VLToVWOp_VL.
- Remove the obsolete command-line option from tests: the -ext-max-web-size option was only used to limit the size of the private worklist. It is now unused and has been deleted. Updated tests by removing RUN lines containing this option, keeping the test coverage: fixed-vectors-vfw-web-simplification.ll, fixed-vectors-vw-web-simplification.ll, vscale-vw-web-simplification.ll.
- Add a new test: combine-vl-vw-macc.ll verifies that consecutive vwmacc-like operations are correctly generated.

Behavioral Improvement

A minimal reproducer (combine-vl-vw-macc.ll) is included in the test suite.

Before this patch, the DAG produces:
- vwmacc.vx
- vwmul.vx
- vadd.vv

After this patch, both mul+add chains are folded into:
- vwmacc.vx

Verification:
- Ran ninja check-llvm-codegen-riscv and confirmed the new test passes.
- Confirmed the reproducer now emits vwmacc.vx for both chains.