[vector][distribution] Bug fix in `moveRegionToNewWarpOpAndAppendReturns` #153656

charithaintc · 2025-08-14T19:38:43Z

moveRegionToNewWarpOpAndAppendReturns implicitly assumes that the WarpOp's yield operands contain no duplicates. It stores the yielded values in a SetVector, which drops duplicates, but collects the corresponding yielded types in a SmallVector, which keeps them. This breaks when a value is yielded more than once, as in the following test case.

func.func @warp_propagate_duplicated_operands_in_yield(%laneid: index)  {
  %r:3 = gpu.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>, vector<1xf32>, vector<1xf32>) {
    %0 = "some_def"() : () -> (vector<32xf32>)
    %1 = "some_other_def"() : () -> (vector<32xf32>)
    %2 = math.exp %1 : vector<32xf32>
    gpu.yield %2, %0, %0 : vector<32xf32>, vector<32xf32>, vector<32xf32>
  }
  "some_use"(%r#0) : (vector<1xf32>) -> ()
  return
}

With duplicates present, yieldValues ends up shorter than types, and this size mismatch causes a crash.
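The mismatch can be reproduced outside MLIR. The following is a minimal sketch, not the actual implementation: it uses standard-library containers and strings as stand-ins for `llvm::SmallSetVector<Value>` and `SmallVector<Type>`, and the names `OrderedSet`, `Sizes`, and `oldBookkeeping` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set such as SetVector:
// insert() appends a value only if it has not been seen before.
struct OrderedSet {
  std::vector<std::string> items;
  bool insert(const std::string &v) {
    if (std::find(items.begin(), items.end(), v) != items.end())
      return false;
    items.push_back(v);
    return true;
  }
};

struct Sizes {
  std::size_t numValues;
  std::size_t numTypes;
};

// Mimics the old bookkeeping: values pass through a set, types do not.
Sizes oldBookkeeping(const std::vector<std::string> &yieldOperands,
                     const std::vector<std::string> &resultTypes) {
  OrderedSet yieldValues;
  for (const std::string &v : yieldOperands)
    yieldValues.insert(v);                     // duplicates are dropped
  std::vector<std::string> types(resultTypes); // duplicates are kept
  return {yieldValues.items.size(), types.size()};
}
```

Called with the operands from the test case (`%2`, `%0`, `%0`) and three result types, this yields 2 values against 3 types, which is the inconsistency that the later call to `moveRegionToNewWarpOpAndReplaceReturns` trips over.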

This is a subtle bug. If WarpOpDeadResult runs before WarpOpElementwise, the crash does not occur, because WarpOpDeadResult simplifies the duplicate operands first. However, moveRegionToNewWarpOpAndAppendReturns should not depend on a particular pattern application order.

llvmbot · 2025-08-14T19:39:16Z

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Charitha Saumya (charithaintc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/153656.diff

2 Files Affected:

(modified) mlir/lib/Dialect/GPU/Utils/DistributionUtils.cpp (+18-14)
(modified) mlir/test/Dialect/Vector/vector-warp-distribute.mlir (+21)

diff --git a/mlir/lib/Dialect/GPU/Utils/DistributionUtils.cpp b/mlir/lib/Dialect/GPU/Utils/DistributionUtils.cpp
index 384d1a0ddccd2..be71bd02fc43b 100644
--- a/mlir/lib/Dialect/GPU/Utils/DistributionUtils.cpp
+++ b/mlir/lib/Dialect/GPU/Utils/DistributionUtils.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Dialect/Affine/IR/AffineOps.h"
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/IR/Value.h"
+#include "llvm/ADT/DenseMap.h"
 
 #include <numeric>
 
@@ -57,26 +58,29 @@ WarpDistributionPattern::moveRegionToNewWarpOpAndAppendReturns(
                           warpOp.getResultTypes().end());
   auto yield = cast<gpu::YieldOp>(
       warpOp.getBodyRegion().getBlocks().begin()->getTerminator());
-  llvm::SmallSetVector<Value, 32> yieldValues(yield.getOperands().begin(),
-                                              yield.getOperands().end());
+  SmallVector<Value> yieldValues(yield.getOperands().begin(),
+                                 yield.getOperands().end());
+  llvm::SmallDenseMap<Value, unsigned> indexLookup;
+  // Record the value -> first index mapping for faster lookup.
+  for (auto [i, v] : llvm::enumerate(yieldValues)) {
+    if (!indexLookup.count(v))
+      indexLookup[v] = i;
+  }
+
   for (auto [value, type] : llvm::zip_equal(newYieldedValues, newReturnTypes)) {
-    if (yieldValues.insert(value)) {
+    // If the value already exists in the yield, don't create a new output.
+    if (indexLookup.count(value)) {
+      indices.push_back(indexLookup[value]);
+    } else {
+      // If the value is new, add it to the yield and to the types.
+      yieldValues.push_back(value);
       types.push_back(type);
       indices.push_back(yieldValues.size() - 1);
-    } else {
-      // If the value already exit the region don't create a new output.
-      for (auto [idx, yieldOperand] :
-           llvm::enumerate(yieldValues.getArrayRef())) {
-        if (yieldOperand == value) {
-          indices.push_back(idx);
-          break;
-        }
-      }
     }
   }
-  yieldValues.insert_range(newYieldedValues);
+
   WarpExecuteOnLane0Op newWarpOp = moveRegionToNewWarpOpAndReplaceReturns(
-      rewriter, warpOp, yieldValues.getArrayRef(), types);
+      rewriter, warpOp, yieldValues, types);
   rewriter.replaceOp(warpOp,
                      newWarpOp.getResults().take_front(warpOp.getNumResults()));
   return newWarpOp;
diff --git a/mlir/test/Dialect/Vector/vector-warp-distribute.mlir b/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
index ae8fce786ee57..c3ce7e9ca7fda 100644
--- a/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+++ b/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
@@ -1803,3 +1803,24 @@ func.func @warp_propagate_nd_write(%laneid: index, %dest: memref<4x1024xf32>) {
 //       CHECK-DIST-AND-PROP:   %[[IDS:.+]]:2 = affine.delinearize_index %{{.*}} into (4, 8) : index, index
 //       CHECK-DIST-AND-PROP:   %[[INNER_ID:.+]] = affine.apply #map()[%[[IDS]]#1]
 //       CHECK-DIST-AND-PROP:   vector.transfer_write %[[W]], %{{.*}}[%[[IDS]]#0, %[[INNER_ID]]] {{.*}} : vector<1x128xf32>
+
+// -----
+func.func @warp_propagate_duplicated_operands_in_yield(%laneid: index)  {
+  %r:3 = gpu.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>, vector<1xf32>, vector<1xf32>) {
+    %0 = "some_def"() : () -> (vector<32xf32>)
+    %1 = "some_other_def"() : () -> (vector<32xf32>)
+    %2 = math.exp %1 : vector<32xf32>
+    gpu.yield %2, %0, %0 : vector<32xf32>, vector<32xf32>, vector<32xf32>
+  }
+  "some_use"(%r#0) : (vector<1xf32>) -> ()
+  return
+}
+
+// CHECK-PROP-LABEL : func.func @warp_propagate_duplicated_operands_in_yield(
+// CHECK-PROP       :   %[[W:.*]] = gpu.warp_execute_on_lane_0(%{{.*}})[32] -> (vector<1xf32>) {
+// CHECK-PROP       :     %{{.*}} = "some_def"() : () -> vector<32xf32>
+// CHECK-PROP       :     %[[T3:.*]] = "some_other_def"() : () -> vector<32xf32>
+// CHECK-PROP       :     gpu.yield %[[T3]] : vector<32xf32>
+// CHECK-PROP       :   }
+// CHECK-PROP       :   %[T1:.*] = math.exp %[[W]] : vector<1xf32>
+// CHECK-PROP       :   "some_use"(%[[T1]]) : (vector<1xf32>) -> ()
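
The index bookkeeping introduced by the hunk in DistributionUtils.cpp can be sketched in standalone C++. This is an illustration under stated assumptions, not the MLIR code: strings stand in for `Value` and `Type`, `std::unordered_map` stands in for `llvm::SmallDenseMap`, and the names `AppendResult` and `appendReturns` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct AppendResult {
  std::vector<std::string> yieldValues; // may contain duplicates
  std::vector<std::string> types;       // one entry per yielded value
  std::vector<std::size_t> indices;     // yield position of each new value
};

AppendResult appendReturns(std::vector<std::string> yieldValues,
                           std::vector<std::string> types,
                           const std::vector<std::string> &newValues,
                           const std::vector<std::string> &newTypes) {
  assert(yieldValues.size() == types.size());
  assert(newValues.size() == newTypes.size());

  // Record value -> first index; emplace() does not overwrite an existing
  // key, so a duplicated operand keeps the index of its first occurrence.
  std::unordered_map<std::string, std::size_t> indexLookup;
  for (std::size_t i = 0; i < yieldValues.size(); ++i)
    indexLookup.emplace(yieldValues[i], i);

  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < newValues.size(); ++i) {
    auto it = indexLookup.find(newValues[i]);
    if (it != indexLookup.end()) {
      // Already yielded: reuse the existing result, add nothing.
      indices.push_back(it->second);
      continue;
    }
    // New value: values and types grow together, so sizes stay equal.
    // As in the hunk, the lookup is not updated here, so a value repeated
    // within newValues would be appended once per occurrence.
    yieldValues.push_back(newValues[i]);
    types.push_back(newTypes[i]);
    indices.push_back(yieldValues.size() - 1);
  }
  return {std::move(yieldValues), std::move(types), std::move(indices)};
}
```

Starting from the yield `%2, %0, %0` and requesting `%0` and a new value `%1`, the sketch returns indices 1 and 3 and leaves four values with four types. Because the original operands are never deduplicated, the existing results keep their positions and the two lists cannot drift apart.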

Jianhui-Li

LGTM

adam-smnk · 2025-08-15T09:58:14Z

What happens when there are users of all three %r:3 results? Is it tracked correctly?

Btw, this example still crashes:

func.func @warp_propagate_duplicated_operands_in_yield(%laneid: index)  {
  %r:3 = gpu.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>, vector<1xf32>, vector<1xf32>) {
    %0 = "some_def"() : () -> (vector<32xf32>)
    %1 = "some_other_def"() : () -> (vector<32xf32>)
    %2 = math.exp %1 : vector<32xf32>
    gpu.yield %2, %0, %0 : vector<32xf32>, vector<32xf32>, vector<32xf32>
  }
  "some_use"(%r#2) : (vector<1xf32>) -> () // Note user of the duplicate only
  return
}

charithaintc · 2025-08-15T15:48:20Z

> What happens when there are users of all three %r:3 results? Is it tracked correctly?

Yes. The other results get folded away by the WarpOpDeadResult pattern.

> Btw, this example still crashes:
>
> func.func @warp_propagate_duplicated_operands_in_yield(%laneid: index)  {
>   %r:3 = gpu.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>, vector<1xf32>, vector<1xf32>) {
>     %0 = "some_def"() : () -> (vector<32xf32>)
>     %1 = "some_other_def"() : () -> (vector<32xf32>)
>     %2 = math.exp %1 : vector<32xf32>
>     gpu.yield %2, %0, %0 : vector<32xf32>, vector<32xf32>, vector<32xf32>
>   }
>   "some_use"(%r#2) : (vector<1xf32>) -> () // Note user of the duplicate only
>   return
> }

Good catch. There is another issue in WarpOpDeadResult, which I think is the reason for this crash. I have a fix in another PR, but I have not merged it yet because that issue is not bothering us at this point. You are right, though: we need to take care of it too at some point.

#148067

In any case, I think the issue in this PR is clearly isolated. Storing the yielded values in a SetVector and the types in a SmallVector is clearly not correct in the presence of duplicated yielded values.

adam-smnk

Thanks for clarification 👍
LGTM

charithaintc added 3 commits August 13, 2025 23:57

bug fix

2ad884c

Merge branch 'main' into distr_utils_fix

29ca7c7

add test

e49a6f1

llvmbot added mlir:gpu mlir:vectorops mlir mlir:vector labels Aug 14, 2025

charithaintc requested a review from kurapov-peter August 14, 2025 19:39

charithaintc assigned Jianhui-Li Aug 14, 2025

charithaintc requested review from adam-smnk and chencha3 August 14, 2025 19:39

charithaintc unassigned Jianhui-Li Aug 14, 2025

charithaintc requested a review from Jianhui-Li August 14, 2025 19:39

charithaintc changed the title ~~[vector][distribution] Bug fix in moveRegionToNewWarpOpAndAppendReturns~~ [vector][distribution] Bug fix in moveRegionToNewWarpOpAndAppendReturns Aug 14, 2025

Jianhui-Li approved these changes Aug 15, 2025

adam-smnk approved these changes Aug 18, 2025

charithaintc and others added 4 commits August 18, 2025 16:23

Merge branch 'main' into distr_utils_fix

6c0660f

Merge branch 'main' into distr_utils_fix

521f0ff

Merge branch 'main' into distr_utils_fix

5692546

Merge branch 'main' into distr_utils_fix

4206dfe

charithaintc merged commit 9617ce4 into llvm:main Aug 18, 2025
5 of 8 checks passed

Garra1980 mentioned this pull request Aug 20, 2025

[LLVM Pulldown] Bump LLVM rev b44e47a68f9b49a6283b1beaab3af55fa39e8907 intel/mlir-extensions#1110

Merged

6 tasks
