[mlir][vector][xegpu] Accept uniform values in getDistributedType
#163887
Conversation
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir
Author: Charitha Saumya (charithaintc)
Changes
Uniform values should not be distributed during vector distribution. An example is a reduction result where the reduction happens across lanes. However, the current getDistributedType does not accept a zero-result affine map (i.e. no distributed dims), which can crash vector distribution; see the full description at the end of this page.
Full diff: https://github.com/llvm/llvm-project/pull/163887.diff
3 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
index 12e6475fa66e3..15f0c077a6d3f 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
@@ -341,13 +341,18 @@ struct WarpOpToScfIfPattern : public WarpDistributionPattern {
/// Return the distributed vector type based on the original type and the
/// distribution map. The map is expected to have a dimension equal to the
/// original type rank and should be a projection where the results are the
-/// distributed dimensions. The number of results should be equal to the number
+/// distributed dimensions. If the number of results is zero there is no
+/// distribution (i.e. original type is returned).
+/// Otherwise, the number of results should be equal to the number
/// of warp sizes which is currently limited to 1.
/// Example: For a vector<16x32x64> distributed with a map(d0, d1, d2) -> (d1)
/// and a warp size of 16 would distribute the second dimension (associated to
/// d1) and return vector<16x2x64>
static VectorType getDistributedType(VectorType originalType, AffineMap map,
int64_t warpSize) {
+ // If the map has zero results, return the original type.
+ if (map.getNumResults() == 0)
+ return originalType;
SmallVector<int64_t> targetShape(originalType.getShape());
for (unsigned i = 0, e = map.getNumResults(); i < e; i++) {
unsigned position = map.getDimPosition(i);
diff --git a/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp b/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
index 26770b3c003ea..5ea9784ae6787 100644
--- a/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+++ b/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
@@ -1510,9 +1510,14 @@ void XeGPUSubgroupDistributePass::runOnOperation() {
if (!layout)
return AffineMap::getMultiDimMapWithTargets(
vecRank, {static_cast<unsigned int>(vecRank - 1)}, val.getContext());
+ // Expecting vector and layout rank to match.
+ assert(layout.getRank() == vecRank &&
+ "Expecting vector and layout rank to match");
+ // A dimension is distributed if its layout value is > 1 and the dimension
+ // size is evenly divisible by the layout value.
SmallVector<unsigned int> distributedDims;
for (auto [i, v] : llvm::enumerate(layout.getEffectiveLaneLayoutAsInt())) {
- if (v > 1)
+ if (v > 1 && vecType.getShape()[i] % v == 0)
distributedDims.push_back(i);
}
return AffineMap::getMultiDimMapWithTargets(vecRank, distributedDims,
diff --git a/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir b/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
index 0e1365aa64171..27a3dc373c739 100644
--- a/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
+++ b/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
@@ -214,3 +214,54 @@ gpu.module @xevm_module{
}
}
+
+// -----
+// CHECK-LABEL: gpu.func @warp_scf_for_unused_uniform_for_result(
+// CHECK: %[[W:.*]]:2 = gpu.warp_execute_on_lane_0(%{{.*}})[16] args(%{{.*}} : index,
+// CHECK-SAME: !xegpu.tensor_desc<16x16xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>,
+// CHECK-SAME: memref<16x16xf32>) -> (vector<16x1xf32>, vector<16x1xf32>) {
+// CHECK: gpu.yield %{{.*}}, {{.*}} : vector<16x16xf32>, vector<16x1xf32>
+// CHECK: }
+// CHECK: %{{.*}}:2 = scf.for {{.*}} to %{{.*}} step %{{.*}} iter_args
+// CHECK-SAME: (%{{.*}} = %[[W]]#0, %{{.*}} = %[[W]]#1) -> (vector<16x1xf32>, vector<16x1xf32>) {
+// CHECK: %[[W1:.*]]:2 = gpu.warp_execute_on_lane_0(%{{.*}})[16]
+// CHECK-SAME: args(%{{.*}} : vector<16x1xf32>, vector<16x1xf32>) -> (vector<16x1xf32>, vector<16x1xf32>) {
+// CHECK: gpu.yield %{{.*}}, %{{.*}} : vector<16x16xf32>, vector<16x1xf32>
+// CHECK: }
+// CHECK: scf.yield %[[W1]]#0, %[[W1]]#1 : vector<16x1xf32>, vector<16x1xf32>
+// CHECK: }
+gpu.module @xevm_module{
+ gpu.func @warp_scf_for_unused_uniform_for_result(%arg0: index,
+ %arg1: !xegpu.tensor_desc<16x16xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>,
+ %arg2: memref<16x16xf32>) {
+ %c128 = arith.constant 128 : index
+ %c1 = arith.constant 1 : index
+ %c0 = arith.constant 0 : index
+ %ini = "some_def"() {layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>}
+ : () -> (vector<16x1xf32>)
+ %ini2 = "some_def"() {layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>}
+ : () -> (vector<16x16xf32>)
+ %3:2 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini2, %arg5 = %ini) -> (vector<16x16xf32>, vector<16x1xf32>) {
+ %1 = "some_def"(%arg5)
+ {
+ layout_operand_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>,
+ layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+ }
+ : (vector<16x1xf32>) -> (vector<16x1xf32>)
+ %acc = "some_def"(%arg4, %1)
+ {
+ layout_operand_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>,
+ layout_operand_1 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>,
+ layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+ }
+ : (vector<16x16xf32>, vector<16x1xf32>) -> (vector<16x16xf32>)
+ scf.yield %acc, %1 : vector<16x16xf32>, vector<16x1xf32>
+ }
+ {
+ layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+ }
+ xegpu.store_nd %3#0, %arg1[%c0, %c0]
+ : vector<16x16xf32>, !xegpu.tensor_desc<16x16xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
+ gpu.return
+ }
+}
if (!layout)
  return AffineMap::getMultiDimMapWithTargets(
If there is no layout assigned, there should be no distribution.
fixed.
// Expecting vector and layout rank to match.
assert(layout.getRank() == vecRank &&
       "Expecting vector and layout rank to match");
// A dimension is distributed if its layout value is > 1 and the dimension
A dimension is distributed only if the xegpu.layout suggests there are multiple lanes assigned to this dimension and the shape can be evenly distributed to those lanes.
fixed.
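To illustrate the rule discussed in this thread, here is a minimal standalone sketch (plain C++, not the actual MLIR pass code; getDistributedDims is a hypothetical helper) of how a lane layout and a vector shape determine which dimensions are distributed. The shapes mirror the test added in this PR: with lane_layout = [1, 16], vector<16x16xf32> distributes its second dimension, while vector<16x1xf32> stays uniform because 1 is not divisible by 16.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical helper mirroring the condition in the patch:
// a dimension is distributed only if more than one lane is assigned to it
// (laneLayout[i] > 1) and its size divides evenly (shape[i] % laneLayout[i] == 0).
static std::vector<unsigned>
getDistributedDims(const std::vector<int64_t> &shape,
                   const std::vector<int64_t> &laneLayout) {
  std::vector<unsigned> dims;
  for (unsigned i = 0; i < laneLayout.size(); ++i)
    if (laneLayout[i] > 1 && shape[i] % laneLayout[i] == 0)
      dims.push_back(i);
  return dims;
}

int main() {
  // vector<16x16xf32> with lane_layout [1, 16]: dim 1 is distributed -> {1}.
  auto a = getDistributedDims({16, 16}, {1, 16});
  // vector<16x1xf32> with lane_layout [1, 16]: 1 % 16 != 0, nothing distributed -> {}.
  auto b = getDistributedDims({16, 1}, {1, 16});
  std::cout << a.size() << " " << b.size() << "\n"; // prints "1 0"
}
```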
// CHECK: }
// CHECK: scf.yield %[[W1]]#0, %[[W1]]#1 : vector<16x1xf32>, vector<16x1xf32>
// CHECK: }
gpu.module @xevm_module{
Can a simple test be used to motivate the change? For example, form a vector of 2 scalars and extract a scalar back: what would the current distribution do for the vector of 2?
This would not trigger the bug: if the vector of 2 is extracted, then it has a user.
The bug triggers when, before sinking a region op, we don't know the distributed type of all of that region op's operands. It will never be triggered for a non-region op that has at least one user.
Uniform values should not be distributed during vector distribution. An example is a reduction result where the reduction happens across lanes.
However, the current getDistributedType does not accept a zero-result affine map (i.e. no distributed dims) when describing the distributed dimensions. This results in a null type being returned, which crashes vector distribution in some cases. An example is an scf.for op (about to be distributed) in which one of the for results is a uniform value with no user outside the warp op. This necessitates querying getDistributedType to figure out the distributed type of this value.
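For context, here is a condensed sketch of getDistributedType after this change. The names follow VectorDistribute.cpp, but this is a simplification: the real implementation also handles distributed dimensions that do not divide evenly by the warp size and returns a null VectorType on failure; only the new zero-result early return is shown as in the patch.

```cpp
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

static VectorType getDistributedTypeSketch(VectorType originalType,
                                           AffineMap map, int64_t warpSize) {
  // New behavior: a map with zero results means the value is uniform across
  // lanes, so it keeps its original (undistributed) type instead of ending
  // up as a null type that crashes the caller.
  if (map.getNumResults() == 0)
    return originalType;
  llvm::SmallVector<int64_t> targetShape(originalType.getShape());
  for (unsigned i = 0, e = map.getNumResults(); i < e; ++i) {
    unsigned position = map.getDimPosition(i);
    // Simplification: assume the dimension divides evenly by the warp size.
    targetShape[position] = targetShape[position] / warpSize;
  }
  return VectorType::get(targetShape, originalType.getElementType());
}
```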