
Conversation

@leewei05
Contributor

Support vector reduction for xor/maxnumf/minnumf.

@llvmbot
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-mlir-spirv
@llvm/pr-subscribers-mlir-vector

@llvm/pr-subscribers-mlir-affine

Author: Lee Wei (leewei05)

Changes

Support vector reduction for xor/maxnumf/minnumf.


Full diff: https://github.com/llvm/llvm-project/pull/163310.diff

3 Files Affected:

  • (modified) mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp (+3-2)
  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+9-1)
  • (modified) mlir/test/Dialect/Affine/SuperVectorize/vectorize_reduction.mlir (+100)
diff --git a/mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp b/mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp
index 4d2d8738aa4ad..3d1a73417d1ea 100644
--- a/mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp
+++ b/mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp
@@ -66,9 +66,10 @@ static Value getSupportedReduction(AffineForOp forOp, unsigned pos,
           .Case([](arith::MaxSIOp) { return arith::AtomicRMWKind::maxs; })
           .Case([](arith::MinUIOp) { return arith::AtomicRMWKind::minu; })
           .Case([](arith::MaxUIOp) { return arith::AtomicRMWKind::maxu; })
+          .Case([](arith::XOrIOp) { return arith::AtomicRMWKind::xori; })
+          .Case([](arith::MaxNumFOp) { return arith::AtomicRMWKind::maxnumf; })
+          .Case([](arith::MinNumFOp) { return arith::AtomicRMWKind::minnumf; })
           .Default([](Operation *) -> std::optional<arith::AtomicRMWKind> {
-            // TODO: AtomicRMW supports other kinds of reductions this is
-            // currently not detecting, add those when the need arises.
             return std::nullopt;
           });
   if (!maybeKind)
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 58256b0ade9f6..e74c72dbcd33e 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -717,7 +717,15 @@ Value mlir::vector::getVectorReductionOp(arith::AtomicRMWKind op,
   case arith::AtomicRMWKind::ori:
     return vector::ReductionOp::create(builder, vector.getLoc(),
                                        CombiningKind::OR, vector);
-  // TODO: Add remaining reduction operations.
+  case arith::AtomicRMWKind::minnumf:
+    return vector::ReductionOp::create(builder, vector.getLoc(),
+                                       CombiningKind::MINNUMF, vector);
+  case arith::AtomicRMWKind::maxnumf:
+    return vector::ReductionOp::create(builder, vector.getLoc(),
+                                       CombiningKind::MAXNUMF, vector);
+  case arith::AtomicRMWKind::xori:
+    return vector::ReductionOp::create(builder, vector.getLoc(),
+                                       CombiningKind::XOR, vector);
   default:
     (void)emitOptionalError(loc, "Reduction operation type not supported");
     break;
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_reduction.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_reduction.mlir
index b616632a6fe24..a6db6ec719472 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_reduction.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_reduction.mlir
@@ -243,6 +243,106 @@ func.func @vecdim_reduction_ori(%in: memref<256x512xi32>, %out: memref<256xi32>)
 // CHECK:         affine.store %[[final_red]], %{{.*}} : memref<256xi32>
 // CHECK:       }
 
+// -----
+
+func.func @vecdim_reduction_xori(%in: memref<256x512xi32>, %out: memref<256xi32>) {
+ %cst = arith.constant 0 : i32
+ affine.for %i = 0 to 256 {
+   %final_red = affine.for %j = 0 to 512 iter_args(%red_iter = %cst) -> (i32) {
+     %ld = affine.load %in[%i, %j] : memref<256x512xi32>
+     %xor = arith.xori %red_iter, %ld : i32
+     affine.yield %xor : i32
+   }
+   affine.store %final_red, %out[%i] : memref<256xi32>
+ }
+ return
+}
+
+// CHECK-LABEL:   func.func @vecdim_reduction_xori(
+// CHECK-SAME:      %[[ARG0:.*]]: memref<256x512xi32>,
+// CHECK-SAME:      %[[ARG1:.*]]: memref<256xi32>) {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 0 : i32
+// CHECK:           affine.for %[[VAL_1:.*]] = 0 to 256 {
+// CHECK:             %[[VAL_2:.*]] = arith.constant dense<0> : vector<128xi32>
+// CHECK:             %[[VAL_3:.*]] = affine.for %[[VAL_4:.*]] = 0 to 512 step 128 iter_args(%[[VAL_5:.*]] = %[[VAL_2]]) -> (vector<128xi32>) {
+// CHECK:               %[[VAL_6:.*]] = ub.poison : i32
+// CHECK:               %[[VAL_7:.*]] = vector.transfer_read %[[ARG0]]{{\[}}%[[VAL_1]], %[[VAL_4]]], %[[VAL_6]] : memref<256x512xi32>, vector<128xi32>
+// CHECK:               %[[VAL_8:.*]] = arith.xori %[[VAL_5]], %[[VAL_7]] : vector<128xi32>
+// CHECK:               affine.yield %[[VAL_8]] : vector<128xi32>
+// CHECK:             }
+// CHECK:             %[[VAL_9:.*]] = vector.reduction <xor>, %[[VAL_3]] : vector<128xi32> into i32
+// CHECK:             affine.store %[[VAL_9]], %[[ARG1]]{{\[}}%[[VAL_1]]] : memref<256xi32>
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+
+// -----
+
+func.func @vecdim_reduction_minnumf(%in: memref<256x512xf32>, %out: memref<256xf32>) {
+ %cst = arith.constant 0xFF800000 : f32
+ affine.for %i = 0 to 256 {
+   %final_red = affine.for %j = 0 to 512 iter_args(%red_iter = %cst) -> (f32) {
+     %ld = affine.load %in[%i, %j] : memref<256x512xf32>
+     %min = arith.minnumf %red_iter, %ld : f32
+     affine.yield %min : f32
+   }
+   affine.store %final_red, %out[%i] : memref<256xf32>
+ }
+ return
+}
+
+// CHECK-LABEL:   func.func @vecdim_reduction_minnumf(
+// CHECK-SAME:      %[[ARG0:.*]]: memref<256x512xf32>,
+// CHECK-SAME:      %[[ARG1:.*]]: memref<256xf32>) {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 0xFF800000 : f32
+// CHECK:           affine.for %[[VAL_1:.*]] = 0 to 256 {
+// CHECK:             %[[VAL_2:.*]] = arith.constant dense<0x7FC00000> : vector<128xf32>
+// CHECK:             %[[VAL_3:.*]] = affine.for %[[VAL_4:.*]] = 0 to 512 step 128 iter_args(%[[VAL_5:.*]] = %[[VAL_2]]) -> (vector<128xf32>) {
+// CHECK:               %[[VAL_6:.*]] = ub.poison : f32
+// CHECK:               %[[VAL_7:.*]] = vector.transfer_read %[[ARG0]]{{\[}}%[[VAL_1]], %[[VAL_4]]], %[[VAL_6]] : memref<256x512xf32>, vector<128xf32>
+// CHECK:               %[[VAL_8:.*]] = arith.minnumf %[[VAL_5]], %[[VAL_7]] : vector<128xf32>
+// CHECK:               affine.yield %[[VAL_8]] : vector<128xf32>
+// CHECK:             }
+// CHECK:             %[[VAL_9:.*]] = vector.reduction <minnumf>, %[[VAL_3]] : vector<128xf32> into f32
+// CHECK:             %[[VAL_10:.*]] = arith.minnumf %[[VAL_9]], %[[VAL_0]] : f32
+// CHECK:             affine.store %[[VAL_10]], %[[ARG1]]{{\[}}%[[VAL_1]]] : memref<256xf32>
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+
+// -----
+
+func.func @vecdim_reduction_maxnumf(%in: memref<256x512xf32>, %out: memref<256xf32>) {
+ %cst = arith.constant 0xFF800000 : f32
+ affine.for %i = 0 to 256 {
+   %final_red = affine.for %j = 0 to 512 iter_args(%red_iter = %cst) -> (f32) {
+     %ld = affine.load %in[%i, %j] : memref<256x512xf32>
+     %max = arith.maxnumf %red_iter, %ld : f32
+     affine.yield %max : f32
+   }
+   affine.store %final_red, %out[%i] : memref<256xf32>
+ }
+ return
+}
+
+// CHECK-LABEL:   func.func @vecdim_reduction_maxnumf(
+// CHECK-SAME:      %[[ARG0:.*]]: memref<256x512xf32>,
+// CHECK-SAME:      %[[ARG1:.*]]: memref<256xf32>) {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 0xFF800000 : f32
+// CHECK:           affine.for %[[VAL_1:.*]] = 0 to 256 {
+// CHECK:             %[[VAL_2:.*]] = arith.constant dense<0xFFC00000> : vector<128xf32>
+// CHECK:             %[[VAL_3:.*]] = affine.for %[[VAL_4:.*]] = 0 to 512 step 128 iter_args(%[[VAL_5:.*]] = %[[VAL_2]]) -> (vector<128xf32>) {
+// CHECK:               %[[VAL_6:.*]] = ub.poison : f32
+// CHECK:               %[[VAL_7:.*]] = vector.transfer_read %[[ARG0]]{{\[}}%[[VAL_1]], %[[VAL_4]]], %[[VAL_6]] : memref<256x512xf32>, vector<128xf32>
+// CHECK:               %[[VAL_8:.*]] = arith.maxnumf %[[VAL_5]], %[[VAL_7]] : vector<128xf32>
+// CHECK:               affine.yield %[[VAL_8]] : vector<128xf32>
+// CHECK:             }
+// CHECK:             %[[VAL_9:.*]] = vector.reduction <maxnumf>, %[[VAL_3]] : vector<128xf32> into f32
+// CHECK:             %[[VAL_10:.*]] = arith.maxnumf %[[VAL_9]], %[[VAL_0]] : f32
+// CHECK:             affine.store %[[VAL_10]], %[[ARG1]]{{\[}}%[[VAL_1]]] : memref<256xf32>
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
 
 // -----
 

@llvmbot
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-mlir

Contributor

@banach-space banach-space left a comment


Thanks for this contribution!

I am not familiar with the Affine part, but the tests that you added look like a pretty straightforward extension of the existing tests.

That said, for a change like this I would expect more comprehensive testing. Here are two places where new tests should be added:

  • // CHECK-LABEL: @reduce_fp
    func.func @reduce_fp(%arg0: vector<16xf32>, %arg1: f32) -> f32 {
    // CHECK: vector.reduction <add>, %{{.*}} : vector<16xf32> into f32
    vector.reduction <add>, %arg0 : vector<16xf32> into f32
    // CHECK: vector.reduction <add>, %{{.*}}, %{{.*}} : vector<16xf32> into f32
    vector.reduction <add>, %arg0, %arg1 : vector<16xf32> into f32
    // CHECK: vector.reduction <mul>, %{{.*}} : vector<16xf32> into f32
    vector.reduction <mul>, %arg0 : vector<16xf32> into f32
    // CHECK: vector.reduction <mul>, %{{.*}}, %{{.*}} : vector<16xf32> into f32
    vector.reduction <mul>, %arg0, %arg1 : vector<16xf32> into f32
    // CHECK: vector.reduction <minnumf>, %{{.*}} : vector<16xf32> into f32
    vector.reduction <minnumf>, %arg0 : vector<16xf32> into f32
    // CHECK: %[[X0:.*]] = vector.reduction <maxnumf>, %{{.*}} : vector<16xf32> into f32
    %0 = vector.reduction <maxnumf>, %arg0 : vector<16xf32> into f32
    // CHECK: vector.reduction <minimumf>, %{{.*}} : vector<16xf32> into f32
    vector.reduction <minimumf>, %arg0 : vector<16xf32> into f32
    // CHECK: %[[X1:.*]] = vector.reduction <maximumf>, %{{.*}} : vector<16xf32> into f32
    %1 = vector.reduction <maximumf>, %arg0 : vector<16xf32> into f32
    // CHECK: return %[[X0]] : f32
    return %0 : f32
    }
  • func.func @reduce_2x_f32(%arg0: vector<2xf32>) -> (f32, f32, f32, f32, f32, f32) {
    %0 = vector.reduction <add>, %arg0 : vector<2xf32> into f32
    %1 = vector.reduction <mul>, %arg0 : vector<2xf32> into f32
    %2 = vector.reduction <minnumf>, %arg0 : vector<2xf32> into f32
    %3 = vector.reduction <maxnumf>, %arg0 : vector<2xf32> into f32
    %4 = vector.reduction <minimumf>, %arg0 : vector<2xf32> into f32
    %5 = vector.reduction <maximumf>, %arg0 : vector<2xf32> into f32
    return %0, %1, %2, %3, %4, %5 : f32, f32, f32, f32, f32, f32
    }

There's definitely more :) (*)

Thanks!

(*) To clarify, I am not suggesting that you add these new variants everywhere vector.reduction is used. However, please update tests that enumerate over all the existing options.

Comment on lines 261 to 277
// CHECK-LABEL: func.func @vecdim_reduction_xori(
// CHECK-SAME: %[[ARG0:.*]]: memref<256x512xi32>,
// CHECK-SAME: %[[ARG1:.*]]: memref<256xi32>) {
// CHECK: %[[VAL_0:.*]] = arith.constant 0 : i32
// CHECK: affine.for %[[VAL_1:.*]] = 0 to 256 {
// CHECK: %[[VAL_2:.*]] = arith.constant dense<0> : vector<128xi32>
// CHECK: %[[VAL_3:.*]] = affine.for %[[VAL_4:.*]] = 0 to 512 step 128 iter_args(%[[VAL_5:.*]] = %[[VAL_2]]) -> (vector<128xi32>) {
// CHECK: %[[VAL_6:.*]] = ub.poison : i32
// CHECK: %[[VAL_7:.*]] = vector.transfer_read %[[ARG0]]{{\[}}%[[VAL_1]], %[[VAL_4]]], %[[VAL_6]] : memref<256x512xi32>, vector<128xi32>
// CHECK: %[[VAL_8:.*]] = arith.xori %[[VAL_5]], %[[VAL_7]] : vector<128xi32>
// CHECK: affine.yield %[[VAL_8]] : vector<128xi32>
// CHECK: }
// CHECK: %[[VAL_9:.*]] = vector.reduction <xor>, %[[VAL_3]] : vector<128xi32> into i32
// CHECK: affine.store %[[VAL_9]], %[[ARG1]]{{\[}}%[[VAL_1]]] : memref<256xi32>
// CHECK: }
// CHECK: return
// CHECK: }
Contributor


Please use meaningful FileCheck variable names. See comprehensive guidelines here: https://mlir.llvm.org/getting_started/TestingGuide/

@leewei05
Contributor Author

(*) To clarify, I am not suggesting that you add these new variants everywhere vector.reduction is used. However, please update tests that enumerate over all the existing options.

Thanks! Just to clarify, I'm new to MLIR and would like to know what level of test coverage is expected for a single PR. Taking this PR as an example, I should focus on the Vector dialect tests and the other places that use vector.reduction. What about the integration tests and the conversion tests? I assume those should go in separate PRs, since conversion involves lowering to the SPIR-V or LLVM dialects, and the integration tests involve mlir-runner.

Contributor

@dcaballe dcaballe left a comment


LGTM. The SuperVectorizer is not under active development, but the pieces added to the common infrastructure make sense to me.

@banach-space
Contributor

Just to clarify, I'm new to MLIR, and I want to know what is the suitable test coverage for a single PR.

That's a good question and the answer will vary from one PR to another.

Let's take this PR as an example, I should focus on covering the tests in Vector Dialect and places where they use vector.reduction.

Indeed. Specifically, when touching anything to do with Op definition, I would expect to see tests in:

In addition, I would cover the conversion tests (unless that requires extra code that's not included in a PR). So yes, ideally you would add tests for LLVM and SPIR-V.

Next, there are transformation tests. For that, I'd focus on tests that target vector.reduction specifically, e.g.:

As for integration tests, these are (from https://mlir.llvm.org/getting_started/TestingGuide/#integration-tests):

for cases that are hard to verify otherwise, e.g. when composing and testing complex compilation pipelines

TBH, you are hitting a pretty rare case. In principle, most tests relevant to vector.reduction should be updated, but then you would be penalised for our negligence in the past 😅 That's not needed IMHO, and Diego seems to agree (given his approval).

Hopefully this makes sense and you don't mind the extra work :)

@leewei05
Contributor Author

TBH, you are hitting a pretty rare case. In principle, most tests relevant to vector.reduction should be updated, but then you would be penalised for our negligence in the past 😅 That's not needed IMHO, and Diego seems to agree (given his approval).

Hopefully this makes sense and you don't mind the extra work :)

@banach-space Thank you for the reply! I was following commit 059ee5d, so I thought the tests were sufficient. Anyway, I'll spend some time adding the remaining test coverage for this PR. Thanks again!

@leewei05
Contributor Author

Tests for the minnumf, maxnumf, and xor ops already exist in:
https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Vector/ops.mlir
https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Vector/invalid.mlir#L1176
https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Vector/break-down-vector-reduction.mlir#L19

As for the LLVM conversion tests, they already exist:
https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/VectorToLLVM/vector-reduction-to-llvm.mlir#L133

For the SPIR-V conversion, the xor reduction op is missing; I'll add the implementation and tests in a separate PR. I'll create a follow-up PR for the integration tests as well.

@leewei05 leewei05 requested a review from banach-space October 23, 2025 04:56