[MLIR] convert-linalg-to-parallel-loops doesn't seem to parallelize reductions #138595

@brian-kelley

Description

I have a project that uses MLIR to lower linalg/tensor code to scf/memref. Ideally the resulting code exposes all possible parallelism from the input operations. One of the passes I'm using is --convert-linalg-to-parallel-loops, but it falls back to a sequential scf.for loop for reduction dimensions, instead of scf.parallel/scf.reduce.

Below are two simple examples (dot and matmul) where the reduction dimension ends up as an scf.for loop. To reproduce, run them through mlir-opt --convert-linalg-to-parallel-loops.
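For example:

mlir-opt --convert-linalg-to-parallel-loops dot.mlir
mlir-opt --convert-linalg-to-parallel-loops matmul.mlir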

Is there another way to generate parallel loops from linalg ops that keeps reductions parallel? One alternative I tried is --convert-linalg-to-affine-loops --affine-parallelize, but that likewise parallelizes only the non-reduction dimensions (and it seems like the affine dialect can't express a parallel reduction anyway).

dot.mlir:

module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    linalg.dot ins(%arg0, %arg1 : memref<?xf64>, memref<?xf64>) outs(%alloc : memref<f64>)
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}

result:

module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    scf.for %arg2 = %c0 to %dim step %c1 {
      %1 = memref.load %arg0[%arg2] : memref<?xf64>
      %2 = memref.load %arg1[%arg2] : memref<?xf64>
      %3 = memref.load %alloc[] : memref<f64>
      %4 = arith.mulf %1, %2 : f64
      %5 = arith.addf %3, %4 : f64
      memref.store %5, %alloc[] : memref<f64>
    }
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}
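For reference, here is roughly what I was hoping to get instead. This is a hand-written sketch, not the output of any existing pass, and it assumes the newer scf.reduce syntax where the combiner region hangs off the terminator (the exact form depends on the MLIR version): the reduction dimension becomes an scf.parallel with an init value and an scf.reduce combiner, so the temporary memref<f64> buffer is no longer needed.

module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %cst = arith.constant 0.000000e+00 : f64
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    // Parallel loop over the reduction dimension, seeded with 0.0.
    %0 = scf.parallel (%arg2) = (%c0) to (%dim) step (%c1) init (%cst) -> f64 {
      %1 = memref.load %arg0[%arg2] : memref<?xf64>
      %2 = memref.load %arg1[%arg2] : memref<?xf64>
      %3 = arith.mulf %1, %2 : f64
      // Combine the per-iteration products with addf.
      scf.reduce(%3 : f64) {
      ^bb0(%lhs: f64, %rhs: f64):
        %4 = arith.addf %lhs, %rhs : f64
        scf.reduce.return %4 : f64
      }
    }
    return %0 : f64
  }
}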

matmul.mlir:

module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    linalg.matmul ins(%arg0, %arg1 : memref<64x64xf64>, memref<64x64xf64>) outs(%alloc : memref<64x64xf64>)
    return %alloc : memref<64x64xf64>
  }
}

result:

module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %c1 = arith.constant 1 : index
    %c64 = arith.constant 64 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    scf.parallel (%arg2, %arg3) = (%c0, %c0) to (%c64, %c64) step (%c1, %c1) {
      scf.for %arg4 = %c0 to %c64 step %c1 {
        %0 = memref.load %arg0[%arg2, %arg4] : memref<64x64xf64>
        %1 = memref.load %arg1[%arg4, %arg3] : memref<64x64xf64>
        %2 = memref.load %alloc[%arg2, %arg3] : memref<64x64xf64>
        %3 = arith.mulf %0, %1 : f64
        %4 = arith.addf %2, %3 : f64
        memref.store %4, %alloc[%arg2, %arg3] : memref<64x64xf64>
      }
      scf.reduce 
    }
    return %alloc : memref<64x64xf64>
  }
}
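Likewise for matmul, what I would like to see is something along these lines (again a hand-written sketch with the same caveats as above): the outer scf.parallel stays as it is today, and the k loop becomes a nested scf.parallel that reduces into a scalar before the store.

module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %c1 = arith.constant 1 : index
    %c64 = arith.constant 64 : index
    %c0 = arith.constant 0 : index
    %cst = arith.constant 0.000000e+00 : f64
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    scf.parallel (%arg2, %arg3) = (%c0, %c0) to (%c64, %c64) step (%c1, %c1) {
      // Parallel reduction over k for the (%arg2, %arg3) element.
      %0 = scf.parallel (%arg4) = (%c0) to (%c64) step (%c1) init (%cst) -> f64 {
        %1 = memref.load %arg0[%arg2, %arg4] : memref<64x64xf64>
        %2 = memref.load %arg1[%arg4, %arg3] : memref<64x64xf64>
        %3 = arith.mulf %1, %2 : f64
        scf.reduce(%3 : f64) {
        ^bb0(%lhs: f64, %rhs: f64):
          %4 = arith.addf %lhs, %rhs : f64
          scf.reduce.return %4 : f64
        }
      }
      memref.store %0, %alloc[%arg2, %arg3] : memref<64x64xf64>
      scf.reduce
    }
    return %alloc : memref<64x64xf64>
  }
}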
