Contributor

@makslevental makslevental commented Mar 26, 2025

Waiting on llvm/llvm-project#133151 upstream.

This PR adds a pattern that folds arith.cmpi operations that are always true (as determined by integer range analysis) to arith.constant true; e.g.

%c0 = arith.constant 0 : i32
%c1024_i32 = arith.constant 1024 : i32
%cmpsge = arith.cmpi sge, %c1024_i32, %c0 : i32

->

%cmpsge = arith.constant true

(after DCE).
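The fold above can be pictured as interval reasoning: if the smallest value the left operand can take is at least the largest value the right operand can take, then `sge` holds for every possible pair of values. The following is a minimal Python sketch of that idea, assuming signed closed intervals. `SignedRange` and `fold_sge` are hypothetical names used only for illustration; they are not MLIR or Triton APIs, and the actual pattern in this PR may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignedRange:
    """Closed interval [smin, smax] of values a value may take (hypothetical)."""
    smin: int
    smax: int

    @staticmethod
    def constant(value: int) -> "SignedRange":
        # A constant has a single-point range.
        return SignedRange(value, value)


def fold_sge(lhs: SignedRange, rhs: SignedRange) -> Optional[bool]:
    """Return True/False if `lhs >= rhs` is decided for all values, else None."""
    if lhs.smin >= rhs.smax:
        return True   # every lhs value is >= every rhs value
    if lhs.smax < rhs.smin:
        return False  # every lhs value is < every rhs value
    return None       # the ranges overlap, so the comparison cannot be folded


# Mirrors the example above: `arith.cmpi sge, 1024, 0`.
c0 = SignedRange.constant(0)
c1024 = SignedRange.constant(1024)
result = fold_sge(c1024, c0)
```

Only the `True` outcome corresponds to the fold described in this PR; the other two outcomes are included so the sketch is complete.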

The specific use case is "unguarding" the epilogue in pipelined loops (e.g., as produced by tritonamdgpu-stream-pipeline). For example,

tt.func @assume_matmul(%arg0: index, %arg1: index, %arg2: index, %arg3: !tt.ptr<f16>, %arg4: !tt.ptr<f16>) -> tensor<128x128xf32, #mma> {
  ...
  %20:6 = scf.for ... {
    scf.yield ...
  }
  ...
  %27 = arith.cmpi sge, %26, %c1 : index
  %31 = scf.if %27 -> (tensor<128x128xf32, #mma>) {
    %33 = tt.dot %28, %30, %20#2
    scf.yield %33 : tensor<128x128xf32, #mma>
  } else {
    scf.yield %20#2 : tensor<128x128xf32, #mma>
  }
  %32 = arith.select %27, %31, %20#2 : tensor<128x128xf32, #mma>
  ttg.local_dealloc %10 : !ttg.memdesc<1x128x32xf16, #shared, #smem, mutable>
  ttg.local_dealloc %11 : !ttg.memdesc<1x32x128xf16, #shared1, #smem, mutable>
  tt.return %32 : tensor<128x128xf32, #mma>
}

becomes

tt.func @assume_matmul(%arg0: index, %arg1: index, %arg2: index, %arg3: !tt.ptr<f16>, %arg4: !tt.ptr<f16>) -> tensor<128x128xf32, #mma> {
  ...
  %20:6 = scf.for ... {
    scf.yield ... 
  }
  %21 = ttg.local_load %20#4
  %22 = ttg.local_load %20#5
  %23 = arith.mulf %22, %cst
  %24 = tt.dot %21, %23, %20#2
  ttg.local_dealloc %10 : !ttg.memdesc<1x128x32xf16, #shared, #smem, mutable>
  ttg.local_dealloc %11 : !ttg.memdesc<1x32x128xf16, #shared1, #smem, mutable>
  tt.return %24 : tensor<128x128xf32, #mma>
}

Notice both the scf.if and arith.select are canonicalized away.
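To see why both ops disappear once the guard becomes a constant, here is a toy Python sketch of conditional folding. It is an illustration only: `fold_conditional` is a hypothetical helper, ops are modeled as plain tuples and strings, and this is not how MLIR's canonicalizer is implemented.

```python
def fold_conditional(kind, cond, then_value, else_value):
    """Simplify an `scf.if`/`arith.select`-like op with a known condition.

    `cond` is True, False, or a string naming a non-constant value.
    """
    if cond is True:
        return then_value
    if cond is False:
        return else_value
    # Unknown condition: the op has to stay.
    return (kind, cond, then_value, else_value)


# Guard not folded: both the scf.if and the arith.select survive.
guarded_if = fold_conditional("scf.if", "%27", "tt.dot", "%20#2")
guarded = fold_conditional("arith.select", "%27", guarded_if, "%20#2")

# Guard folded to constant true: only the dot remains.
unguarded_if = fold_conditional("scf.if", True, "tt.dot", "%20#2")
unguarded = fold_conditional("arith.select", True, unguarded_if, "%20#2")
```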

Note that this usually requires using tl.assume to constrain the operands of the arith.cmpi; specifically, with respect to the original loop bounds, something like %stop // %step >= 1 (or whatever the arithmetic on the loop bounds needs to be...).
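A rough way to see why the assumption matters: without it, nothing is known about a value derived from the loop bounds, so a guard like `x >= 1` cannot be decided; with it, the lower bound of the range rises and the guard becomes always true. The Python sketch below assumes a 64-bit signed value and models ranges as `(lo, hi)` tuples. `assume_sge` and `always_sge` are hypothetical helpers for illustration, not part of Triton or MLIR.

```python
# Assumption for this sketch: values are 64-bit signed integers.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def assume_sge(value_range, bound):
    """Narrow a (lo, hi) range using the assumed fact `value >= bound`."""
    lo, hi = value_range
    return (max(lo, bound), hi)


def always_sge(value_range, bound):
    """True if every value in the range satisfies `value >= bound`."""
    lo, _ = value_range
    return lo >= bound


unknown = (INT64_MIN, INT64_MAX)     # nothing known about the value
before = always_sge(unknown, 1)      # guard cannot be folded

narrowed = assume_sge(unknown, 1)    # effect of assuming `value >= 1`
after = always_sge(narrowed, 1)      # guard is now always true
```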

Currently this is failing because I need to cherry-pick/PR an LLVM bump.

Waiting on #6334.

@makslevental makslevental force-pushed the makslevental/loop-epilogue-range-canon branch from ca93969 to 15df1ce Compare March 26, 2025 20:22
@makslevental makslevental marked this pull request as ready for review March 28, 2025 20:05
@makslevental makslevental force-pushed the makslevental/loop-epilogue-range-canon branch 2 times, most recently from dcbd67c to 987cfd6 Compare March 29, 2025 00:35
@makslevental
Contributor Author

There seems to be some kind of bug around here: https://github.com/llvm/llvm-project/blob/8726e973459d93d34653946ba1e01ad198cdf11f/mlir/lib/Dialect/Arith/Transforms/IntRangeOptimizations.cpp#L56-L81. It is related to how the constant is materialized. Will figure it out next week.

@makslevental
Contributor Author

makslevental commented Mar 29, 2025

Upstream bug fix: llvm/llvm-project#133556

@makslevental
Contributor Author

Same failure as in #6343; it is related to a recent change @Mogball made upstream, also to range analysis.

@Mogball
Collaborator

Mogball commented Mar 29, 2025

I put a fix in the branch. IntRangeAnalysis will now return a dummy result for non-integer values, because it has to return something.

@makslevental makslevental force-pushed the makslevental/loop-epilogue-range-canon branch from 1d4b9a6 to 9e80a42 Compare March 31, 2025 19:35
@makslevental makslevental force-pushed the makslevental/loop-epilogue-range-canon branch 2 times, most recently from 27261f4 to 9803f17 Compare April 1, 2025 21:01
@makslevental makslevental force-pushed the makslevental/loop-epilogue-range-canon branch from 9803f17 to 9bc5cec Compare April 1, 2025 21:05
@antiagainst antiagainst merged commit 0315d72 into triton-lang:main Apr 1, 2025
8 checks passed
@makslevental makslevental deleted the makslevental/loop-epilogue-range-canon branch April 1, 2025 23:24
3 participants