[WARNINGS] Emit warning for WGMMA fp8 dot when transposition prevents pipelining (#6875)
**TL;DR**: For fp8 WGMMA matmuls, if the input tensors are not in a
specific transposed format in global memory (row-major A, col-major B),
pipelining will be disabled. Emit a warning for these cases.
If you run an fp8 matmul (e.g. 03-matrix-multiplication) with the B
matrix in row-major format (e.g.
https://gist.github.com/davidberard98/21fcee4a46192a1a756a458dfc3669fe),
and use MLIR_ENABLE_DIAGNOSTICS=warnings, then a warning like this one
will be emitted:
```
/home/dberard/fbcode/scripts/dberard/triton/fp8_mm.py:171:35: warning: Warning: Forcing a different order [0, 1] on SMEM than the register order for the operand 1. Registers will be transposed before SMEM store and the pipelined load for this operand will be disabled, so poor performance is expected.
accumulator = tl.dot(a, b, accumulator)
```
Since this is a user-facing restriction with significant implications
for the performance of fp8 matmuls, I think it makes sense to make this
a warning.
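
To make the layout requirement concrete, here is a minimal sketch of a stride check a caller could run before launching an fp8 matmul. It is not part of this PR or of Triton: the helper names `contiguous_dim` and `fp8_dot_layouts_pipeline_friendly` are hypothetical, and it assumes A has shape (M, K), B has shape (K, N), and strides are given in elements, as `torch.Tensor.stride()` reports them.

```python
def contiguous_dim(strides):
    """Return the index of the unit-stride dimension of a 2D tensor, or None.

    Hypothetical helper: assumes strides are expressed in elements and that
    no dimension has size 1 (which would make the answer ambiguous).
    """
    for dim, stride in enumerate(strides):
        if stride == 1:
            return dim
    return None


def fp8_dot_layouts_pipeline_friendly(a_strides, b_strides):
    """Check for the layout described above: row-major A, col-major B.

    A is (M, K): row-major means K (dim 1) is the contiguous dimension.
    B is (K, N): col-major means K (dim 0) is the contiguous dimension.
    """
    return contiguous_dim(a_strides) == 1 and contiguous_dim(b_strides) == 0


M, N, K = 128, 256, 64
row_major_a = (K, 1)  # strides of a contiguous (M, K) tensor
row_major_b = (N, 1)  # strides of a contiguous (K, N) tensor
col_major_b = (1, K)  # strides of an (N, K) contiguous tensor viewed as (K, N)

# Row-major B is the case that would trigger the warning.
print(fp8_dot_layouts_pipeline_friendly(row_major_a, row_major_b))
# Col-major B matches the layout that keeps pipelining enabled.
print(fp8_dot_layouts_pipeline_friendly(row_major_a, col_major_b))
```

With PyTorch inputs, the same check could be fed `a.stride()` and `b.stride()`; a col-major B is typically obtained by allocating an (N, K) contiguous tensor and passing its transpose.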
Note: This warning already exists for MMAv5; this PR just plumbs the
required info into the getSharedMemoryMMAOperand function so that
diagnostics can be emitted:
https://github.com/triton-lang/triton/blob/7dc549208aa3ce30612fe884bc4723f95f4b40b1/lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp#L188-L195