[SWP] Remove redundant SMEM encoding creation for MMAv3 (triton-lang#5640)
When we determine the SMEM encoding for a multi-buffered SMEM allocation,
we should reuse the encoding of the operand SMEM created by
`AccelerateMatmul`. We do have such logic in the code, but currently there
is an additional MMAv3-specific code path before it that creates a fresh
encoding which, in practice, always coincides with the existing operand
encoding. This PR removes that redundant path.
https://github.com/triton-lang/triton/blob/main/lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp#L337-L361
The exception to this is multi-buffering of a TMA load. `AccelerateMatmul`
may create an encoding [whose `order` is a transpose of the register
`order`](
https://github.com/triton-lang/triton/blob/main/lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp#L151-L159).
We cannot use such an encoding as the destination of TMA. So for TMA loads,
SWP always creates a new encoding that is known to be compatible with
TMA.
(When the TMA and MMA operand encodings differ and the TMA one is not
compatible with MMA, e.g. MMAv3 with a row-major fp8 RHS, SWP ends up
producing an invalid program because the encoding is overwritten by the
TMA layout. We should not pipeline the TMA load in such a case. This is a
bug that should be fixed.)
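The selection rule described above can be sketched as a small standalone model. This is only an illustration, not the code in `MatmulLoopPipeline.cpp`: `SharedEncoding`, `makeFreshEncoding`, and `chooseBufferEncoding` are hypothetical names, the encoding is reduced to its `order` field, and the fallback for a missing operand encoding is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a shared-memory encoding. Only the dimension
// order is modeled; the real attribute carries more parameters.
struct SharedEncoding {
  std::vector<unsigned> order;
};

// Hypothetical helper: builds an encoding whose order matches the register
// order. This sketch assumes such an encoding is a valid TMA destination.
SharedEncoding makeFreshEncoding(const std::vector<unsigned> &registerOrder) {
  return SharedEncoding{registerOrder};
}

// Picks the encoding for the multi-buffered allocation of one load.
SharedEncoding
chooseBufferEncoding(bool isTMALoad,
                     const std::optional<SharedEncoding> &operandEncoding,
                     const std::vector<unsigned> &registerOrder) {
  assert(!registerOrder.empty() && "register order must be known");

  // TMA loads never reuse the operand encoding, because its order may be a
  // transpose of the register order and so unusable as a TMA destination.
  if (isTMALoad)
    return makeFreshEncoding(registerOrder);

  // Otherwise reuse the encoding that already exists on the operand SMEM.
  if (operandEncoding)
    return *operandEncoding;

  // Assumption of this sketch: with no operand encoding available, fall
  // back to a fresh one.
  return makeFreshEncoding(registerOrder);
}
```

With a transposed operand encoding `{0, 1}` and register order `{1, 0}`, the non-TMA case returns the operand encoding unchanged, while the TMA case returns a fresh encoding with order `{1, 0}`.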
This change is mostly a nit for the current main, but it is motivated by a
case where we want to create a new kind of SMEM encoding representing a
more complicated layout. Ideally, we want to do that only once in
`AccelerateMatmul` and reuse the result in SWP rather than repeating the
same code there.
cc @ThomasRaoux @pawelszczerbuk @csullivan @mbrookhart
<!---
The core Triton is a small number of people, and we receive many PRs
(thank
you!). To help us review your code more quickly, **if you are a new
contributor (less than 3 PRs merged) we ask that you complete the
following
tasks and include the filled-out checklist in your PR description.**
Complete the following tasks before sending your PR, and replace `[ ]`
with
`[x]` to indicate you have done them.
-->
# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
[rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [x] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [ ] This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - [ ] I have not added any `lit` tests.
  - [x] The `lit` tests I have added follow these [best
    practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python
    code and using the instructions it generates is not minimal.)
---------
Co-authored-by: Masahiro Masuda <[email protected]>