Commit 3d33f74
[mxfp] adjust num_stages for bf16/fp16 x mxfp (#8773)
For fp16/bf16 x mxfp, the weight is upcast on the fly, so smem_capacity
should be sized accordingly.
Without this change, we get the following error:
"triton.runtime.errors.OutOfResources: out of resource: shared memory,
Required: 263356, Hardware limit: 232448. Reducing block sizes or
`num_stages` may help"
for x.shape = [2048, 5120] bf16 x [32, 5120, 5120] float8_e4m3fn with
block_m=64, block_n=256, block_k=128, split_k=1, is_persistent=True,
which leads to num_stages=4.
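
To illustrate why the stage count overshoots, here is a rough back-of-the-envelope sketch in Python. It is not the code from this commit: the helper names (`tile_bytes`, `max_num_stages`) are hypothetical, and the model assumes that each pipeline stage buffers one x tile and one packed weight tile, that the upcast weight needs one extra bf16 tile outside the pipeline, and that mx scales and barriers are negligible. Under those assumptions, the reported shape fits 4 stages when the upcast buffer is ignored and only 3 when it is reserved.

```python
def tile_bytes(rows: int, cols: int, bytes_per_elem: int) -> int:
    """Size in bytes of a rows x cols tile."""
    return rows * cols * bytes_per_elem


def max_num_stages(smem_capacity: int, block_m: int, block_n: int, block_k: int,
                   x_bytes: int, w_bytes: int, upcast_bytes: int = 0) -> int:
    """Largest stage count whose buffers fit in smem_capacity.

    Assumption (not taken from the commit): the upcast weight occupies a
    single extra block_k x block_n tile that is reserved up front.
    """
    per_stage = (tile_bytes(block_m, block_k, x_bytes)
                 + tile_bytes(block_k, block_n, w_bytes))
    reserved = tile_bytes(block_k, block_n, upcast_bytes)
    return max((smem_capacity - reserved) // per_stage, 1)


HARDWARE_LIMIT = 232448  # bytes, from the error message above

# bf16 activations (2 bytes) x float8_e4m3fn weights (1 byte)
naive = max_num_stages(HARDWARE_LIMIT, 64, 256, 128, x_bytes=2, w_bytes=1)
# Same shape, but reserving room for a bf16 (2-byte) upcast weight tile
adjusted = max_num_stages(HARDWARE_LIMIT, 64, 256, 128, x_bytes=2, w_bytes=1,
                          upcast_bytes=2)

print(naive, adjusted)  # 4 3
```

In this simplified model, 4 stages plus the upcast tile come to 262144 bytes, which is close to the 263356 bytes reported in the error. The real accounting in Triton may differ.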
# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
[rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because I wasn't able to find a shape
    that runs reliably without OOMs. The example shape above, 32 x 5120 x 5120,
    is too big. I will try to see if I can enable it only on GB200.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best
    practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python
    code and using the instructions it generates is not minimal.)

1 parent 3ac3994 commit 3d33f74
1 file changed: 17 additions & 3 deletions
- python/triton_kernels/triton_kernels/matmul_details/opt_flags_details