Commit 77ba5d7
[AMD] Enable ds_read_tr for fp4 packed along K (#7481)
This was defensively disabled in a previous commit but has been verified
to work fine.
FP4 packed along the K dimension needs to use ds_read_tr8 when it is loaded
from shared memory and a transpose is needed. The packing must stay the
same, so we operate on the FP4 values as if they were i8; that way the
packing order does not change.
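The packing argument can be illustrated with a toy model. The sketch below is plain Python, not the actual ds_read_tr8 lane semantics; the low-nibble-first packing order, the helper names `pack_along_k` and `transpose`, and the sample values are all assumptions made for illustration. It shows that transposing the packed bytes as i8 keeps each pair of K-adjacent FP4 values in the same byte, while transposing at 4-bit granularity and repacking pairs values from different rows.

```python
def pack_along_k(codes):
    """Pack a row-major M x K matrix of 4-bit codes into M x K/2 bytes.

    Assumption for this sketch: the lower K index goes in the low nibble.
    """
    return [[row[k] | (row[k + 1] << 4) for k in range(0, len(row), 2)]
            for row in codes]


def transpose(mat):
    """Transpose a row-major matrix given as a list of lists."""
    return [list(col) for col in zip(*mat)]


def flatten(mat):
    """Collect all elements of a matrix into a set."""
    return {x for row in mat for x in row}


# Hypothetical 2 x 4 (M x K) matrix of 4-bit codes.
codes = [[0x1, 0x2, 0x3, 0x4],
         [0x5, 0x6, 0x7, 0x8]]

packed = pack_along_k(codes)        # [[0x21, 0x43], [0x65, 0x87]]

# Transpose at byte (i8) granularity: every byte survives unchanged,
# so each pair of K-adjacent values stays packed together.
as_i8 = transpose(packed)           # [[0x21, 0x65], [0x43, 0x87]]

# Transpose at 4-bit granularity, then repack along the new minor
# dimension: bytes now pair values from different rows (M-adjacent).
as_fp4 = pack_along_k(transpose(codes))   # [[0x51], [0x62], [0x73], [0x84]]

packing_preserved = flatten(as_i8) == flatten(packed)
packing_changed = flatten(as_fp4).isdisjoint(flatten(packed))
```

In this model `packing_preserved` and `packing_changed` are both true, which mirrors the reasoning in the commit message: only the i8-granularity transpose leaves the packing order intact.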
Note: the LIT test I've added shows how the previous behaviour compares
with the current one. The code used to check dot_scaled usage explicitly,
so I've written the test to show the new behaviour on that basis, although
the new behaviour no longer needs to look at dot_scaled.
1 parent c944014 commit 77ba5d7
2 files changed (+14, -8 lines):
- test/Conversion/amd
- third_party/amd/lib/TritonAMDGPUToLLVM
[Diff bodies not preserved. First file: 11 lines added (new lines 370-380). Second file: 8 lines removed (old lines 216-223) and 3 lines added (new lines 216-218).]