Commit 3d83665

[AMDGPU][PeepholeOpt] Eliminate unnecessary packing in fp16 vector operations for SDWA/OPSEL-able instruction

Because the target has no packed fp16 instruction for these operations, isel scalarizes each fp16 operation in a wide fp16 vector into separate fp16 results, which are later packed together. Afterwards, the post-isel SIPeepholeSDWA pass opportunistically converts instructions into their SDWA/OPSEL-able versions.

This patch removes the unnecessary packing in wide fp16 vector operations for SDWA/OPSEL-able instructions. Instead of producing a separate result that must be packed, each partial fp16 result is written directly into its half of the input register, and the remaining bits of that register are preserved by setting the dst_unused operand to UNUSED_PRESERVE. Because it relies on the SDWA instructions generated by the pass, this optimization runs at the end of SIPeepholeSDWA.
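The bit-level effect described above can be illustrated with a small model. This is a minimal sketch, assuming a 32-bit register that holds two fp16 lanes and a simplified SDWA destination write in which `dst_sel` picks the low or high 16 bits and `dst_unused` either zeroes (UNUSED_PAD) or preserves (UNUSED_PRESERVE) the other half. All helper names (`sdwa_write`, `add_v2f16_with_pack`, `add_v2f16_preserve`, `make_v2f16`) are hypothetical. They are not LLVM APIs, and the sketch does not reflect how the pass actually rewrites MachineInstrs.

```python
import struct

MASK16 = 0xFFFF  # one fp16 lane of a 32-bit register


def f16_bits(x: float) -> int:
    """Round a Python float to IEEE fp16 ('e' format) and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_f16(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE fp16 value."""
    return struct.unpack("<e", struct.pack("<H", bits & MASK16))[0]


def f16_add(a_bits: int, b_bits: int) -> int:
    """Scalar fp16 add: the sum of two fp16 values is exact in a double,
    so it is rounded only once, to fp16."""
    return f16_bits(bits_f16(a_bits) + bits_f16(b_bits))


def sdwa_write(old: int, value16: int, word1: bool, preserve: bool) -> int:
    """Hypothetical model of an SDWA destination write.

    word1=True writes the high 16 bits (like dst_sel:WORD_1), else the low 16 bits.
    preserve=True keeps the other half of `old` (like UNUSED_PRESERVE);
    preserve=False zeroes it (like UNUSED_PAD).
    """
    shift = 16 if word1 else 0
    written = (value16 & MASK16) << shift
    keep = (old & ~(MASK16 << shift) & 0xFFFFFFFF) if preserve else 0
    return keep | written


def add_v2f16_with_pack(a: int, b: int) -> int:
    """Before: each lane's result lands in its own register, then an explicit pack."""
    lo = sdwa_write(0, f16_add(a & MASK16, b & MASK16), word1=False, preserve=False)
    hi = sdwa_write(0, f16_add(a >> 16, b >> 16), word1=False, preserve=False)
    return ((hi & MASK16) << 16) | (lo & MASK16)  # the extra packing step


def add_v2f16_preserve(a: int, b: int) -> int:
    """After: the high-lane result is written into the upper half of the register
    that already holds the low-lane result, preserving the low half, so no pack."""
    lo_reg = sdwa_write(0, f16_add(a & MASK16, b & MASK16), word1=False, preserve=False)
    return sdwa_write(lo_reg, f16_add(a >> 16, b >> 16), word1=True, preserve=True)


def make_v2f16(lo: float, hi: float) -> int:
    """Pack two floats into one 32-bit word of fp16 lanes (lo in bits 0-15)."""
    return (f16_bits(hi) << 16) | f16_bits(lo)


a = make_v2f16(1.5, 2.0)
b = make_v2f16(0.25, -3.0)
print(hex(add_v2f16_with_pack(a, b)))  # 0xbc003f00
print(hex(add_v2f16_preserve(a, b)))   # 0xbc003f00
```

In this model both paths produce the same 32-bit value (1.75 in the low lane, -1.0 in the high lane). The preserving variant simply has no separate packing step, which mirrors the instruction the patch aims to eliminate.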
1 parent 45a3056 commit 3d83665

1 file changed (+514, -2 lines)
