Commit 3d83665
[AMDGPU][PeepholeOpt] Eliminate unnecessary packing in fp16 vector
operations for SDWA/OPSEL-able instruction
Because the compiler has no packed fp16 instruction available, isel
scalarizes each fp16 operation in a wide fp16 vector and generates separate
fp16 results, which are later packed. Afterwards, in the post-isel
SIPeepholeSDWA pass, instructions are opportunistically converted into
their SDWA/OPSEL-able versions.
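The scalarize-then-pack sequence described above can be modelled at the bit level. The following is only a rough sketch, not the actual isel output: `lane_op`, `pack_halves`, and `scalarize_then_pack` are hypothetical names, a 32-bit integer stands in for a register holding two fp16 lanes, and an integer add stands in for the real fp16 operation so that no half-precision type is needed.

```cpp
#include <cstdint>

// Hypothetical stand-in for one scalarized fp16 operation. It works on raw
// 16-bit lane patterns; an integer add replaces the real fp16 arithmetic.
static uint16_t lane_op(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(a + b);
}

// Extract the low and high 16-bit lanes of a 32-bit register value.
static uint16_t lo16(uint32_t r) { return static_cast<uint16_t>(r & 0xFFFFu); }
static uint16_t hi16(uint32_t r) { return static_cast<uint16_t>(r >> 16); }

// The separate packing step: combine two 16-bit results into one register.
static uint32_t pack_halves(uint16_t lo, uint16_t hi) {
  return (static_cast<uint32_t>(hi) << 16) | lo;
}

// Each lane is computed on its own, then an extra pack joins the results.
static uint32_t scalarize_then_pack(uint32_t a, uint32_t b) {
  uint16_t r_lo = lane_op(lo16(a), lo16(b));
  uint16_t r_hi = lane_op(hi16(a), hi16(b));
  return pack_halves(r_lo, r_hi);
}
```

In this model the call to `pack_halves` is the extra work: both lane results already exist, and the pack only moves them into one register.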
This patch eliminates the unnecessary packing in wide fp16 vector
operations for SDWA/OPSEL-able instructions by writing each partial fp16
result directly into part of the same input register while preserving the
remaining bits of that register, using the OPSEL dst_unused operand set to
UNUSED_PRESERVED. Because it operates on the generated SDWA instructions,
it is invoked at the end of the SIPeepholeSDWA pass.

1 parent 45a3056 commit 3d83665
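The effect of a partial write that preserves the untouched bits can be illustrated with a small bit-level model. This is a sketch under stated assumptions, not the code added by the patch: `HalfSel` and `write_half_preserve` are hypothetical names, and a 32-bit integer stands in for the destination register.

```cpp
#include <cstdint>

// Hypothetical selector for which 16-bit half of the register is written.
enum class HalfSel { Low, High };

// Model of a 16-bit result written into one half of a 32-bit register while
// the other half keeps its previous contents (the "preserved" behaviour).
static uint32_t write_half_preserve(uint32_t dst, uint16_t result,
                                    HalfSel sel) {
  if (sel == HalfSel::Low) {
    // Keep the high half, replace the low half.
    return (dst & 0xFFFF0000u) | result;
  }
  // Keep the low half, replace the high half.
  return (dst & 0x0000FFFFu) | (static_cast<uint32_t>(result) << 16);
}
```

For example, writing `0xBEEF` into the high half of `0xAAAA1111` leaves the low half `0x1111` intact, so no separate pack instruction is needed to combine the two lanes.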
1 file changed, +514 -2 lines changed