Closed
Description
#151153 introduced a new math.clampf op, which has an efficient lowering on AMDGPU hardware (at least on gfx9+) as V_MED3_F16 (f16 is just one example; other types are supported).
We should add support for this by introducing a new rocdl.med3 op and lowering math.clampf to it.
Example of the instruction being used in Composable Kernel to clamp before an f16 -> f8 conversion: https://github.com/ROCm/composable_kernel/blob/83f607e2a68d778479ddd34fd5aac3d4e7c9e52f/include/ck/utility/amd_ck_fp8.hpp#L651
The lowering chain should look like this (each arrow points from an op to the op it lowers to):
rocdl.med3.f16
rocdl.med3.f32
rocdl.med3.i16
rocdl.med3.i32
^
|
amdgpu.med3
^
|
math.clampf
This would probably be three PRs:
- Add rocdl ops and conversion to LLVM
- Add amdgpu.med3 and lower it to rocdl ops
- Add math.clampf -> amdgpu.med3 lowering
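
For context on why the clamp-to-med3 mapping works: for ordinary (non-NaN) inputs with lo <= hi, the median of (x, lo, hi) is the same value as clamping x to [lo, hi]. The Python sketch below only illustrates that identity. The helper names med3 and clamp are made up for illustration and are not part of any MLIR or ROCm API, and NaN and signed-zero behavior of the real instruction is deliberately not modeled.

```python
# Illustrative only: shows that median-of-three equals clamp when lo <= hi.
# `med3` and `clamp` are hypothetical helper names, not MLIR/ROCm APIs.
# NaN and signed-zero handling of the hardware instruction is not modeled.

def med3(a, b, c):
    """Return the median of three values using only min/max."""
    return max(min(a, b), min(max(a, b), c))

def clamp(x, lo, hi):
    """Clamp x into [lo, hi]; assumes lo <= hi."""
    return min(max(x, lo), hi)

if __name__ == "__main__":
    lo, hi = -1.0, 1.0
    for x in (-3.5, -1.0, 0.25, 1.0, 7.0):
        # The two results agree for every x as long as lo <= hi.
        print(x, clamp(x, lo, hi), med3(x, lo, hi))
```

Since the median does not depend on argument order, the operand order chosen by a lowering should not matter for non-NaN inputs, but the identity does rely on lo <= hi holding.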