You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[NVPTX] Improve support for {ex2,lg2}.approx (#120519)
- Add support for `@llvm.exp2()`:
- LLVM: `float` -> PTX: `ex2.approx{.ftz}.f32`
- LLVM: `half` -> PTX: `ex2.approx.f16`
- LLVM: `<2 x half>` -> PTX: `ex2.approx.f16x2`
- LLVM: `bfloat` -> PTX: `ex2.approx.ftz.bf16`
- LLVM: `<2 x bfloat>` -> PTX: `ex2.approx.ftz.bf16x2`
- Any operations with non-native vector widths are expanded. On
targets not supporting f16/bf16, values are promoted to f32.
- Add *CONDITIONAL* support for `@llvm.log2()` [^1]:
- LLVM: `float` -> PTX: `lg2.approx{.ftz}.f32`
- Support for f16/bf16 is emulated by promoting values to f32.
[1]: CUDA implements `exp2()` with `ex2.approx` but `log2()` is
implemented differently, so this is off by default. To enable, use the
flag `-nvptx-approx-log2f32`.
0 commit comments