This patch adds an intrinsic to convert float to tf32.
* The intrinsic takes flag arguments that select the rounding
mode, the saturation mode, and relu. The backend inspects these
flags and lowers the call to the matching PTX cvt instruction.
* Docs are updated to describe the usage of the flag arguments.
* Lit tests are added for all the combinations.
Note: We already have an intrinsic 'llvm.nvvm.f2tf32.rna',
which covers only one variant of the PTX instruction. Once
this change lands, I will submit a follow-up PR to auto-upgrade
it to use the generic variant.
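
For reference, the numeric effect of the existing rna variant can be
sketched on the host as below. This is an illustration only, not code
from this patch: the helper name emulate_f2tf32_rna is hypothetical, and
the sketch assumes tf32 keeps the f32 sign and 8-bit exponent with a
10-bit mantissa, that rna means round-to-nearest with ties away from
zero, and that Inf/NaN inputs are passed through unchanged (the exact
hardware NaN result may differ; see the PTX spec).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical host-side helper (not part of LLVM or this patch).
// Rounds an f32 to tf32 precision: round-to-nearest, ties away from zero.
inline float emulate_f2tf32_rna(float x) {
    // Assumption: non-finite inputs are returned as-is in this sketch.
    if (!std::isfinite(x)) return x;

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);

    // f32 has 23 mantissa bits, tf32 keeps the top 10, so 13 bits are dropped.
    // Adding half of the dropped range (bit 12) to the sign-magnitude
    // encoding rounds the magnitude up on ties, i.e. away from zero.
    bits += 0x1000u;
    bits &= ~0x1FFFu;  // clear the 13 discarded mantissa bits

    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

Under these assumptions, an input exactly halfway between two tf32
values, such as 1 + 2^-11, rounds to 1 + 2^-10 rather than to 1. The
other rounding modes, saturation, and relu selected through the new
flags are not modelled here.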
PTX Spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
Signed-off-by: Durgadoss R <[email protected]>