Commit 833b904
AMDGPU: Custom lower vector fptrunc of f32 -> f16
GFX950+ supports v_cvt_pk_f16_f32. However, the current implementation
of vector fptrunc lowering fully scalarizes the vector, and the scalar
conversions may not always be combined into the packed one.
We made v2f32 -> v2f16 legal in #139956.
This work extends that to wider vectors. Instead of fully scalarizing,
we split the vector into packs (v2f32 -> v2f16) to ensure the packed conversion can always
be generated.
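The splitting described above can be illustrated with a small standalone model. This is only a sketch and not the LLVM lowering code: `splitIntoPacks` is a hypothetical helper, and the handling of an odd trailing lane as a single scalar conversion is an assumption, since the message does not say how odd-width vectors are treated.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not the LLVM code: describes how a vector of
// `numElts` f32 lanes could be cut into two-lane packs, each of which
// would map to one v2f32 -> v2f16 conversion.
// Each entry is (first lane index, lane count).
std::vector<std::pair<unsigned, unsigned>> splitIntoPacks(unsigned numElts) {
  assert(numElts > 0 && "expected a non-empty vector");
  std::vector<std::pair<unsigned, unsigned>> packs;
  for (unsigned i = 0; i < numElts; i += 2) {
    // A full pack covers two lanes; an odd trailing lane is left as a
    // single scalar conversion (an assumption of this sketch).
    unsigned width = (numElts - i >= 2) ? 2u : 1u;
    packs.emplace_back(i, width);
  }
  return packs;
}
```

Under these assumptions, a v8f32 source yields four packs starting at lanes 0, 2, 4, and 6, so every pair of lanes goes through the packed form instead of eight separate scalar conversions.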
NOTE:
minor changes
1 parent 4ecace6 commit 833b904
1 file changed: +2 -2 lines changed (single-line replacements at lines 6909 and 6933)