Commit eac6d6e
AMDGPU: Custom lower vector fptrunc of f32 -> f16
GFX950+ supports v_cvt_pk_f16_f32. However, the current implementation
of vector fptrunc lowering fully scalarizes the vector, and the scalar
conversions may not always be combined into the packed one.
We made v2f32 -> v2f16 legal in #139956.
This work is an extension to handle wider vectors. Instead of full scalarization,
we split the vector into packs (v2f32 -> v2f16) to ensure the packed conversion can always
be generated.
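
The pack-splitting idea described above can be illustrated with a small standalone model. This is only a sketch, not the code from this commit: `Chunk` and `splitIntoPacks` are hypothetical names, it uses no LLVM APIs, and the scalar leftover for odd element counts is an assumption made for illustration. It shows how a vector of N elements could be carved into width-2 slices so that each slice maps onto one packed conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of LLVM: one slice of the source vector.
struct Chunk {
  std::size_t offset; // index of the first element covered by this slice
  std::size_t width;  // 2 for a packed conversion, 1 for a scalar leftover
};

// Splits a vector of numElts elements into packs of two. An odd trailing
// element becomes a width-1 chunk; that tail handling is an assumption of
// this sketch, not something the commit message states.
std::vector<Chunk> splitIntoPacks(std::size_t numElts) {
  assert(numElts > 0 && "expected a non-empty vector");
  std::vector<Chunk> chunks;
  std::size_t i = 0;
  for (; i + 2 <= numElts; i += 2)
    chunks.push_back({i, 2}); // one v2f32 -> v2f16 pack
  if (i < numElts)
    chunks.push_back({i, 1}); // leftover element, converted on its own
  return chunks;
}
```

Under this model, an 8-element vector yields four packs and no scalar conversions, which is the property the commit message is after: every pair is handed to the packed conversion directly instead of relying on scalar conversions being recombined later.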
NOTE: Use .clampMaxNumElements(0, S16, 2)
1 parent 833b904 commit eac6d6e
2 files changed, +10 -29 lines changed
File tree
- llvm
  - lib/Target/AMDGPU
  - test/CodeGen/AMDGPU