Commit 4f2e5e4
[X86][AVX512] Better lowering for `_mm512_maskz_shuffle_epi32`
For the function (https://godbolt.org/z/4rTYeMY4b)
```
#include <immintrin.h>
__m512i foo(__m512i a){
__m512i r0 = _mm512_maskz_shuffle_epi32(0xaaaa, a, 0xab);
return r0;
}
```
The generated assembly is unnecessarily long:
```
.LCPI0_1:
.byte 0
.byte 18
.byte 2
.byte 18
.byte 4
.byte 22
.byte 6
.byte 22
.byte 8
.byte 26
.byte 10
.byte 26
.byte 12
.byte 30
.byte 14
.byte 30
foo(long long vector[8]):
vpmovsxbd zmm2, xmmword ptr [rip + .LCPI0_1]
vpxor xmm1, xmm1, xmm1
vpermt2d zmm1, zmm2, zmm0
vmovdqa64 zmm0, zmm1
ret
```
Instead, we could generate a single `vpshufd {{.*#+}} zmm0 {%k1} {z}` instruction and pass the mask and the `imm8` value to it.
The SelectionDAG generated from the IR doesn't contain the mask and the `imm8` value directly, but there is a pattern that can be matched:
```
t6: v16i32 = BUILD_VECTOR Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32, Constant:i32<0>, undef:i32
t2: v8i64,ch = CopyFromReg t0, Register:v8i64 %0
t3: v16i32 = bitcast t2
t7: v16i32 = vector_shuffle<0,18,2,18,4,22,6,22,8,26,10,26,12,30,14,30> t6, t3
t8: v8i64 = bitcast t7
```
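For illustration, the matching step might be sketched as a standalone function like the one below. This is not the code from the patch: `MaskedPshufd` and `decodeMaskedPshufd` are hypothetical names, the first shuffle operand is assumed to be zero or undef in every lane that is read from it, and positions whose selector is never constrained default to 0 (an arbitrary choice). Under those assumptions the shuffle mask in the dump above decodes to mask `0xAAAA` and immediate `0x88`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result type: write-mask and immediate recovered from a shuffle.
struct MaskedPshufd {
    uint16_t kmask; // bit i set => lane i is taken from the source vector
    uint8_t imm8;   // one 2-bit selector per position within a 128-bit lane
};

// Shuffle-mask convention as in the dump: indices 0..15 read the first operand
// (assumed here to be zero/undef), indices 16..31 read the source vector.
// Returns false if the shuffle cannot be expressed as a zero-masked vpshufd.
bool decodeMaskedPshufd(const std::array<int, 16> &shuffle, MaskedPshufd &out) {
    int sel[4] = {-1, -1, -1, -1}; // -1: no selector seen for this position yet
    uint16_t kmask = 0;
    for (int i = 0; i < 16; ++i) {
        int m = shuffle[i];
        if (m < 0 || m >= 32) return false; // index out of range
        if (m < 16) continue;               // lane comes from the zero operand
        int src = m - 16;
        if (src / 4 != i / 4) return false; // would cross a 128-bit lane
        int pos = i % 4;
        // vpshufd applies the same selector in every 128-bit lane.
        if (sel[pos] != -1 && sel[pos] != src % 4) return false;
        sel[pos] = src % 4;
        kmask = static_cast<uint16_t>(kmask | (1u << i));
    }
    uint8_t imm8 = 0;
    for (int pos = 0; pos < 4; ++pos) {
        int s = sel[pos] < 0 ? 0 : sel[pos]; // unconstrained position: pick 0
        imm8 = static_cast<uint8_t>(imm8 | (s << (2 * pos)));
    }
    out.kmask = kmask;
    out.imm8 = imm8;
    return true;
}
```

A real implementation would additionally have to verify that every lane read from the first operand is actually a zero constant (or undef) before treating it as masked off.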
I've tried to match this shuffle pattern in the DAG to recover the mask and `imm8` values and generate a `VSELECT` node.
The resulting assembly looks like:
```
movw $-21846, %ax # imm = 0xAAAA
kmovw %eax, %k1
vpshufd $136, %zmm0, %zmm0 {%k1} {z} # zmm0 {%k1} {z} = zmm0[0,2,0,2,4,6,4,6,8,10,8,10,12,14,12,14]
retq
```
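The immediate in the output (`$136`, i.e. `0x88`) differs from the `0xab` in the source. A scalar model of the zero-masked shuffle, sketched below with the hypothetical names `Vec16` and `maskzShuffleEpi32`, suggests why this is harmless: with mask `0xaaaa` only the odd lanes are written, and both immediates select element 2 of each 128-bit lane for those positions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<uint32_t, 16>; // stand-in for a 512-bit vector of i32

// Scalar model of a zero-masked vpshufd: each 128-bit lane (4 elements) is
// shuffled independently using 2-bit selectors from imm8, and lanes whose
// mask bit is clear are zeroed.
Vec16 maskzShuffleEpi32(uint16_t k, const Vec16 &a, uint8_t imm8) {
    Vec16 r{}; // value-initialised: all elements start at zero
    for (int i = 0; i < 16; ++i) {
        if (!((k >> i) & 1)) continue;         // masked off, stays zero
        int base = (i / 4) * 4;                // first element of this 128-bit lane
        int sel = (imm8 >> (2 * (i % 4))) & 3; // selector for this position
        r[i] = a[base + sel];
    }
    return r;
}
```

Calling `maskzShuffleEpi32(0xAAAA, a, 0xAB)` and `maskzShuffleEpi32(0xAAAA, a, 0x88)` on the same input gives identical results, because the two immediates differ only in the selectors for the even positions, which the mask zeroes.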
1 parent cbe583b commit 4f2e5e4
1 file changed: +55 -0 (added lines 17175-17226 and 17272-17274)