I have Zig code like this:
export fn foo(x: @Vector(8, u8)) @Vector(64, u8) {
return @splat(x[1]);
}

Here is the LLVM IR:
define dso_local <64 x i8> @foo(<8 x i8> %0) local_unnamed_addr {
Entry:
%1 = shufflevector <8 x i8> %0, <8 x i8> poison, <64 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
ret <64 x i8> %1
}

Here is how it lowers on Zen 5:
.LCPI0_1:
.byte 1
foo:
vpbroadcastb zmm1, byte ptr [rip + .LCPI0_1]
vpermb zmm0, zmm1, zmm0
ret

Here is how I think it should lower:
foo:
vpsrlq xmm0, xmm0, 8
vpbroadcastb zmm0, xmm0
ret

The same applies when broadcasting into an xmm register:
.LCPI0_0:
.zero 16,1
foo:
vpshufb xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
ret

I would much rather avoid the trip to memory:
foo:
vpsrlq xmm0, xmm0, 8
vpbroadcastb xmm0, xmm0
ret
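
As a sanity check on the proposed sequence, here is a minimal scalar model in C of the two instructions involved. It is only a sketch: `model_vpsrlq`, `model_vpbroadcastb`, `shift_broadcast_matches_splat`, and `check_lowering` are hypothetical helper names made up for illustration, not LLVM or intrinsic APIs, and the model assumes the 8 input bytes sit in the low half of the register with the upper bytes ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of `vpsrlq xmm, xmm, imm`: logical right shift of each 64-bit lane. */
static void model_vpsrlq(uint8_t reg[16], unsigned bits) {
    for (int lane = 0; lane < 2; lane++) {
        uint64_t q = 0;
        /* Assemble the lane from its little-endian bytes. */
        for (int i = 7; i >= 0; i--)
            q = (q << 8) | reg[lane * 8 + i];
        q = (bits < 64) ? (q >> bits) : 0;
        /* Write the shifted lane back, low byte first. */
        for (int i = 0; i < 8; i++) {
            reg[lane * 8 + i] = (uint8_t)(q & 0xff);
            q >>= 8;
        }
    }
}

/* Model of `vpbroadcastb dst, xmm`: copy byte 0 of src into every byte of dst. */
static void model_vpbroadcastb(uint8_t *dst, size_t width, const uint8_t src[16]) {
    memset(dst, src[0], width);
}

/* Returns 1 if shift-then-broadcast equals @splat(x[1]) for a result of
   `width` bytes (16 for xmm, 64 for zmm), 0 otherwise. */
static int shift_broadcast_matches_splat(const uint8_t x[8], size_t width) {
    uint8_t reg[16] = {0};
    uint8_t out[64];
    if (width > sizeof out)
        return 0;
    memcpy(reg, x, 8);          /* assumption: input occupies the low 8 bytes */
    model_vpsrlq(reg, 8);       /* moves x[1] into byte 0 */
    model_vpbroadcastb(out, width, reg);
    for (size_t i = 0; i < width; i++)
        if (out[i] != x[1])
            return 0;
    return 1;
}

/* Checks both the zmm (64-byte) and xmm (16-byte) variants on sample data. */
static void check_lowering(void) {
    const uint8_t x[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    assert(shift_broadcast_matches_splat(x, 64));
    assert(shift_broadcast_matches_splat(x, 16));
}
```

Calling `check_lowering()` from a `main` exercises both widths. The model only covers the byte-level semantics; it says nothing about latency or throughput on Zen 5.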