For code like this (Zig, Godbolt):
```zig
export fn foo(x: @Vector(64, u8), y: @Vector(64, u8)) @Vector(64, u8) {
    return @select(u8, x < @as(@Vector(64, u8), @splat(0x80)),
        y,
        @as(@Vector(64, u8), @splat(0)),
    );
}
```
LLVM IR (Godbolt):
```llvm
define dso_local <64 x i8> @foo(<64 x i8> %0, <64 x i8> %1) local_unnamed_addr {
Entry:
  %.inv = icmp slt <64 x i8> %0, zeroinitializer
  %2 = select <64 x i1> %.inv, <64 x i8> zeroinitializer, <64 x i8> %1
  ret <64 x i8> %2
}
```
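The IR no longer mentions 0x80: the unsigned test `x < 0x80` has become `icmp slt %0, zeroinitializer` (a byte is below 0x80 exactly when its sign bit is clear), and because that predicate is the inverse (`%.inv`), the select arms are swapped. The scalar C sketch below checks that equivalence for every pair of byte values. It models a single vector lane, the function names are hypothetical, and the `(int8_t)` cast assumes a two's-complement target (true of every compiler LLVM supports).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zig source semantics for one lane: @select(u8, x < 0x80, y, 0). */
static uint8_t select_source(uint8_t x, uint8_t y) {
    return x < 0x80 ? y : 0;
}

/* LLVM IR semantics for one lane:
   %.inv = icmp slt i8 x, 0   -> true when the sign bit is set
   select %.inv, 0, y         -> arms swapped to match the inverted test */
static uint8_t select_ir(uint8_t x, uint8_t y) {
    bool inv = (int8_t)x < 0; /* two's-complement reinterpretation */
    return inv ? 0 : y;
}

/* Returns how many (x, y) byte pairs the two forms disagree on. */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (select_source((uint8_t)x, (uint8_t)y) !=
                select_ir((uint8_t)x, (uint8_t)y))
                bad++;
    return bad;
}
```

A result of zero from `count_mismatches()` means the rewrite is exact for all 65,536 inputs, which is why the backend only ever needs to look at the sign bit of each byte of `x`.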
We used to get:
```asm
vpmovb2m k0, zmm0
vpmovm2b zmm0, k0
vpandnq zmm0, zmm0, zmm1
```
Now we get:
```asm
vpmovb2m k0, zmm0
knotq k1, k0
vmovdqu8 zmm0 {k1} {z}, zmm1
```
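Both sequences begin by gathering each byte's sign bit into the mask register `k0` with `vpmovb2m`. The old code expanded that mask back into bytes (0xFF where the sign bit is set) with `vpmovm2b` and cleared those bytes of `y` with `vpandnq`. The new code inverts the mask with `knotq` and uses a zero-masking byte move instead. The scalar C model below is a sketch of the per-byte semantics, not real intrinsics: the function names are hypothetical and a `uint64_t` stands in for the 64-bit mask register. It checks that both sequences agree with the source-level select.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 64

/* vpmovb2m: bit i of the mask is the sign bit of byte i. */
static uint64_t vpmovb2m(const uint8_t v[LANES]) {
    uint64_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (v[i] & 0x80)
            k |= (uint64_t)1 << i;
    return k;
}

/* Old: vpmovm2b expands the mask to 0xFF/0x00 bytes, then vpandnq computes ~m & y. */
static void old_seq(const uint8_t x[LANES], const uint8_t y[LANES], uint8_t out[LANES]) {
    uint64_t k0 = vpmovb2m(x);
    for (int i = 0; i < LANES; i++) {
        uint8_t m = ((k0 >> i) & 1) ? 0xFF : 0x00; /* vpmovm2b */
        out[i] = (uint8_t)(~m & y[i]);              /* vpandnq  */
    }
}

/* New: knotq inverts the mask; vmovdqu8 {k1}{z} keeps y where k1 is set, zero elsewhere. */
static void new_seq(const uint8_t x[LANES], const uint8_t y[LANES], uint8_t out[LANES]) {
    uint64_t k1 = ~vpmovb2m(x); /* knotq */
    for (int i = 0; i < LANES; i++)
        out[i] = ((k1 >> i) & 1) ? y[i] : 0;
}

/* Returns 1 if both sequences match select(x < 0x80, y, 0) on these inputs. */
static int sequences_agree(const uint8_t x[LANES], const uint8_t y[LANES]) {
    uint8_t a[LANES], b[LANES], ref[LANES];
    old_seq(x, y, a);
    new_seq(x, y, b);
    for (int i = 0; i < LANES; i++)
        ref[i] = x[i] < 0x80 ? y[i] : 0;
    return memcmp(a, ref, LANES) == 0 && memcmp(b, ref, LANES) == 0;
}
```

In this model the two versions differ only in where the inversion happens: on the expanded byte vector (`vpandnq`) or on the mask register itself (`knotq`).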
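However, I think the following technique might be a better choice in some situations. `vpshufb` zeroes each output byte whose control byte has its high bit set, so shuffling an all-ones vector by `x` yields 0xFF exactly where `x < 0x80`, and ANDing that with `y` completes the select: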
```asm
vpternlogd zmm2, zmm2, zmm2, 255
vpshufb zmm0, zmm2, zmm0
vpandq zmm0, zmm0, zmm1
```
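The `vpternlogd`/`vpshufb`/`vpandq` sequence can be checked with a scalar C model. In `vpshufb`, a control byte with its high bit set zeroes the output byte; otherwise its low four bits index into the same 16-byte lane of the table. `vpternlogd` with immediate 255 fills the table with ones. The function names below are hypothetical, and the model covers only the byte semantics, not timing.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 64

/* vpshufb dst, table, ctrl: per byte, zero if ctrl's high bit is set,
   otherwise table[lane_base + (ctrl & 0x0F)] within the same 16-byte lane. */
static void vpshufb(const uint8_t table[LANES], const uint8_t ctrl[LANES], uint8_t dst[LANES]) {
    for (int i = 0; i < LANES; i++) {
        int lane_base = i & ~15;
        dst[i] = (ctrl[i] & 0x80) ? 0 : table[lane_base + (ctrl[i] & 0x0F)];
    }
}

/* Proposed sequence: all-ones table, shuffle by x, AND with y. */
static void pshufb_select(const uint8_t x[LANES], const uint8_t y[LANES], uint8_t out[LANES]) {
    uint8_t ones[LANES], mask[LANES];
    for (int i = 0; i < LANES; i++)
        ones[i] = 0xFF;          /* vpternlogd zmm2, zmm2, zmm2, 255 */
    vpshufb(ones, x, mask);      /* 0xFF where x < 0x80, else 0x00  */
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] & y[i]; /* vpandq */
}

/* Counts x values in 0..255 where the sequence differs from select(x < 0x80, y, 0). */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    uint8_t x[LANES], y[LANES], out[LANES];
    for (int base = 0; base < 256; base += LANES) {
        for (int i = 0; i < LANES; i++) {
            x[i] = (uint8_t)(base + i);
            y[i] = (uint8_t)(0xA5 ^ i); /* never zero, so a dropped byte is detectable */
        }
        pshufb_select(x, y, out);
        for (int i = 0; i < LANES; i++)
            if (out[i] != (x[i] < 0x80 ? y[i] : 0))
                bad++;
    }
    return bad;
}
```

One plausible advantage, not measured here, is that the all-ones register does not depend on `x` or `y` and could be hoisted out of a loop. That would leave two vector instructions per iteration and no round trip through a mask register.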