[AVX-512] Consider using vpshufb(@splat(0xFF), x) when selecting on x < 0x80 #152900

@Validark

For code like this (Zig, Godbolt):

export fn foo(x: @Vector(64, u8), y: @Vector(64, u8)) @Vector(64, u8) {
    return @select(u8, x < @as(@Vector(64, u8), @splat(0x80)),
        y,
        @as(@Vector(64, u8), @splat(0)),
    );
}

LLVM IR version (Godbolt):

define dso_local <64 x i8> @foo(<64 x i8> %0, <64 x i8> %1) local_unnamed_addr {
Entry:
  %.inv = icmp slt <64 x i8> %0, zeroinitializer
  %2 = select <64 x i1> %.inv, <64 x i8> zeroinitializer, <64 x i8> %1
  ret <64 x i8> %2
}

We used to get:

        vpmovb2m        k0, zmm0
        vpmovm2b        zmm0, k0
        vpandnq         zmm0, zmm0, zmm1

Now we get:

        vpmovb2m        k0, zmm0
        knotq           k1, k0
        vmovdqu8        zmm0 {k1} {z}, zmm1

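However, I thought it might be a good idea in some situations to use the following technique. vpshufb zeroes every output byte whose index byte has its high bit set, so shuffling an all-ones vector by x yields 0xFF exactly where x < 0x80, and a single AND with y then finishes the select without going through a mask register: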

        vpternlogd      zmm2, zmm2, zmm2, 255
        vpshufb         zmm0, zmm2, zmm0
        vpandq          zmm0, zmm0, zmm1
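
To sanity-check that the three-instruction sequence computes the same result as the original select, here is a minimal scalar sketch in C. It is not part of the original report: the function names (select_ref, vpshufb_model, select_shuffle, shuffle_matches_reference) are hypothetical, and vpshufb_model is a hand-written model of the documented per-byte behaviour of vpshufb within each 16-byte lane, not a call into any real intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 64 /* bytes in one zmm register */

/* Reference semantics of foo: y where x < 0x80 (unsigned), else 0. */
static void select_ref(const uint8_t x[LANES], const uint8_t y[LANES],
                       uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        out[i] = (x[i] < 0x80) ? y[i] : 0;
}

/* Hypothetical scalar model of vpshufb on a 512-bit register: within each
 * 16-byte lane, an output byte is 0 when the index byte's high bit is set,
 * otherwise it is the table byte selected by the low 4 index bits. */
static void vpshufb_model(const uint8_t table[LANES], const uint8_t idx[LANES],
                          uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES; i++) {
        size_t lane_base = i & ~(size_t)15;
        out[i] = (idx[i] & 0x80) ? 0 : table[lane_base + (idx[i] & 0x0F)];
    }
}

/* Model of the proposed sequence. */
static void select_shuffle(const uint8_t x[LANES], const uint8_t y[LANES],
                           uint8_t out[LANES]) {
    uint8_t ones[LANES], mask[LANES];
    for (size_t i = 0; i < LANES; i++)
        ones[i] = 0xFF;                /* vpternlogd zmm2, zmm2, zmm2, 255 */
    vpshufb_model(ones, x, mask);      /* vpshufb    zmm0, zmm2, zmm0      */
    for (size_t i = 0; i < LANES; i++)
        out[i] = mask[i] & y[i];       /* vpandq     zmm0, zmm0, zmm1      */
}

/* Returns 1 if both versions agree for every possible x byte value. */
int shuffle_matches_reference(void) {
    uint8_t x[LANES], y[LANES], a[LANES], b[LANES];
    for (unsigned base = 0; base < 256; base += LANES) {
        for (size_t i = 0; i < LANES; i++) {
            x[i] = (uint8_t)(base + i);    /* covers 0x00..0xFF over 4 rounds */
            y[i] = (uint8_t)(i * 37 + 1);  /* arbitrary nonzero test pattern */
        }
        select_ref(x, y, a);
        select_shuffle(x, y, b);
        for (size_t i = 0; i < LANES; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

Because the table is all ones, the low 4 index bits never matter: only the high bit of each x byte decides between 0xFF and 0, which is the same test as the icmp slt against zero in the IR above. The sketch only checks equivalence of results; it says nothing about which sequence is faster on a given microarchitecture.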
