[x86] Using select instead of pblendvb leads to very poor codegen on AVX2 #162812

@Sp00ph

I tried this code:

define <32 x i8> @cond_double_blendv(<32 x i8> %a, <32 x i8> %mask) {
    %aa = add <32 x i8> %a, %a
    %ret = call <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8> %a, <32 x i8> %aa, <32 x i8> %mask)
    ret <32 x i8> %ret
}

define <32 x i8> @cond_double_select(<32 x i8> %a, <32 x i8> %mask) {
    %aa = add <32 x i8> %a, %a
    %bitmask = icmp slt <32 x i8> %mask, splat (i8 0)
    %ret = select <32 x i1> %bitmask, <32 x i8> %aa, <32 x i8> %a
    ret <32 x i8> %ret
}
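
As a sanity check on the IR above, the per-lane semantics of the two functions can be modeled in a few lines of Python. This is only a rough sketch: the helper names (to_signed8, blendv_lane, select_lane, lanes_agree) are made up for illustration, and each function models a single i8 lane rather than the full <32 x i8> vector.

```python
def to_signed8(x):
    """Reinterpret an unsigned byte (0..255) as a signed i8."""
    return x - 256 if x & 0x80 else x

def blendv_lane(a, mask):
    # Models pblendvb(a, aa, mask) on one lane:
    # the second operand is taken when the mask byte's MSB is set.
    aa = (a + a) & 0xFF
    return aa if mask & 0x80 else a

def select_lane(a, mask):
    # Models icmp slt mask, 0 followed by select on one lane.
    aa = (a + a) & 0xFF
    bit = to_signed8(mask) < 0
    return aa if bit else a

def lanes_agree():
    """Exhaustively compare the two models over all byte pairs."""
    return all(blendv_lane(a, m) == select_lane(a, m)
               for a in range(256) for m in range(256))
```

Since a signed byte is negative exactly when its top bit is set, the two models should agree on every (a, mask) pair, which lanes_agree() checks exhaustively.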

Both functions have the same behavior: they double each lane of %a whose corresponding lane in %mask has its MSB set. However, they generate wildly different assembly (using clang 21.1.0 with -O3 -march=x86-64-v3):

cond_double_blendv:
        vpaddb  ymm2, ymm0, ymm0
        vpblendvb       ymm0, ymm0, ymm2, ymm1
        ret

.LCPI1_0:
        .zero   32,252
.LCPI1_1:
        .zero   32,32
cond_double_select:
        vpsllw  ymm2, ymm0, 2
        vpand   ymm2, ymm2, ymmword ptr [rip + .LCPI1_0]
        vpsrlw  ymm1, ymm1, 2
        vpand   ymm1, ymm1, ymmword ptr [rip + .LCPI1_1]
        vpaddb  ymm1, ymm1, ymm1
        vpblendvb       ymm0, ymm0, ymm2, ymm1
        vpaddb  ymm2, ymm0, ymm0
        vpaddb  ymm1, ymm1, ymm1
        vpblendvb       ymm0, ymm0, ymm2, ymm1
        ret

The version using the @llvm.x86.avx2.pblendvb intrinsic emits the expected assembly. After staring at the version using select for a while, I can say that all instructions except vpaddb ymm2, ymm0, ymm0 and the last vpblendvb form an elaborate no-op. However, I have no idea what the code generator's intent was with these instructions. I don't see any reason why these two functions should not emit exactly the same assembly.
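
One way to double-check the no-op claim is to step through both instruction sequences in a small model. The Python sketch below is an approximation under stated assumptions: it models a single 16-bit word (two adjacent byte lanes) instead of a full ymm register, treats the two constant-pool entries as byte splats of 252 and 32, and uses helper names that mirror the mnemonics but are otherwise invented for this example.

```python
def per_byte(f, *words):
    """Apply f independently to the low bytes and the high bytes of the words."""
    lo = f(*[w & 0xFF for w in words]) & 0xFF
    hi = f(*[(w >> 8) & 0xFF for w in words]) & 0xFF
    return lo | (hi << 8)

def vpaddb(x, y):
    return per_byte(lambda p, q: p + q, x, y)

def vpblendvb(x, y, m):
    # Take the byte from y where the mask byte has its MSB set, else from x.
    return per_byte(lambda p, q, k: q if k & 0x80 else p, x, y, m)

def vpand_splat(x, byte):
    # AND with a constant that repeats the same byte in every lane.
    return x & (byte * 0x0101)

def vpsllw(x, n):
    return (x << n) & 0xFFFF

def vpsrlw(x, n):
    return x >> n

def blendv_asm(a, mask):
    ymm0, ymm1 = a, mask
    ymm2 = vpaddb(ymm0, ymm0)
    return vpblendvb(ymm0, ymm2, ymm1)

def select_asm(a, mask):
    ymm0, ymm1 = a, mask
    ymm2 = vpand_splat(vpsllw(ymm0, 2), 252)   # a << 2 per byte
    ymm1 = vpand_splat(vpsrlw(ymm1, 2), 32)    # mask MSB moved to bit 5
    ymm1 = vpaddb(ymm1, ymm1)                  # now in bit 6
    ymm0 = vpblendvb(ymm0, ymm2, ymm1)         # MSB is always 0 here: ymm0 unchanged
    ymm2 = vpaddb(ymm0, ymm0)
    ymm1 = vpaddb(ymm1, ymm1)                  # back in bit 7
    return vpblendvb(ymm0, ymm2, ymm1)

def same_on_sample(step_a=251, step_m=509):
    """Compare both sequences on a strided sample of 16-bit inputs."""
    return all(select_asm(a, m) == blendv_asm(a, m)
               for a in range(0, 0x10000, step_a)
               for m in range(0, 0x10000, step_m))
```

In this model the first vpblendvb in select_asm never selects from ymm2, because every mask byte is 0 or 64 at that point, and the shift, AND, and add steps on ymm1 only move the mask's top bit down to bit 5 and back up to bit 7.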

Note: I originally encountered this while using AVX2 intrinsics in Rust, where the output from rustc was much worse than the output from clang for an equivalent function, with the difference being that rustc lowers _mm256_blendv_epi8 to icmp slt + select, whereas clang lowers it to call @llvm.x86.avx2.pblendvb.
