[x86-64] Broadcasting an element of a vector should not use vpermb or vpshufb #113396

@Validark

I have Zig code like this:

export fn foo(x: @Vector(8, u8)) @Vector(64, u8) {
    return @splat(x[1]);
}

Here is the corresponding LLVM IR:

define dso_local <64 x i8> @foo(<8 x i8> %0) local_unnamed_addr {
Entry:
  %1 = shufflevector <8 x i8> %0, <8 x i8> poison, <64 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  ret <64 x i8> %1
}

Here is how it lowers on Zen 5:

.LCPI0_1:
        .byte   1
foo:
        vpbroadcastb    zmm1, byte ptr [rip + .LCPI0_1]
        vpermb  zmm0, zmm1, zmm0
        ret

Here is how I think it should lower:

foo:
        vpsrlq  xmm0, xmm0, 8
        vpbroadcastb    zmm0, xmm0
        ret
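
To spell out why the shift-then-broadcast sequence is equivalent, here is a minimal scalar sketch in C that models the two instructions on the low 64-bit lane. The helper names (`model_vpsrlq_low_lane`, `model_vpbroadcastb`, `splat_element`) are hypothetical and exist only for illustration; this is a model of the semantics, not compiler or intrinsic code. It assumes the usual little-endian lane layout, where element 0 of the vector occupies the lowest byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of `vpsrlq xmm0, xmm0, imm` on the low 64-bit lane only:
 * a logical right shift of the whole quadword by `bits` bits.
 * Shift counts of 64 or more produce zero, as the instruction does. */
static uint64_t model_vpsrlq_low_lane(uint64_t lane, unsigned bits) {
    return bits < 64 ? lane >> bits : 0;
}

/* Model of `vpbroadcastb`: copy the lowest byte of the source
 * into every byte of an n-byte destination. */
static void model_vpbroadcastb(uint8_t *out, size_t n, uint64_t lane) {
    memset(out, (int)(lane & 0xFF), n);
}

/* Hypothetical helper: splat x[index] across 64 bytes using the
 * shift + broadcast sequence proposed in the issue. */
static void splat_element(uint8_t out[64], const uint8_t x[8], unsigned index) {
    assert(index < 8);

    /* Pack the 8 input bytes into one quadword, element 0 in the low byte. */
    uint64_t lane = 0;
    for (unsigned i = 0; i < 8; i++) {
        lane |= (uint64_t)x[i] << (8 * i);
    }

    /* Shifting right by 8 * index bits moves x[index] into byte 0. */
    lane = model_vpsrlq_low_lane(lane, 8 * index);

    /* Broadcasting byte 0 then yields x[index] in every output byte. */
    model_vpbroadcastb(out, 64, lane);
}
```

With `index` set to 1, the shift amount is 8 bits, which matches the `vpsrlq xmm0, xmm0, 8` in the proposed lowering. Neither step needs a shuffle-control vector or a constant loaded from memory.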

The same applies when broadcasting into an xmm register, which currently lowers to:

.LCPI0_0:
        .zero   16,1
foo:
        vpshufb xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        ret

I would much rather avoid the trip to memory for the shuffle mask:

foo:
        vpsrlq  xmm0, xmm0, 8
        vpbroadcastb    xmm0, xmm0
        ret
