AVX-512 Mask registers being used when it's not needed #128237

@Whatcookie

I've been running into some odd assembly generated by RPCS3's SPU LLVM backend.

In short: the AVX-512 code is slower than the AVX2 code because compare-into-mask instructions are being used where compare-into-vector instructions would be faster.

https://godbolt.org/z/dcjTKKaWj

In the FCGT3 function, both the AVX2 and AVX-512 targets use the compare-into-vector-register instructions, as expected. In the FCGT2 function, where the only difference is fcmp ugt in place of fcmp ogt, LLVM opts to use the mask registers, which is inconvenient since we're emulating instructions that compare into vector registers.

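        # compare writes its result directly into a vector register
        vpminud xmm0, xmm0, xmmword ptr [rdi + rcx]
        vcmpnleps       xmm0, xmm0, xmmword ptr [rdi + rax]

        # compare writes into mask register k0, then vpmovm2d expands it back into a vector register
        vpminud xmm0, xmm0, dword ptr [rip + .LCPI1_0]{1to4}
        vcmpnleps       k0, xmm0, xmmword ptr [rdi + rax]
        vpmovm2d        xmm0, k0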
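
For readers less familiar with the two predicates, here is a small scalar sketch in C of how fcmp ogt and fcmp ugt differ. It is only an illustration of the comparison semantics, not code from RPCS3 or LLVM, and the helper names (fcmp_ogt, fcmp_ugt, lane_mask, check_nan_behaviour) are made up for this example. The two predicates disagree only when an operand is NaN, and "unordered greater than" is the same as "not less than or equal", which is what the NLE predicate in vcmpnleps computes.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* fcmp ogt: true only if neither operand is NaN and a > b. */
static bool fcmp_ogt(float a, float b) { return a > b; }

/* fcmp ugt: true if either operand is NaN, or a > b.
 * Written as "not (a <= b)", mirroring the NLE predicate. */
static bool fcmp_ugt(float a, float b) { return !(a <= b); }

/* Compare-into-vector style result for one lane:
 * all ones when the condition holds, zero otherwise. */
static uint32_t lane_mask(bool cond) { return cond ? 0xFFFFFFFFu : 0u; }

/* Hypothetical self-check: the predicates agree on ordinary
 * values and differ only when a NaN is involved. */
static void check_nan_behaviour(void) {
    assert(fcmp_ogt(2.0f, 1.0f) == fcmp_ugt(2.0f, 1.0f));
    assert(fcmp_ogt(1.0f, 2.0f) == fcmp_ugt(1.0f, 2.0f));
    assert(!fcmp_ogt(NAN, 1.0f));
    assert(fcmp_ugt(NAN, 1.0f));
    assert(lane_mask(fcmp_ugt(1.0f, NAN)) == 0xFFFFFFFFu);
}
```

This assumes IEEE 754 comparison behaviour, so it should not be built with options such as -ffast-math that let the compiler assume NaNs never occur.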
