AVX-512 mask registers being used when they're not needed

I've been running into some odd assembly generated by RPCS3's SPU LLVM backend.

In short: the AVX-512 code is slower than the AVX2 code because it uses compare-into-mask instructions where the compare-into-vector instructions would be faster.

https://godbolt.org/z/dcjTKKaWj

In the FCGT3 function, both the AVX2 and AVX-512 targets use the compare-into-vector-register instructions, as expected. In the FCGT2 function, where the only difference is fcmp ugt in place of fcmp ogt, LLVM opts to use the mask registers instead. This is inconvenient because the instructions we're emulating compare into vector registers, so the mask result then has to be expanded back into a vector register with vpmovm2d.

Compare directly into a vector register:

```
        vpminud xmm0, xmm0, xmmword ptr [rdi + rcx]
        vcmpnleps       xmm0, xmm0, xmmword ptr [rdi + rax]
```

Compare into a mask register, then expand the mask back into a vector register:

```
        vpminud xmm0, xmm0, dword ptr [rip + .LCPI1_0]{1to4}
        vcmpnleps       k0, xmm0, xmmword ptr [rdi + rax]
        vpmovm2d        xmm0, k0
```

AVX-512 Mask registers being used when it's not needed #128237

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AVX-512 Mask registers being used when it's not needed #128237

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions