Skip to content

[x86] Unneccessary mov in loop over bits #126615

@TellowKrinkle

Description

@TellowKrinkle

When compiling the following code, clang generates the following assembly: godbolt link

bool test(int num, bool* arr) {
    for (; num; num &= num - 1) {
        if (arr[__builtin_ctz(num)])
            return true;
    }
    return false;
}
test(int, bool*):
        test    edi, edi
        je      .LBB0_1
.LBB0_2:
        rep       bsf eax, edi
        movzx   eax, byte ptr [rsi + rax]
        test    al, al
        jne     .LBB0_4
        lea     ecx, [rdi - 1]
        and     ecx, edi
        mov     edi, ecx
        jne     .LBB0_2
.LBB0_4:
        ret
.LBB0_1:
        xor     eax, eax
        ret

There's two issues with this:

  1. There's no dependency breaking xor on the bsf output. This seems like it could be Clang is not aware of a false dependency of LZCNT, TZCNT, POPCNT on destination register on some Intel CPUs #33216, but when compiling with -march=haswell, clang does add the dependency breaking xor before the tzcnt, making it seem like that issue was fixed for this situation, but needs to be extended to bsf or applied to the generic processor profile, since it affects all pre-haswell processors (which decode rep bsf as bsf, not tzcnt).
  2. At the end, clang does and ecx, edi; mov edi, ecx instead of just and edi, ecx. This adds an extra cycle to the critical path (ignoring ①) for all processors without mov elimination (including ice lake processors, since they got it disabled due to an erratum).

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions