[x86] Unneccessary mov in loop over bits

When compiling the following code, clang generates the following assembly: [godbolt link](https://gcc.godbolt.org/z/onf3hEfq4)
```cpp
bool test(int num, bool* arr) {
    for (; num; num &= num - 1) {
        if (arr[__builtin_ctz(num)])
            return true;
    }
    return false;
}
```
```asm
test(int, bool*):
        test    edi, edi
        je      .LBB0_1
.LBB0_2:
        rep       bsf eax, edi
        movzx   eax, byte ptr [rsi + rax]
        test    al, al
        jne     .LBB0_4
        lea     ecx, [rdi - 1]
        and     ecx, edi
        mov     edi, ecx
        jne     .LBB0_2
.LBB0_4:
        ret
.LBB0_1:
        xor     eax, eax
        ret
```

There's two issues with this:
1. There's no dependency breaking xor on the bsf output.  This seems like it could be #33216, but when compiling with `-march=haswell`, clang *does* add the dependency breaking xor before the tzcnt, making it seem like that issue was fixed for this situation, but needs to be extended to bsf or applied to the generic processor profile, since it affects all pre-haswell processors (which decode `rep bsf` as `bsf`, not `tzcnt`).
2. At the end, clang does `and ecx, edi; mov edi, ecx` instead of just `and edi, ecx`.  This adds an extra cycle to the critical path (ignoring ①) for all processors without mov elimination (including ice lake processors, since they got it disabled due to an erratum).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[x86] Unneccessary mov in loop over bits #126615

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[x86] Unneccessary mov in loop over bits #126615

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions