-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Open
Description
When compiling the following code, clang generates the following assembly: godbolt link
bool test(int num, bool* arr) {
for (; num; num &= num - 1) {
if (arr[__builtin_ctz(num)])
return true;
}
return false;
}test(int, bool*):
test edi, edi
je .LBB0_1
.LBB0_2:
rep bsf eax, edi
movzx eax, byte ptr [rsi + rax]
test al, al
jne .LBB0_4
lea ecx, [rdi - 1]
and ecx, edi
mov edi, ecx
jne .LBB0_2
.LBB0_4:
ret
.LBB0_1:
xor eax, eax
retThere's two issues with this:
- There's no dependency breaking xor on the bsf output. This seems like it could be Clang is not aware of a false dependency of LZCNT, TZCNT, POPCNT on destination register on some Intel CPUs #33216, but when compiling with
-march=haswell, clang does add the dependency breaking xor before the tzcnt, making it seem like that issue was fixed for this situation, but needs to be extended to bsf or applied to the generic processor profile, since it affects all pre-haswell processors (which decoderep bsfasbsf, nottzcnt). - At the end, clang does
and ecx, edi; mov edi, ecxinstead of justand edi, ecx. This adds an extra cycle to the critical path (ignoring ①) for all processors without mov elimination (including ice lake processors, since they got it disabled due to an erratum).