This code:
export fn foo(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: @TypeOf(s) = 0;
    while (true) {
        const iter = s;
        ret |= iter;
        s &= ~((iter +% (iter << 1)) | ((iter << 2) & ~a));
        if (s == 0) break;
    }
    return ret;
}

Results in this emit for Zen 4:
foo:
    or rsi, rdi
    not rdi
    xor eax, eax
.LBB0_1:
    lea rdx, [4*rsi]
    lea rcx, [rsi + 2*rsi]
    or rax, rsi
    and rdx, rdi ; we could have just used `andn`
    or rdx, rcx
    andn rsi, rdx, rsi
    jne .LBB0_1
    ret

As you can see, we hoist `not rdi` out of the loop, even though we could have used `andn` inside the loop instead. The same thing happens on the SiFive X280 (aggressive unrolling disabled via a size-optimized build option):
foo:
    mv a2, a0
    li a0, 0
    or a1, a1, a2
    not a2, a2
.LBB0_1:
    slli a3, a1, 2
    sh1add a4, a1, a1
    and a3, a3, a2 ; could have used `andn`
    or a0, a0, a1
    or a3, a3, a4
    andn a1, a1, a3
    bnez a1, .LBB0_1
    ret

However, on the Apple M3, it actually does make sense to hoist `mvn` out of the loop in this case, because we can do `and x11, x8, x9, lsl #2` but we can't do `bic x11, x8, x9, lsl #2` (I assume).
Apple M3 emit:
foo:
    mov x8, x0
    mov x0, #0
    orr x9, x1, x8
    mvn x8, x8
.LBB0_1:
    orr x0, x9, x0
    add x10, x9, x9, lsl #1
    and x11, x8, x9, lsl #2
    orr x10, x11, x10
    bics x9, x9, x10
    b.ne .LBB0_1
    ret
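
To make the two and-not patterns in the loop body easier to see, here is a minimal sketch that routes both through a helper. The names `andNot` and `fooExplicit` are hypothetical and exist only for illustration; the computation is meant to match `foo` above, and this is not a proposed fix.

```zig
const std = @import("std");

/// Hypothetical helper: computes `x & ~mask`, the operation that a single
/// and-not instruction (`andn` on x86 BMI1 and RISC-V Zbb, `bic` on AArch64)
/// performs on two register operands.
fn andNot(x: u64, mask: u64) u64 {
    return x & ~mask;
}

/// Same computation as `foo` from the report, with both and-not patterns
/// written out explicitly.
fn fooExplicit(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: u64 = 0;
    while (true) {
        const iter = s;
        ret |= iter;
        // The inverted operand `a` is loop-invariant, so `~a` is what gets
        // hoisted out of the loop in the listings above.
        const shifted = andNot(iter << 2, a);
        // The inverted operand here changes every iteration, so this one
        // stays an and-not inside the loop.
        s = andNot(s, (iter +% (iter << 1)) | shifted);
        if (s == 0) break;
    }
    return ret;
}
```

The first `andNot` call is the one in question: its left operand is the shifted value and its inverted operand is `a`. On targets with a plain two-register and-not, inverting `a` inside the loop costs no extra instruction, whereas on AArch64 the shift is folded into the `and`, which is the case described above where hoisting `mvn` makes sense.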