Don't hoist NOTs out of a loop when they would be folded into ANDN/ORN anyway (except sometimes on Arm) #108840

@Validark

Godbolt link

This code:

export fn foo(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: @TypeOf(s) = 0;

    while (true) {
        const iter = s;
        ret |= iter;
        s &= ~((iter +% (iter << 1)) | ((iter << 2) & ~a));
        if (s == 0) break;
    }

    return ret;
}

Results in this emit for Zen 4:

foo:
        or      rsi, rdi
        not     rdi
        xor     eax, eax
.LBB0_1:
        lea     rdx, [4*rsi]
        lea     rcx, [rsi + 2*rsi]
        or      rax, rsi
        and     rdx, rdi; we could have just used `andn`
        or      rdx, rcx
        andn    rsi, rdx, rsi
        jne     .LBB0_1
        ret

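As you can see, `not rdi` is hoisted out of the loop, even though the `and rdx, rdi` in the loop body could have been an `andn` that takes the original `rdi` directly, making the separate `not` unnecessary. The same thing happens for the SiFive X280 (aggressive unrolling disabled via the size-optimized build option):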

foo:
        mv      a2, a0
        li      a0, 0
        or      a1, a1, a2
        not     a2, a2
.LBB0_1:
        slli    a3, a1, 2
        sh1add  a4, a1, a1
        and     a3, a3, a2; could have used `andn`
        or      a0, a0, a1
        or      a3, a3, a4
        andn    a1, a1, a3
        bnez    a1, .LBB0_1
        ret
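
To make the proposed change concrete, here is a minimal C sketch of the two forms of the loop. It is a transliteration of the Zig reproducer, not compiler output. The names `and_not`, `foo_hoisted`, `foo_folded`, and `check_same` are hypothetical and exist only for illustration; `and_not(x, y)` stands in for a single and-with-complement instruction such as x86 BMI1 `andn` or RISC-V Zbb `andn`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling one and-with-complement instruction:
 * computes x & ~y in a single step. */
static inline uint64_t and_not(uint64_t x, uint64_t y) {
    return x & ~y;
}

/* Shape of the current emit: ~a is computed once before the loop and
 * then kept live in a register for the plain AND inside the loop. */
uint64_t foo_hoisted(uint64_t a, uint64_t b) {
    uint64_t s = a | b;
    uint64_t ret = 0;
    uint64_t not_a = ~a; /* the hoisted NOT */
    do {
        uint64_t iter = s;
        ret |= iter;
        /* unsigned addition wraps, matching Zig's +% */
        uint64_t t = (iter << 2) & not_a;
        s = and_not(s, (iter + (iter << 1)) | t);
    } while (s != 0);
    return ret;
}

/* Shape being asked for: no NOT before the loop; the complement of a
 * is folded into the and-with-complement inside the loop. */
uint64_t foo_folded(uint64_t a, uint64_t b) {
    uint64_t s = a | b;
    uint64_t ret = 0;
    do {
        uint64_t iter = s;
        ret |= iter;
        uint64_t t = and_not(iter << 2, a);
        s = and_not(s, (iter + (iter << 1)) | t);
    } while (s != 0);
    return ret;
}

/* Both forms must agree for any input. */
void check_same(uint64_t a, uint64_t b) {
    assert(foo_hoisted(a, b) == foo_folded(a, b));
}
```

Both functions execute the same number of operations per iteration; the folded form simply drops the setup instruction outside the loop, which is the saving this issue is about.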

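However, on the Apple M3 it actually does make sense to hoist `mvn` out of the loop in this case: with the inverted value in a register we can do `and x11, x8, x9, lsl #2` in one instruction, but `bic` applies its optional shift to the operand it inverts, so `bic x11, x8, x9, lsl #2` would compute `x8 & ~(x9 << 2)` rather than `(x9 << 2) & ~x8`. Without the hoist, the loop would need a separate `lsl` followed by a `bic`.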

Apple M3 emit:

foo:
        mov     x8, x0
        mov     x0, #0
        orr     x9, x1, x8
        mvn     x8, x8
.LBB0_1:
        orr     x0, x9, x0
        add     x10, x9, x9, lsl #1
        and     x11, x8, x9, lsl #2
        orr     x10, x11, x10
        bics    x9, x9, x10
        b.ne    .LBB0_1
        ret
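
The Arm exception can be sketched the same way. The C below models the AArch64 shifted-register forms of `and` and `bic` as plain functions, so the operand roles are explicit. The function names `a64_and_lsl` and `a64_bic_lsl` are hypothetical, and the sketch assumes a shift amount below 64.

```c
#include <stdint.h>

/* Models `and xd, xn, xm, lsl #sh`: xd = xn & (xm << sh). */
uint64_t a64_and_lsl(uint64_t xn, uint64_t xm, unsigned sh) {
    return xn & (xm << sh);
}

/* Models `bic xd, xn, xm, lsl #sh`: xd = xn & ~(xm << sh).
 * The shift applies to the operand that gets inverted. */
uint64_t a64_bic_lsl(uint64_t xn, uint64_t xm, unsigned sh) {
    return xn & ~(xm << sh);
}

/* The value the loop needs each iteration: (iter << 2) & ~a. */
uint64_t wanted(uint64_t a, uint64_t iter) {
    return (iter << 2) & ~a;
}
```

With `~a` hoisted, `a64_and_lsl(~a, iter, 2)` produces the wanted value in one modelled instruction. Without the hoist, `a64_bic_lsl(a, iter, 2)` shifts and inverts the wrong operand, so the shift has to be done separately first, as in `a64_bic_lsl(iter << 2, a, 0)`, which costs an extra instruction inside the loop.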
