
Missed optimization on in-order RISC-V #135860

@chadaustin

I've observed situations where clang and rustc generate suboptimal code for in-order RISC-V cores that support multiple parallel (outstanding) loads.

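In general, the compiler should use as many of the temporary registers it is free to clobber as it can to hoist loads above the first instructions that consume their results. That way the load latencies overlap instead of each one stalling the in-order pipeline.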
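At the source level, the schedule being asked for corresponds to loading every value into a temporary before doing any arithmetic or stores. The sketch below is illustrative only: `inc2_hoisted` is a hypothetical name, and because the scheduler may freely reorder these independent operations, writing the C this way does not by itself guarantee the gcc-style instruction order. `__restrict` is the GCC/Clang spelling already used in the example below.

```c
/* Same semantics as inc2, but with all four loads written first.
 * The restrict qualifiers promise the pointers do not alias, which
 * is what makes hoisting every load above every store legal. */
void inc2_hoisted(int* __restrict a, int* __restrict b,
                  int* __restrict c, int* __restrict d)
{
    /* Issue every load first so their latencies can overlap. */
    int va = *a;
    int vb = *b;
    int vc = *c;
    int vd = *d;

    /* Then the arithmetic, by which point the loads have had time to land. */
    va++;
    vb++;
    vc++;
    vd++;

    /* Finally the stores. */
    *a = va;
    *b = vb;
    *c = vc;
    *d = vd;
}
```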

C:

void inc2(int* __restrict a, int* __restrict b, int* __restrict c, int* __restrict d)
{
    (*a)++;
    (*b)++;
    (*c)++;
    (*d)++;
}

rv64gc gcc:

inc2(int*, int*, int*, int*):
        lw      a7,0(a0)
        lw      a6,0(a1)
        lw      a4,0(a2)
        lw      a5,0(a3)
        addiw   a7,a7,1
        addiw   a6,a6,1
        addiw   a4,a4,1
        addiw   a5,a5,1
        sw      a7,0(a0)
        sw      a6,0(a1)
        sw      a4,0(a2)
        sw      a5,0(a3)
        ret

rv64gc clang:

inc2(int*, int*, int*, int*):
        lw      a4, 0(a0)
        lw      a5, 0(a1)
        addi    a4, a4, 1
        sw      a4, 0(a0)
        lw      a0, 0(a2)
        lw      a4, 0(a3)
        addi    a5, a5, 1
        sw      a5, 0(a1)
        addi    a0, a0, 1
        addi    a4, a4, 1
        sw      a0, 0(a2)
        sw      a4, 0(a3)
        ret
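To illustrate why the interleaved clang order can cost cycles, here is a deliberately simplified model, not a description of any real core. It assumes a single-issue, strictly in-order pipeline that stalls an instruction until its source registers are ready, a configurable load-to-use latency, single-cycle ALU results, and no limit on loads in flight. The `struct insn` encoding and the `schedule_cycles` function are hypothetical. The two sequences are transcribed from the gcc and clang listings above, with `ret` omitted.

```c
#include <stddef.h>

/* Hypothetical toy pipeline model (an assumption, not a real RISC-V core):
 *  - at most one instruction issues per cycle, strictly in program order;
 *  - an instruction waits until every source register it reads is ready;
 *  - a load's result is ready load_latency cycles after it issues, any
 *    other result one cycle after issue;
 *  - any number of loads may be in flight at once. */

/* Argument registers a0..a7; NONE marks an unused operand slot. */
enum reg { NONE = -1, A0, A1, A2, A3, A4, A5, A6, A7, NREGS };

struct insn {
    int is_load;   /* 1 for lw, 0 for addi/addiw/sw */
    enum reg dst;  /* register written, NONE for sw */
    enum reg src1; /* address base for lw/sw, operand for addi/addiw */
    enum reg src2; /* value stored by sw, otherwise NONE */
};

#define N_INSNS(seq) (sizeof (seq) / sizeof (seq)[0])

/* Returns the total cycle count: issue cycle of the last instruction + 1. */
int schedule_cycles(const struct insn *code, size_t n, int load_latency)
{
    int ready[NREGS] = {0}; /* cycle at which each register becomes readable */
    int next = 0;           /* earliest cycle the next instruction may issue */
    for (size_t i = 0; i < n; i++) {
        int issue = next;
        if (code[i].src1 != NONE && ready[code[i].src1] > issue)
            issue = ready[code[i].src1];
        if (code[i].src2 != NONE && ready[code[i].src2] > issue)
            issue = ready[code[i].src2];
        if (code[i].dst != NONE)
            ready[code[i].dst] = issue + (code[i].is_load ? load_latency : 1);
        next = issue + 1;
    }
    return next;
}

/* gcc's order: all loads, then all increments, then all stores. */
const struct insn gcc_seq[] = {
    {1, A7, A0, NONE},   /* lw    a7,0(a0)   */
    {1, A6, A1, NONE},   /* lw    a6,0(a1)   */
    {1, A4, A2, NONE},   /* lw    a4,0(a2)   */
    {1, A5, A3, NONE},   /* lw    a5,0(a3)   */
    {0, A7, A7, NONE},   /* addiw a7,a7,1    */
    {0, A6, A6, NONE},   /* addiw a6,a6,1    */
    {0, A4, A4, NONE},   /* addiw a4,a4,1    */
    {0, A5, A5, NONE},   /* addiw a5,a5,1    */
    {0, NONE, A0, A7},   /* sw    a7,0(a0)   */
    {0, NONE, A1, A6},   /* sw    a6,0(a1)   */
    {0, NONE, A2, A4},   /* sw    a4,0(a2)   */
    {0, NONE, A3, A5},   /* sw    a5,0(a3)   */
};

/* clang's order: each addi follows its load closely. */
const struct insn clang_seq[] = {
    {1, A4, A0, NONE},   /* lw   a4, 0(a0)   */
    {1, A5, A1, NONE},   /* lw   a5, 0(a1)   */
    {0, A4, A4, NONE},   /* addi a4, a4, 1   */
    {0, NONE, A0, A4},   /* sw   a4, 0(a0)   */
    {1, A0, A2, NONE},   /* lw   a0, 0(a2)   */
    {1, A4, A3, NONE},   /* lw   a4, 0(a3)   */
    {0, A5, A5, NONE},   /* addi a5, a5, 1   */
    {0, NONE, A1, A5},   /* sw   a5, 0(a1)   */
    {0, A0, A0, NONE},   /* addi a0, a0, 1   */
    {0, A4, A4, NONE},   /* addi a4, a4, 1   */
    {0, NONE, A2, A0},   /* sw   a0, 0(a2)   */
    {0, NONE, A3, A4},   /* sw   a4, 0(a3)   */
};
```

Under these assumptions, `schedule_cycles(gcc_seq, N_INSNS(gcc_seq), 3)` returns 12 and the same call on `clang_seq` returns 13. With a latency of 4, the results are 12 and 14. The absolute numbers depend entirely on the assumed latency. What matters is the pattern: the gcc order hides each load behind the other three, while the clang order issues dependent `addi`s soon after their loads and exposes the latency.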

You can see the same issue in rustc's output: https://gcc.godbolt.org/z/b8na96EhW

Apologies if this is a duplicate; I could not find another issue about it.
