[RISCV] Suboptimal code generation for LMBench rd #156646

@wangpc-pp

Description

See: https://godbolt.org/z/Wh3sW57qa

There are two problems:

  1. Is our cost estimate for gather/scatter too low? AArch64 does not vectorize this loop, and neither does GCC for RISC-V.
<source>:35:6: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-missed=loop-vectorize]
   35 |             while (p <= lastone) {
      |             ^
<source>:35:6: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
  2. Can we combine the loads before LoopVectorizer runs? If we disable LV, SLP kicks in and generates seemingly better code:
.LBB0_5:
        vsetvli zero, a4, e32, m8, ta, ma
        vlse32.v        v8, (a2), a5
        vmv.s.x v16, a1
        vredsum.vs      v8, v8, v16
        addi    a2, a2, 512
        vmv.x.s a1, v8
        bgeu    a3, a2, .LBB0_5
        bnez    a0, .LBB0_4
        mv      a0, a1
        tail    use_int
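
For readers who do not want to open the Godbolt link, here is a minimal sketch of the loop shape under discussion. It is reconstructed from the assembly quoted above, not copied from the linked source: the pointer advances 512 bytes per iteration (`addi a2, a2, 512`, i.e. 128 four-byte elements) and the elements are 32 bits wide (`e32`). The name `rd_sum`, the constants `BLOCK` and `STEP`, and the stride of every fourth element are assumptions for illustration; the stride held in `a5` is not visible in the listing. The inner `for` loop stands in for what is presumably a run of manually written loads in the benchmark.

```c
#include <stddef.h>

/* Hypothetical constants: 128 ints (512 bytes) per outer iteration,
 * reading every 4th element. The stride is an assumption. */
enum { BLOCK = 128, STEP = 4 };

/* Sum a strided subset of each 128-element block.
 * `buf` must hold at least n_blocks * BLOCK ints. */
int rd_sum(const int *buf, size_t n_blocks)
{
    int sum = 0;
    if (n_blocks == 0)
        return sum;

    const int *p = buf;
    const int *lastone = buf + (n_blocks - 1) * BLOCK; /* start of last block */

    while (p <= lastone) {
        /* Stand-in for 32 straight-line loads p[0], p[4], ..., p[124]. */
        for (int i = 0; i < BLOCK; i += STEP)
            sum += p[i];
        p += BLOCK;
    }
    return sum;
}
```

Seen this way, each outer iteration is a 32-element strided load followed by a sum reduction, which is the shape of the `vlse32.v` and `vredsum.vs` pair in the listing. Vectorizing across outer iterations instead would require gathering elements that are 512 bytes apart.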
