[RISCV] Suboptimal code generation for LMBench `rd` #156646

See: https://godbolt.org/z/Wh3sW57qa

There are two problems:

1. Is our cost estimate for gather/scatter too low? AArch64 does not vectorize this loop, and neither does GCC for RISC-V.

```shell
<source>:35:6: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-missed=loop-vectorize]
   35 |             while (p <= lastone) {
      |             ^
<source>:35:6: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
```

2. Can we combine the loads before LoopVectorizer runs? If we disable LV, SLP kicks in and generates what looks like better code:

```asm
.LBB0_5:
        vsetvli zero, a4, e32, m8, ta, ma
        vlse32.v        v8, (a2), a5
        vmv.s.x v16, a1
        vredsum.vs      v8, v8, v16
        addi    a2, a2, 512
        vmv.x.s a1, v8
        bgeu    a3, a2, .LBB0_5
        bnez    a0, .LBB0_4
        mv      a0, a1
        tail    use_int
```
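For readers without the Godbolt link open, here is a rough C sketch of the loop shape in question and of what the strided-load sequence appears to compute. It is an approximation, not the exact source from the link: the function names, the buffer setup, the 4-element stride, and the 32-lane width are assumptions inferred from the 512-byte pointer increment and the `e32, m8` configuration in the assembly.

```c
#include <stddef.h>

/* Assumed shape: 32 loads, 4 ints (16 bytes) apart, 128 ints (512 bytes) per step. */
enum { STRIDE = 4, LANES = 32, CHUNK = STRIDE * LANES };

/* Approximation of the manually unrolled source loop. */
static int rd_unrolled(const int *buf, const int *lastone) {
    int sum = 0;
    for (const int *p = buf; p <= lastone; p += CHUNK) {
        sum += p[0]   + p[4]   + p[8]   + p[12]  + p[16]  + p[20]  + p[24]  + p[28]
             + p[32]  + p[36]  + p[40]  + p[44]  + p[48]  + p[52]  + p[56]  + p[60]
             + p[64]  + p[68]  + p[72]  + p[76]  + p[80]  + p[84]  + p[88]  + p[92]
             + p[96]  + p[100] + p[104] + p[108] + p[112] + p[116] + p[120] + p[124];
    }
    return sum;
}

/* Scalar model of one strided load (vlse32.v) plus one reduction (vredsum.vs)
   per iteration, with the running sum used as the reduction seed. */
static int rd_strided(const int *buf, const int *lastone) {
    int sum = 0;
    for (const int *p = buf; p <= lastone; p += CHUNK) {
        int acc = sum;                        /* like vmv.s.x: seed the reduction */
        for (int lane = 0; lane < LANES; ++lane)
            acc += p[lane * STRIDE];          /* one lane of the strided load */
        sum = acc;                            /* like vmv.x.s: read the result back */
    }
    return sum;
}

/* Hypothetical helper: returns 1 if both forms agree on a small test buffer. */
static int rd_self_check(void) {
    enum { N = 4 * CHUNK };
    static int buf[N];
    for (int i = 0; i < N; ++i)
        buf[i] = i % 7;                       /* small values, no overflow */
    const int *lastone = buf + N - CHUNK;     /* last chunk start that stays in bounds */
    return rd_unrolled(buf, lastone) == rd_strided(buf, lastone);
}
```

Calling `rd_self_check()` from a `main` is enough to confirm the two forms agree. Under these assumptions, the 32 scalar loads form a single constant-stride access, which is why combining them before LV could avoid costing them as a gather.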
