wasm: manual `i16x8.narrow_i32x4_s` does not optimize well

https://godbolt.org/z/TTxnMMEaM

```llvm
define <8 x i16> @narrow_manual(<4 x i32> %a, <4 x i32> %b) unnamed_addr {
bb2:
  %0 = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %1 = tail call <8 x i32> @llvm.smax.v8i32(<8 x i32> %0, <8 x i32> splat (i32 -32768))
  %2 = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> %1, <8 x i32> splat (i32 32767))
  %3 = trunc nsw <8 x i32> %2 to <8 x i16>
  ret <8 x i16> %3
}

declare <8 x i32> @llvm.smin.v8i32(<8 x i32>, <8 x i32>)

declare <8 x i32> @llvm.smax.v8i32(<8 x i32>, <8 x i32>)
```

This should optimize into a single `i16x8.narrow_i32x4_s`. In the linked godbolt, `x86_64` and `aarch64` both make this optimization (s390x does not; see https://github.com/llvm/llvm-project/issues/153655).
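To spell out why the fold is valid, here is a scalar Rust sketch of one lane. It is only a model, not the godbolt source: the function names are hypothetical, and the description of `i16x8.narrow_i32x4_s` as per-lane signed saturation is my reading of the wasm SIMD semantics. Clamping to `[-32768, 32767]` and then truncating gives the same result as saturating directly.

```rust
/// One lane of `narrow_manual`: smax with -32768, smin with 32767, then trunc.
fn narrow_lane_manual(x: i32) -> i16 {
    x.max(i16::MIN as i32).min(i16::MAX as i32) as i16
}

/// One lane of `i16x8.narrow_i32x4_s` (assumed semantics: signed saturation to i16).
fn narrow_lane_saturating(x: i32) -> i16 {
    i16::try_from(x).unwrap_or(if x < 0 { i16::MIN } else { i16::MAX })
}

/// Whole-vector model: lanes of `a` fill the low half, lanes of `b` the high half,
/// matching the shufflevector that concatenates %a and %b.
fn narrow_manual(a: [i32; 4], b: [i32; 4]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..4 {
        out[i] = narrow_lane_manual(a[i]);
        out[i + 4] = narrow_lane_manual(b[i]);
    }
    out
}
```

Both lane functions agree on every `i32` input, which is what makes the single-instruction lowering legal.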

Instead, for wasm we get:

```wasm
narrow_manual:
        local.get       0
        v128.const      -32768, -32768, -32768, -32768
        local.tee       2
        i32x4.max_s
        v128.const      32767, 32767, 32767, 32767
        local.tee       0
        i32x4.min_s
        v128.const      65535, 65535, 65535, 65535
        local.tee       3
        v128.and
        local.get       1
        local.get       2
        i32x4.max_s
        local.get       0
        i32x4.min_s
        local.get       3
        v128.and
        i16x8.narrow_i32x4_u
        end_function

narrow_builtin:
        local.get       0
        local.get       1
        i16x8.narrow_i32x4_s
        end_function
```
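For reference, the `narrow_manual` output above is correct, just long. A scalar sketch of what each lane goes through, assuming `i16x8.narrow_i32x4_u` reads its input as signed and saturates to `0..=65535` (function names here are hypothetical):

```rust
/// One lane of `i16x8.narrow_i32x4_u` (assumed semantics: signed input,
/// saturated to the unsigned 16-bit range).
fn narrow_lane_u(x: i32) -> u16 {
    x.clamp(0, u16::MAX as i32) as u16
}

/// One lane of the emitted sequence: i32x4.max_s, i32x4.min_s, v128.and, narrow_u.
fn emitted_lane(x: i32) -> i16 {
    // i32x4.max_s with -32768, then i32x4.min_s with 32767.
    let clamped = x.max(-32768).min(32767);
    // v128.and with 65535: keeps the low 16 bits, so the value is in 0..=65535
    // and the unsigned narrow below can never saturate.
    let masked = clamped & 0xFFFF;
    // Reinterpret the resulting 16 bits as a signed lane.
    narrow_lane_u(masked) as i16
}
```

The mask exists only so that `narrow_i32x4_u` behaves as a plain truncation; the clamp has already done the saturation that `i16x8.narrow_i32x4_s` would do in one instruction.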
