https://godbolt.org/z/TTxnMMEaM
```llvm
define <8 x i16> @narrow_manual(<4 x i32> %a, <4 x i32> %b) unnamed_addr {
bb2:
  %0 = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %1 = tail call <8 x i32> @llvm.smax.v8i32(<8 x i32> %0, <8 x i32> splat (i32 -32768))
  %2 = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> %1, <8 x i32> splat (i32 32767))
  %3 = trunc nsw <8 x i32> %2 to <8 x i16>
  ret <8 x i16> %3
}
declare <8 x i32> @llvm.smin.v8i32(<8 x i32>, <8 x i32>) #2
declare <8 x i32> @llvm.smax.v8i32(<8 x i32>, <8 x i32>) #2
```
should optimize into a single `i16x8.narrow_i32x4_s`. In the linked godbolt, we see that x86_64 and aarch64 are able to make this optimization (s390x is not, see #153655).
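The following is a minimal Python reference model that shows why the fold is valid. It is a sketch of the lane semantics only, not LLVM or wasm code. The helper names `narrow_manual` and `narrow_i32x4_s` are hypothetical. Lanes are modeled as plain Python ints, with `a` supplying lanes 0 to 3 and `b` supplying lanes 4 to 7, matching the shufflevector mask in the IR. Clamping to the i16 range and then truncating is exactly signed saturation, which is what `i16x8.narrow_i32x4_s` does per lane.

```python
# Hypothetical lane-level model; lanes are Python ints, not real vectors.
I16_MIN, I16_MAX = -32768, 32767


def narrow_manual(a: list[int], b: list[int]) -> list[int]:
    """Mirror of the IR: concatenate, smax, smin, then trunc to i16."""
    lanes = a + b                                # shufflevector <0..7>
    lanes = [max(x, I16_MIN) for x in lanes]     # llvm.smax with splat -32768
    lanes = [min(x, I16_MAX) for x in lanes]     # llvm.smin with splat 32767
    # trunc to i16: keep the low 16 bits, reinterpret them as signed
    return [((x & 0xFFFF) ^ 0x8000) - 0x8000 for x in lanes]


def narrow_i32x4_s(a: list[int], b: list[int]) -> list[int]:
    """i16x8.narrow_i32x4_s: saturate each lane of a, then of b, to signed i16."""
    return [max(I16_MIN, min(I16_MAX, x)) for x in a + b]


if __name__ == "__main__":
    a = [0, 70000, -70000, 32767]
    b = [-32768, 40000, -1, 123]
    assert narrow_manual(a, b) == narrow_i32x4_s(a, b)
    print(narrow_manual(a, b))  # [0, 32767, -32768, 32767, -32768, 32767, -1, 123]
```

Because the clamp already brings every lane into [-32768, 32767], the `trunc nsw` never discards significant bits. This is why the whole sequence can be replaced by the single saturating instruction.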
Instead, for wasm we get:
```wasm
narrow_manual:
  local.get 0
  v128.const -32768, -32768, -32768, -32768
  local.tee 2
  i32x4.max_s
  v128.const 32767, 32767, 32767, 32767
  local.tee 0
  i32x4.min_s
  v128.const 65535, 65535, 65535, 65535
  local.tee 3
  v128.and
  local.get 1
  local.get 2
  i32x4.max_s
  local.get 0
  i32x4.min_s
  local.get 3
  v128.and
  i16x8.narrow_i32x4_u
  end_function
narrow_builtin:
  local.get 0
  local.get 1
  i16x8.narrow_i32x4_s
  end_function
```
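The sequence emitted for `narrow_manual` appears to be correct, just much longer than needed. After the clamp, each lane is in [-32768, 32767]. The `v128.and` with 65535 then maps each lane to its 16-bit two's-complement pattern in [0, 65535], which `i16x8.narrow_i32x4_u` passes through unchanged. The following Python sketch models this as 16-bit lane patterns and checks it against signed saturation. All helper names here are hypothetical and exist only for illustration.

```python
# Hypothetical lane-level model of the emitted wasm; lanes are Python ints.
I16_MIN, I16_MAX = -32768, 32767


def narrow_i32x4_s_bits(a: list[int], b: list[int]) -> list[int]:
    """16-bit lane patterns produced by i16x8.narrow_i32x4_s."""
    return [max(I16_MIN, min(I16_MAX, x)) & 0xFFFF for x in a + b]


def narrow_u_lane(x: int) -> int:
    """One lane of i16x8.narrow_i32x4_u: signed i32 saturated to [0, 65535]."""
    return max(0, min(0xFFFF, x))


def emitted_sequence_bits(a: list[int], b: list[int]) -> list[int]:
    """Model of the current narrow_manual output, as 16-bit lane patterns."""
    out = []
    for x in a + b:                   # local 0 (a) first, then local 1 (b)
        x = max(x, I16_MIN)           # i32x4.max_s with splat -32768
        x = min(x, I16_MAX)           # i32x4.min_s with splat 32767
        x &= 0xFFFF                   # v128.and with splat 65535
        out.append(narrow_u_lane(x))  # i16x8.narrow_i32x4_u (no-op here)
    return out


if __name__ == "__main__":
    a = [0, 70000, -70000, 32767]
    b = [-32768, 40000, -1, 123]
    assert emitted_sequence_bits(a, b) == narrow_i32x4_s_bits(a, b)
    print(emitted_sequence_bits(a, b))  # [0, 32767, 32768, 32767, 32768, 32767, 65535, 123]
```

In other words, the mask plus unsigned narrow is just a roundabout truncation. The clamp, mask, and narrow could all collapse into the single `i16x8.narrow_i32x4_s` seen in `narrow_builtin`.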