When the wasm_f32x4_convert_i32x4 intrinsic gets its input from an instruction that clears the top bits, the conversion is compiled into the f32x4.convert_i32x4_u variant instead of f32x4.convert_i32x4_s; for example:
#include <wasm_simd128.h>

v128_t plsno(v128_t x)
{
    // The unsigned shift here changes the convert instruction; that's a problem
    // because u32->f32 conversion is much slower than i32->f32 on pre-AVX512 HW.
    x = wasm_u32x4_shr(x, 1);
    return wasm_f32x4_convert_i32x4(x);
}
With -msimd128 -O2 this compiles into:
local.get 0
i32.const 1
i32x4.shr_u
f32x4.convert_i32x4_u
end_function
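
Since the source uses the signed intrinsic, the expected output would presumably be the same sequence with the signed conversion kept:

local.get 0
i32.const 1
i32x4.shr_u
f32x4.convert_i32x4_s
end_function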
The unsigned form is a problem because on x64 hardware, f32x4.convert_i32x4_u is lowered into a long multi-instruction sequence unless the browser implements an AVX512 code path and the hardware supports it. This needlessly slows down otherwise efficient SIMD kernels.