The f32.max instruction causes a performance degradation in WAMR Fast JIT mode #4637

@gaaraw

Subject of the issue

Hello, I noticed that WAMR Fast JIT mode performs poorly when executing the f32.max instruction.

The specific time data is as follows:

runtime             time (s)
wasmer_llvm         2.465432
wasmedge_jit        0.023468
wamr_llvm_jit       0.014776
wasmer_cranelift    1.243364
wasmtime            1.21076
wamr_fast_jit       9.855768

Times are in seconds; each value is the average of ten runs.

Test case

test_case.wat
(module
  (type (;0;) (func (param i32)))
  (type (;1;) (func))
  (import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
  (func (;1;) (type 1)
    (local i32)
    (local.set 0
      (i32.const 0))
    (loop
      (drop
        (f32.max
          (f32.const 0x1.ef5ecep+125)
          (f32.const 0x1.ef5ecep+125)))
      (local.set 0
        (i32.add
          (local.get 0)
          (i32.const 1)))
      (br_if 0
        (i32.ne
          (local.get 0)
          (i32.const 0))))
    (call 0
      (i32.const 0))
    (unreachable))
  (export "_start" (func 1))
  (memory (;0;) 1)
  (export "memory" (memory 0)))

Your environment

All runtimes are release builds and run in JIT mode.

  • WAMR: iwasm 2.4.0
  • wasmer: 6.0.1
  • wasmtime: 36.0.0 (ada802c68 2025-08-20)
  • wasmedge: 0.15.0
  • wabt: 1.0.27
  • llvm: 18.1.8
  • Host OS: Ubuntu 22.04.5 LTS x64
  • CPU: 12th Gen Intel® Core™ i7-12700 × 20

Steps to reproduce

wat2wasm test_case.wat -o test_case.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm --llvm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_fast_jit/iwasm test_case.wasm

Actual behavior

In the test case, I placed the f32.max instruction inside a loop to amplify the performance difference. The loop runs until the i32 counter wraps back to 0, i.e. 2^32 iterations.
Fast JIT mode may implement or optimize the f32.max instruction differently from the other runtimes, which could explain the slowdown.
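
For context on what each iteration has to compute: per the WebAssembly specification, f32.max returns NaN if either operand is NaN and treats +0 as greater than -0, so it is not a plain `a > b ? a : b` comparison. The C sketch below only illustrates those semantics. The function name `wasm_f32_max` is hypothetical, and this is not code taken from WAMR or any of the runtimes listed above.

```c
#include <math.h>

/* Illustrative sketch of Wasm f32.max semantics (hypothetical helper,
 * not WAMR code). */
float wasm_f32_max(float a, float b)
{
    /* NaN propagation: the result is NaN if either operand is NaN. */
    if (isnan(a) || isnan(b))
        return NAN;

    /* Signed zeros compare equal in C, but Wasm requires
     * max(-0, +0) = +0, so pick by sign bit. */
    if (a == 0.0f && b == 0.0f)
        return signbit(a) ? b : a;

    /* Ordinary case. */
    return a > b ? a : b;
}
```

With the operands from the test case, `wasm_f32_max(0x1.ef5ecep+125f, 0x1.ef5ecep+125f)` returns the same value. Since both operands are constants and the result is dropped, a compiler is allowed to fold or eliminate the operation; whether a given runtime actually does so is not something the timings alone establish.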

Extra Info

If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!
