
Micro-benchmark: Wasmer performs poorly when executing the memory.size instruction. #6147

@gaaraw

Describe the bug

Hello, in differential micro-benchmarking, I noticed that Wasmer performs poorly when executing the memory.size instruction.

The specific timing data is as follows:

runtime            time (s)
wasmer_llvm        9.74033
wasmer_cranelift   10.68514
wasmer_singlepass  12.2296
wasmedge_jit       8.7981
wasmtime           1.9795
wamr_llvm_jit      0.01739
wamr_fast_jit      1.32305

Times are in seconds; each value is the average of ten executions.
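To put these totals on a per-instruction scale, the sketch below divides each average by the loop trip count of the reproduction module (4294967296, i.e. 2^32). It is only a rough estimate: `perf stat` times the whole process, so startup and compilation are included, and the helper name `ns_per_iteration` is made up for illustration.

```python
# Average wall-clock times in seconds, copied from the table in this report.
TIMES_S = {
    "wasmer_llvm": 9.74033,
    "wasmer_cranelift": 10.68514,
    "wasmer_singlepass": 12.2296,
    "wasmedge_jit": 8.7981,
    "wasmtime": 1.9795,
    "wamr_llvm_jit": 0.01739,
    "wamr_fast_jit": 1.32305,
}

# Loop trip count used by the reproduction module: 4294967296 == 2**32.
ITERATIONS = 4294967296


def ns_per_iteration(seconds: float, iterations: int = ITERATIONS) -> float:
    """Average nanoseconds per loop iteration (startup and compile time included)."""
    return seconds * 1e9 / iterations


if __name__ == "__main__":
    # Print runtimes from fastest to slowest.
    for runtime, seconds in sorted(TIMES_S.items(), key=lambda kv: kv[1]):
        print(f"{runtime:<18} {ns_per_iteration(seconds):8.3f} ns/iter")
```

A result far below one nanosecond per iteration most likely means the loop did not actually run 2^32 times, for example because the compiler simplified it.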

Environment

All runtimes are release builds and run in JIT mode.

  • wasmer: 6.1.0
  • WAMR: iwasm 2.4.3
  • wasmedge: 0.16.1-18-gc457fe30
  • wasmtime: 41.0.0 (4898322a4 2025-12-18)
  • wabt: 1.0.27
  • llvm: 21.1.5
  • Host OS: Ubuntu 22.04.5 LTS x64
  • CPU: 12th Gen Intel® Core™ i7-12700 × 20
  • rustc: rustc 1.90.0 (1159e78c4 2025-09-14)
    binary: rustc
    commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
    commit-date: 2025-09-14
    host: x86_64-unknown-linux-gnu
    release: 1.90.0
    LLVM version: 20.1.8

Steps to reproduce

The minimal reproducible code is as follows:

test_case.wat
(module
  (type (func (param i32)))
  (type (func))

  (import "wasi_snapshot_preview1" "proc_exit" (func (type 0)))

  (func (type 1)
    (local $i i64)
    (local.set $i (i64.const 4294967296))
    (block $exit
      (loop $body
        memory.size
        i32.const 2147483647
        i32.gt_s
        br_if $exit
        (local.set $i (i64.sub (local.get $i) (i64.const 1)))
        (br_if $body (i64.ne (local.get $i) (i64.const 0)))
      )
    )
    (call 0 (i32.const 0))
  )

  (memory $m0 12768)

  (export "_start" (func 1))
  (export "memory" (memory 0))

)

# Compile the WAT file to a wasm binary
wat2wasm test_case.wat -o test_case.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run -l test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run -s test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_fast_jit/iwasm test_case.wasm

Expected behavior

In the test cases, I placed the memory.size instruction inside a loop to amplify performance differences.

For comparison, the timings for the same loop with the memory.size instruction removed (an empty loop) are as follows:

runtime            time (s)
wasmer_llvm        0.01226
wasmer_cranelift   0.88643
wasmer_singlepass  1.95569
wasmedge_jit       0.017236
wasmtime           0.91258
wamr_llvm_jit      0.01792
wamr_fast_jit      0.88375
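A minimal sketch of how the two sets of measurements can be combined: subtracting the empty-loop time from the memory.size time gives a rough estimate of the extra cost attributable to the instruction. The numbers are copied from this report; the function name `net_overhead` is made up for illustration. Where the empty loop finishes in under 20 ms, the compiler may have removed the loop entirely, which the report does not confirm, so the difference there is not a clean per-instruction cost.

```python
# Average times in seconds with memory.size inside the loop.
WITH_MEMORY_SIZE = {
    "wasmer_llvm": 9.74033,
    "wasmer_cranelift": 10.68514,
    "wasmer_singlepass": 12.2296,
    "wasmedge_jit": 8.7981,
    "wasmtime": 1.9795,
    "wamr_llvm_jit": 0.01739,
    "wamr_fast_jit": 1.32305,
}

# Average times in seconds for the empty loop.
EMPTY_LOOP = {
    "wasmer_llvm": 0.01226,
    "wasmer_cranelift": 0.88643,
    "wasmer_singlepass": 1.95569,
    "wasmedge_jit": 0.017236,
    "wasmtime": 0.91258,
    "wamr_llvm_jit": 0.01792,
    "wamr_fast_jit": 0.88375,
}


def net_overhead(with_op: dict, baseline: dict) -> dict:
    """Seconds added by the instruction: time with it minus the empty-loop time."""
    return {name: with_op[name] - baseline[name] for name in with_op}


if __name__ == "__main__":
    for name, delta in net_overhead(WITH_MEMORY_SIZE, EMPTY_LOOP).items():
        slowdown = WITH_MEMORY_SIZE[name] / EMPTY_LOOP[name]
        print(f"{name:<18} +{delta:9.5f} s  ({slowdown:7.1f}x the empty loop)")
```

A difference near zero, or slightly negative as for wamr_llvm_jit, is within measurement noise and indicates no measurable added cost.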

Actual behavior

Changing the number of memory pages leaves the execution time essentially unchanged.
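One possible way to repeat the measurement with different page counts is to generate variants of test_case.wat that differ only in the memory declaration. This is a sketch, not the script used for this report: the helper names `make_wat` and `write_variants`, the output file naming, and the page counts in the usage example are all assumptions.

```python
from pathlib import Path

# The reproduction module from this report, with the page count replaced
# by a placeholder so it can be substituted per variant.
WAT_TEMPLATE = """(module
  (type (func (param i32)))
  (type (func))

  (import "wasi_snapshot_preview1" "proc_exit" (func (type 0)))

  (func (type 1)
    (local $i i64)
    (local.set $i (i64.const 4294967296))
    (block $exit
      (loop $body
        memory.size
        i32.const 2147483647
        i32.gt_s
        br_if $exit
        (local.set $i (i64.sub (local.get $i) (i64.const 1)))
        (br_if $body (i64.ne (local.get $i) (i64.const 0)))
      )
    )
    (call 0 (i32.const 0))
  )

  (memory $m0 __PAGES__)

  (export "_start" (func 1))
  (export "memory" (memory 0))
)
"""

# A 32-bit linear memory has at most 65536 pages of 64 KiB each.
MAX_PAGES = 65536


def make_wat(pages: int) -> str:
    """Return the module text with the initial memory size set to `pages`."""
    if not 0 <= pages <= MAX_PAGES:
        raise ValueError(f"pages must be in 0..{MAX_PAGES}, got {pages}")
    return WAT_TEMPLATE.replace("__PAGES__", str(pages))


def write_variants(page_counts, out_dir):
    """Write one .wat file per page count and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for pages in page_counts:
        path = out / f"test_case_{pages}.wat"  # hypothetical naming scheme
        path.write_text(make_wat(pages))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Example page counts chosen for illustration only.
    for p in write_variants([1, 1024, 12768], "variants"):
        print(p)
```

Each generated file can then be compiled with wat2wasm and timed with the same perf stat commands listed under "Steps to reproduce".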

Additional context

I also submitted a related issue to wasmedge.

If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!
