Describe the bug
Hello, in differential micro-benchmarking, I noticed that Wasmer performs poorly when executing the memory.size instruction.
The specific timing data is as follows:
| runtime | time (s) |
|---|---|
| wasmer_llvm | 9.74033 |
| wasmer_cranelift | 10.68514 |
| wasmer_singlepass | 12.2296 |
| wasmedge_jit | 8.7981 |
| wasmtime | 1.9795 |
| wamr_llvm_jit | 0.01739 |
| wamr_fast_jit | 1.32305 |
Times are in seconds; each value is the average of ten executions.
Environment
All runtimes are release builds and run in JIT mode.
- wasmer: 6.1.0
- WAMR: iwasm 2.4.3
- wasmedge: 0.16.1-18-gc457fe30
- wasmtime: 41.0.0 (4898322a4 2025-12-18)
- wabt: 1.0.27
- llvm: 21.1.5
- Host OS: Ubuntu 22.04.5 LTS x64
- CPU: 12th Gen Intel® Core™ i7-12700 × 20
- rustc: rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: x86_64-unknown-linux-gnu
release: 1.90.0
LLVM version: 20.1.8
Steps to reproduce
The minimal reproducible code is as follows:
test_case.wat
(module
(type (func (param i32)))
(type (func))
(import "wasi_snapshot_preview1" "proc_exit" (func (type 0)))
(func (type 1)
(local $i i64)
(local.set $i (i64.const 4294967296))
(block $exit
(loop $body
memory.size
i32.const 2147483647
i32.gt_s
br_if $exit
(local.set $i (i64.sub (local.get $i) (i64.const 1)))
(br_if $body (i64.ne (local.get $i) (i64.const 0)))
)
)
(call 0 (i32.const 0))
)
(memory $m0 12768)
(export "_start" (func 1))
(export "memory" (memory 0))
)
wat2wasm test_case.wat -o test_case.wasm
# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run -l test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run -s test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_fast_jit/iwasm test_case.wasm
Expected behavior
In the test case, I placed the memory.size instruction inside a loop to amplify performance differences.
For comparison, the timings (in seconds) for the same loop without the memory.size instruction are as follows:
| runtime | time (s) |
|---|---|
| wasmer_llvm | 0.01226 |
| wasmer_cranelift | 0.88643 |
| wasmer_singlepass | 1.95569 |
| wasmedge_jit | 0.017236 |
| wasmtime | 0.91258 |
| wamr_llvm_jit | 0.01792 |
| wamr_fast_jit | 0.88375 |
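To make the two tables easier to compare, the difference between them can be turned into an approximate per-iteration cost. The sketch below is only a rough estimate: it assumes the loop runs all 2^32 iterations (the initial value of `$i` in test_case.wat), and the difference also includes the `i32.const`, `i32.gt_s`, and `br_if $exit` instructions that accompany `memory.size`. The dictionary and function names are illustrative, not part of any runtime's tooling.

```python
# Timings in seconds, copied from the two tables in this report.
ITERATIONS = 2**32  # $i starts at 4294967296 and counts down to 0

with_memory_size = {
    "wasmer_llvm": 9.74033,
    "wasmer_cranelift": 10.68514,
    "wasmer_singlepass": 12.2296,
    "wasmedge_jit": 8.7981,
    "wasmtime": 1.9795,
    "wamr_llvm_jit": 0.01739,
    "wamr_fast_jit": 1.32305,
}

empty_loop = {
    "wasmer_llvm": 0.01226,
    "wasmer_cranelift": 0.88643,
    "wasmer_singlepass": 1.95569,
    "wasmedge_jit": 0.017236,
    "wasmtime": 0.91258,
    "wamr_llvm_jit": 0.01792,
    "wamr_fast_jit": 0.88375,
}


def per_iteration_overhead_ns(runtime: str) -> float:
    """Extra time per loop iteration, in nanoseconds, versus the empty loop."""
    delta_s = with_memory_size[runtime] - empty_loop[runtime]
    return delta_s / ITERATIONS * 1e9


overhead_ns = {name: per_iteration_overhead_ns(name) for name in with_memory_size}

if __name__ == "__main__":
    # Print runtimes from largest to smallest per-iteration overhead.
    for name, ns in sorted(overhead_ns.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:18s} {ns:8.3f} ns/iteration")
```

A value near zero (or slightly negative, within measurement noise) suggests the runtime removed or hoisted the check out of the loop, whereas a value of a few nanoseconds is consistent with work being done on every iteration.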
Actual behavior
Changing the number of memory pages leaves the execution time essentially unchanged.
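For anyone who wants to repeat the page-count check, a small helper along these lines could generate variants of the test case that differ only in the initial memory size. This is a sketch rather than the script used for the measurements above: the function names, output file names, and the page counts in the usage example are hypothetical, and the module text is the one from "Steps to reproduce".

```python
from pathlib import Path

# WAT module from "Steps to reproduce"; only the page count is parameterized.
WAT_TEMPLATE = """(module
  (type (func (param i32)))
  (type (func))
  (import "wasi_snapshot_preview1" "proc_exit" (func (type 0)))
  (func (type 1)
    (local $i i64)
    (local.set $i (i64.const 4294967296))
    (block $exit
      (loop $body
        memory.size
        i32.const 2147483647
        i32.gt_s
        br_if $exit
        (local.set $i (i64.sub (local.get $i) (i64.const 1)))
        (br_if $body (i64.ne (local.get $i) (i64.const 0)))
      )
    )
    (call 0 (i32.const 0))
  )
  (memory $m0 {pages})
  (export "_start" (func 1))
  (export "memory" (memory 0))
)
"""

MAX_PAGES = 65536  # upper bound for a 32-bit linear memory (64 KiB pages)


def make_wat(pages: int) -> str:
    """Return the test module with the given initial memory size in pages."""
    if not 0 <= pages <= MAX_PAGES:
        raise ValueError(f"pages must be in [0, {MAX_PAGES}], got {pages}")
    return WAT_TEMPLATE.format(pages=pages)


def write_variants(page_counts, out_dir="."):
    """Write one .wat file per page count and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for pages in page_counts:
        path = out / f"test_case_{pages}.wat"  # hypothetical naming scheme
        path.write_text(make_wat(pages))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Example page counts; 12768 is the value used in this report.
    for p in write_variants([1, 1024, 12768]):
        print(p)
```

Each generated file can then be compiled with `wat2wasm` and timed with the same `perf stat` commands as above.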
Additional context
I also submitted a related issue to wasmedge.
If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!