I'm investigating the comparatively poor performance of a Wasm guest compiled with TinyGo when it runs in Wasmtime embedded in a Go service (compiled with the official Go compiler) and there is any non-trivial (megabytes) data exchange between the host and the guest. While reducing this to the smallest example that still shows the problem, I found that a test that just calls malloc and free, which the guest exports by default, from the host already performs unexpectedly poorly.
Here is the test:
var allocationSizes = []int{100, 1024, 1024 * 100, 1024 * 1024, 200 * 1024 * 1024}

func BenchmarkMemAllocation(b *testing.B) {
	wasmPath := "./wasm/tinygo.wasm"
	wasmtimeConfig := wasmtime.NewConfig()
	defer wasmtimeConfig.Close()
	engine := wasmtime.NewEngineWithConfig(wasmtimeConfig)
	defer engine.Close()
	wasiConfig := wasmtime.NewWasiConfig()
	wasiConfig.InheritStdout()
	wasiConfig.InheritStderr()
	defer wasiConfig.Close()
	store := wasmtime.NewStore(engine)
	store.SetWasi(wasiConfig)
	defer store.Close()
	linker := wasmtime.NewLinker(engine)
	defer linker.Close()
	err := linker.DefineWasi()
	if err != nil {
		panic(err)
	}
	module, err := wasmtime.NewModuleFromFile(engine, wasmPath)
	if err != nil {
		panic(err)
	}
	instance, err := linker.Instantiate(store, module)
	if err != nil {
		panic(err)
	}
	malloc := instance.GetExport(store, "malloc").Func()
	free := instance.GetExport(store, "free").Func()
	for _, allocationSize := range allocationSizes {
		label := "size_" + strconv.Itoa(allocationSize)
		b.Run(label, func(b *testing.B) {
			for range b.N {
				ptr, err := malloc.Call(store, allocationSize)
				if err != nil {
					panic(err)
				}
				_, err = free.Call(store, ptr)
				if err != nil {
					panic(err)
				}
			}
		})
	}
}

I've run this benchmark with these guests:
- Compiled with tinygo build -o $(WASM)/tinygo.wasm -target=wasi
- Same but compiled with increased initial memory --initial-memory=209715200
- Compiled with a custom gc/allocator (https://github.com/wasilibs/nottinygc)
- An AssemblyScript guest (requires changing the names of the malloc/free functions in the test)
Here are the results on my machine:
go test -bench='BenchmarkMemAllocation' ./...
# TinyGo
BenchmarkMemAllocation/size_100 603967 2223 ns/op
BenchmarkMemAllocation/size_1024 383356 3362 ns/op
BenchmarkMemAllocation/size_102400 4455 257539 ns/op
BenchmarkMemAllocation/size_1048576 276 4267167 ns/op
BenchmarkMemAllocation/size_209715200 9 300325597 ns/op
# TinyGo with initial memory set to 200 MB
BenchmarkMemAllocation/size_100-12 630244 2169 ns/op
BenchmarkMemAllocation/size_1024-12 510651 3135 ns/op
BenchmarkMemAllocation/size_102400-12 10000 111715 ns/op
BenchmarkMemAllocation/size_1048576-12 4251 1365305 ns/op
BenchmarkMemAllocation/size_209715200-12 4 484141479 ns/op
# TinyGo with custom GC
BenchmarkMemAllocation/size_100-12 706581 1940 ns/op
BenchmarkMemAllocation/size_1024-12 693205 1904 ns/op
BenchmarkMemAllocation/size_102400-12 555777 2392 ns/op
BenchmarkMemAllocation/size_1048576-12 530181 2396 ns/op
BenchmarkMemAllocation/size_209715200-12 559406 2338 ns/op
# AssemblyScript
BenchmarkMemAllocation/size_100-12 702817 1967 ns/op
BenchmarkMemAllocation/size_1024-12 618452 2024 ns/op
BenchmarkMemAllocation/size_102400-12 611690 1986 ns/op
BenchmarkMemAllocation/size_1048576-12 620326 2046 ns/op
BenchmarkMemAllocation/size_209715200-12 636948 2044 ns/op
As you can see, the TinyGo-compiled guest with the built-in allocator slows down significantly as the allocation size increases. Allocating 1 MB takes 1.4 to 4.3 milliseconds and allocating 200 MB takes 0.3 to 0.5 seconds, which seems slow by any standard. This is orders of magnitude slower than what I get with the custom allocator/GC or with a guest compiled with AssemblyScript.
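To put numbers on "orders of magnitude", here is a small sketch that derives the slowdown factor and the cost per byte from the ns/op figures above (default TinyGo build versus the custom GC build). The type and function names (`result`, `slowdown`, `nsPerByte`) are made up for this illustration; only the input numbers come from the benchmark output.

```go
package main

import "fmt"

// result holds ns/op figures copied from the benchmark output above.
// The type and its field names are illustrative, not part of the benchmark.
type result struct {
	sizeBytes int
	builtinNs float64 // TinyGo, default build
	customNs  float64 // TinyGo with the custom GC (nottinygc)
}

// slowdown reports how many times slower the built-in allocator is
// than the custom GC build for the same allocation size.
func slowdown(r result) float64 {
	return r.builtinNs / r.customNs
}

// nsPerByte reports the built-in allocator's malloc+free cost per byte.
func nsPerByte(r result) float64 {
	return r.builtinNs / float64(r.sizeBytes)
}

func main() {
	results := []result{
		{100, 2223, 1940},
		{1024, 3362, 1904},
		{102400, 257539, 2392},
		{1048576, 4267167, 2396},
		{209715200, 300325597, 2338},
	}
	for _, r := range results {
		fmt.Printf("%10d bytes: %10.1fx slower, %6.2f ns per byte\n",
			r.sizeBytes, slowdown(r), nsPerByte(r))
	}
}
```

With these inputs the default build comes out roughly 1,800 times slower at 1 MB and more than 100,000 times slower at 200 MB. The cost per byte stays in the low single-digit nanoseconds from 100 KB upward, so the time appears to grow roughly linearly with the allocation size, while the other two guests stay flat.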
Is this a known issue or am I doing something wrong? Maybe there is some compilation flag that could improve the performance?
https://github.com/wasilibs/nottinygc is archived, so although I've included it in the benchmarks for comparison, it is not a viable option.