Slow heap allocations in a Wasm runtime #4592

@epsylonix

I'm investigating the comparatively poor performance of a Wasm guest compiled with TinyGo when it runs in Wasmtime embedded in a Go service (compiled with the official Go compiler) and exchanges a non-trivial amount of data (megabytes) with the host. While reducing this to the smallest example that still shows the problem, I found that a test that just calls the malloc and free functions, which are exported by default, from the host already performs unexpectedly poorly.

Here is the test:

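// Assumption: the package name is a placeholder; the issue does not show it.
package bench

import (
	"strconv"
	"testing"

	// Assumption: the issue does not state the wasmtime-go version;
	// use the major version suffix from your own go.mod.
	"github.com/bytecodealliance/wasmtime-go/v25"
)

var allocationSizes = []int{100, 1024, 1024 * 100, 1024 * 1024, 200 * 1024 * 1024}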

func BenchmarkMemAllocation(b *testing.B) {
	wasmPath := "./wasm/tinygo.wasm"

	wasmtimeConfig := wasmtime.NewConfig()
	defer wasmtimeConfig.Close()

	engine := wasmtime.NewEngineWithConfig(wasmtimeConfig)
	defer engine.Close()

	wasiConfig := wasmtime.NewWasiConfig()
	wasiConfig.InheritStdout()
	wasiConfig.InheritStderr()
	defer wasiConfig.Close()

	store := wasmtime.NewStore(engine)
	store.SetWasi(wasiConfig)
	defer store.Close()

	linker := wasmtime.NewLinker(engine)
	defer linker.Close()

	err := linker.DefineWasi()
	if err != nil {
		panic(err)
	}

	module, err := wasmtime.NewModuleFromFile(engine, wasmPath)
	if err != nil {
		panic(err)
	}

	instance, err := linker.Instantiate(store, module)
	if err != nil {
		panic(err)
	}

	malloc := instance.GetExport(store, "malloc").Func()
	free := instance.GetExport(store, "free").Func()

	for _, allocationSize := range allocationSizes {
		label := "size_" + strconv.Itoa(allocationSize)

		b.Run(label, func(b *testing.B) {
			for range b.N {
				ptr, err := malloc.Call(store, allocationSize)
				if err != nil {
					panic(err)
				}
				_, err = free.Call(store, ptr)
				if err != nil {
					panic(err)
				}
			}
		})
	}
}

I've run this benchmark with these guests:

  1. Compiled with tinygo build -o $(WASM)/tinygo.wasm -target=wasi
  2. Same but compiled with increased initial memory --initial-memory=209715200
  3. Compiled with a custom gc/allocator (https://github.com/wasilibs/nottinygc)
  4. An AssemblyScript guest (requires changing the names of the malloc/free functions in the test)

Here are the results on my machine:

go test -bench='BenchmarkMemAllocation' ./...

# TinyGo
BenchmarkMemAllocation/size_100                603967              2223 ns/op
BenchmarkMemAllocation/size_1024               383356              3362 ns/op
BenchmarkMemAllocation/size_102400               4455            257539 ns/op
BenchmarkMemAllocation/size_1048576               276           4267167 ns/op
BenchmarkMemAllocation/size_209715200               9         300325597 ns/op

# TinyGo with initial memory set to 200 MB
BenchmarkMemAllocation/size_100-12             630244              2169 ns/op
BenchmarkMemAllocation/size_1024-12            510651              3135 ns/op
BenchmarkMemAllocation/size_102400-12           10000            111715 ns/op
BenchmarkMemAllocation/size_1048576-12           4251           1365305 ns/op
BenchmarkMemAllocation/size_209715200-12            4         484141479 ns/op

# TinyGo with custom GC
BenchmarkMemAllocation/size_100-12             706581              1940 ns/op
BenchmarkMemAllocation/size_1024-12            693205              1904 ns/op
BenchmarkMemAllocation/size_102400-12          555777              2392 ns/op
BenchmarkMemAllocation/size_1048576-12         530181              2396 ns/op
BenchmarkMemAllocation/size_209715200-12       559406              2338 ns/op

# AssemblyScript
BenchmarkMemAllocation/size_100-12             702817              1967 ns/op
BenchmarkMemAllocation/size_1024-12            618452              2024 ns/op
BenchmarkMemAllocation/size_102400-12          611690              1986 ns/op
BenchmarkMemAllocation/size_1048576-12         620326              2046 ns/op
BenchmarkMemAllocation/size_209715200-12       636948              2044 ns/op

As you can see, the TinyGo-compiled guest with the built-in allocator slows down significantly as the allocation size grows. Allocating 1 MB takes about 4.3 milliseconds (about 1.4 milliseconds with the larger initial memory), and allocating 200 MB takes 0.3-0.5 seconds, which seems slow by any standard. This is orders of magnitude slower than what I get with the custom allocator/GC or with a guest compiled with AssemblyScript.

Is this a known issue or am I doing something wrong? Maybe there is some compilation flag that could improve the performance?

https://github.com/wasilibs/nottinygc is archived, so although I've included it in the benchmarks for comparison, it is not a viable option.
