When integrating a cache to improve performance by reducing allocations, I found that it severely reduced performance instead, with map operations adding milliseconds of overhead. I wrote a simple benchmark to simulate similar behavior (notably, a large-ish composite key). It's hard to separate in-wasm processing time from FFI overhead, so I added a 100-iteration loop inside the wasm module to help amortize it. I had originally hoped to make the loop larger, but then the benchmark was too slow to complete in a reasonable time.
https://github.com/corazawaf/coraza-proxy-wasm/compare/main...anuraaga:coraza-1225?expand=1#diff-173fbfd8d8844658344b121461b4290d0a85230caae9825240705df8130e8b75
goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza-proxy-wasm/mapbench
BenchmarkMapBench
BenchmarkMapBench-10 28 40528976 ns/op
BenchmarkMapBenchWasm
BenchmarkMapBenchWasm-10 1 26306643833 ns/op
PASS
There are, of course, many variables here, such as the performance of the wasm runtime (wazero), though I also noticed slow map performance in real-world usage with Envoy (V8). Even assuming wasm is slower than native code, the gap here (about 26.3 s/op under wasm versus about 40.5 ms/op natively, roughly 650x in ns/op) seems far larger than I would expect.
Is this known performance behavior? I saw in #1553 that growth logic was added so that hashmaps shouldn't degrade to quadratic performance without an appropriate size hint, but in this benchmark there are only about 2400 keys against a size hint of 10000, so growth and load factor shouldn't matter much either way.
I also tried setting the runtime_memhash_tsip build tag, but saw no real change.
For reference, the real-world code that the benchmark attempts to reflect is:
corazawaf/coraza#537