perf: bmt SIMD hasher by acud · Pull Request #5381 · ethersphere/bee

acud · 2026-02-25T23:53:19Z

Description

This PR tries to improve the current BMT hasher to use SIMD when available.

Motivation and Context (Optional)

The current BMT hasher is highly inefficient: it spawns a large number of goroutines to compute a single chunk address. For a full chunk, 255 goroutines are spawned, which stresses both the GC and the scheduler. For the ReserveSample function used in the redistribution game, this translates into excessive memory and CPU load just to calculate the reserve sample.

The idea here was to take an existing Keccak implementation, compile it to assembly, and call it directly from our Go code without cgo, which comes with its own set of side effects.

I used (I==me+Claude) the XKCP project (from the Keccak authors) and wrote a build script that builds, extracts, and wraps the compiled code correctly. Currently only linux amd64 is supported. Windows and macOS should fall back to the legacy Go sha3 hasher.

So far the results are promising:

1.6x faster BMT hashing on my local machine (laptop)
2.5x faster on AVX2-capable data-center CPUs (Hetzner)
5x faster on newer AVX512 architectures

There are a few more things to iron out and test:

Is the scalar hashing fallback (to the legacy Keccak hashing from Go crypto) reasonably fast? I imagine it would only be used for development anyway (Mac devs etc.). Or do we need to fall back to the previous BMT implementation that spawned all those goroutines?
As such, do we need a BMT factory to spin up the right type of hasher? Would we then want to maintain both implementations (copies)?
Since SIMD instructions can cause CPU throttling, we must test on different configurations and providers to make sure that any throttling doesn't nullify the performance improvements SIMD gives us. In other words, we have to compare workload benchmarks against the current implementation on trunk.
Figure out whether the excessive stack frame allocation can somehow be avoided.
See if using bmtpool would improve anything at all.
Fix the linker warning: /usr/bin/ld: warning: /tmp/go-link-1425710637/000001.o: missing .note.GNU-stack section implies executable stack; /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
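On the factory question, the selection logic could start from something like the sketch below. Everything here is hypothetical: `selectBackend`, the `backend` type, and the `hasSIMD` flag are illustrative names, and real CPU feature detection (AVX2, AVX512) is deliberately left out and passed in as a boolean.

```go
package main

import (
	"fmt"
	"runtime"
)

// backend names a hasher implementation a factory could hand out.
type backend string

const (
	backendSIMD   backend = "simd"   // precompiled XKCP code path
	backendScalar backend = "scalar" // pure Go fallback
)

// selectBackend picks the SIMD path only on linux/amd64 with SIMD support,
// mirroring the platform restriction described in this PR.
// hasSIMD is assumed to come from a CPU feature check done elsewhere.
func selectBackend(goos, goarch string, hasSIMD bool) backend {
	if goos == "linux" && goarch == "amd64" && hasSIMD {
		return backendSIMD
	}
	return backendScalar
}

func main() {
	// hasSIMD is hard-coded to false here; a real build would detect it.
	fmt.Println(selectBackend(runtime.GOOS, runtime.GOARCH, false))
}
```

Taking the OS, architecture, and feature flag as arguments rather than reading them inside the function keeps the decision testable on any development machine.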

Worth noting:

I've also tried to port the existing legacy sha3 implementation from the Go crypto library (written in Go) to use the new Go 1.26 SIMD primitives. This still needs a bit of research.
Windows is not supported because of shadow space (don't ask).

Test plan

Run on selected production nodes for a month and check that all is well: no panics, better performance. Only linux amd64, in various configurations.

Related Issue (Optional)

#5174

References:

https://github.com/XKCP/XKCP
https://github.com/acud/XKCP <- my fork with the build setup needed to create the .syso files.
https://pkg.go.dev/simd/archsimd

acud force-pushed the simd-hashing-no-concurrency branch from 304934f to ad70c60 Compare February 25, 2026 23:56

acud changed the title ~~perf: BMT SIMD hasher~~ perf: bmt SIMD hasher Feb 26, 2026

acud marked this pull request as draft February 26, 2026 17:41

acud force-pushed the simd-hashing-no-concurrency branch from 0267d23 to 74b4d29 Compare February 26, 2026 17:48

acud added 2 commits February 26, 2026 19:23

perf: bmt simd hasher

4ac4b47

chore: remove synctest

6c29ab6

acud force-pushed the simd-hashing-no-concurrency branch from af184e8 to 7872f4e Compare February 27, 2026 01:23

chore: better cpu os detection, parallel hashing on scalar

e10454f

acud force-pushed the simd-hashing-no-concurrency branch from 7872f4e to e10454f Compare February 27, 2026 03:59

chore: lint

11f6ce9

acud wants to merge 4 commits into master from simd-hashing-no-concurrency
