Stwo is a next-generation implementation of a Circle STARK (CSTARK) prover and verifier, written in Rust 🦀.
Stwo is a work in progress.
It is not recommended to use it in a production setting yet.
- Circle STARKs: Based on the latest cryptographic research and innovations in the ZK field.
- High performance: Stwo is designed to be extremely fast and efficient.
- Flexible: Adaptable for various validity proof applications.
Run `poseidon_benchmark.sh` to run a single-threaded poseidon2 hash proof benchmark.
Further benchmarks can be run using `cargo bench`.
Visual representation of benchmarks can be found here.
- GPU: 1 * NVIDIA GeForce RTX 4090
- CPU: AMD EPYC 9224 with 16 cores
- Memory: 94GB
- For SIMD Prove Test:
RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=24 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_wide_fib_prove_with_blake_simd --release --features parallel -- --nocapture
- For GPU Prove Test:
MIN_LOG=16 MAX_LOG=23 RAYON_NUM_THREADS=16 cargo test --release test_wide_fib_prove_with_blake_cuda --features parallel -- --nocapture
Log(Size) | SIMD | 4090 GPU | Speedup |
---|---|---|---|
16 | 92 | 19 | 4.84x |
17 | 138 | 20 | 6.90x |
18 | 237 | 23 | 10.30x |
19 | 398 | 29 | 13.72x |
20 | 756 | 37 | 20.43x |
21 | 1429 | 56 | 25.52x |
22 | 2923 | 91 | 32.12x |
23 | 6132 | 164 | 37.41x |
24 | 12142 | OOM | NULL |
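The Speedup column in the table above is consistent with dividing the SIMD figure by the GPU figure (for example, 92 / 19 ≈ 4.84). The following is a minimal sketch of that calculation, not code from the Stwo repository: the function name `speedup` is hypothetical, the unit of the timings is not stated in the table, and an out-of-memory run is modelled here as `None`.

```rust
/// Speedup of the GPU run over the SIMD run, given two timings in the
/// same (unspecified) unit. Returns `None` when the GPU run produced no
/// timing, e.g. because it ran out of memory.
fn speedup(simd_time: f64, gpu_time: Option<f64>) -> Option<f64> {
    gpu_time
        .filter(|t| *t > 0.0) // guard against division by zero
        .map(|t| simd_time / t)
}

fn main() {
    // (log size, SIMD time, GPU time) rows copied from the table above.
    let rows = [
        (16u32, 92.0, Some(19.0)),
        (20, 756.0, Some(37.0)),
        (24, 12142.0, None), // GPU run reported OOM
    ];

    for (log_size, simd, gpu) in rows {
        match speedup(simd, gpu) {
            Some(s) => println!("{log_size}: {s:.2}x"),
            None => println!("{log_size}: n/a (GPU out of memory)"),
        }
    }
}
```

Representing the missing measurement as `Option` keeps the OOM row in the data set without inventing a number for it.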
- For SIMD Prove Test:
RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=23 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_simd_poseidon_prove --release --features parallel -- --nocapture
- For GPU Prove Test:
MIN_LOG=16 MAX_LOG=22 RAYON_NUM_THREADS=16 cargo test --release test_poseidon_prove_with_blake_cuda --features parallel -- --nocapture
Log(Size) | SIMD | 4090 GPU | Speedup |
---|---|---|---|
16 | 279 | 183 | 1.52x |
17 | 356 | 196 | 1.82x |
18 | 567 | 217 | 2.61x |
19 | 1233 | 248 | 4.97x |
20 | 1789 | 302 | 5.92x |
21 | 4086 | 394 | 10.37x |
22 | 8100 | 561 | 14.44x |
23 | 17480 | OOM | NULL |
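The prove-test commands above pass the size range through the `MIN_LOG` and `MAX_LOG` environment variables. As a rough sketch of how a test harness could consume such variables (this is an assumption for illustration, not the actual Stwo test code; the helper `parse_bound` and the fallback values 16 and 22 are hypothetical):

```rust
use std::env;

/// Parses an optional raw string into a log-size bound, falling back to
/// `default` when the value is missing or not a valid unsigned integer.
/// Hypothetical helper, not part of Stwo.
fn parse_bound(raw: Option<&str>, default: u32) -> u32 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

fn main() {
    // Fallback values are illustrative only.
    let min_log = parse_bound(env::var("MIN_LOG").ok().as_deref(), 16);
    let max_log = parse_bound(env::var("MAX_LOG").ok().as_deref(), 22);

    // One proving run per log size in the inclusive range.
    for log_size in min_log..=max_log {
        println!("would run the prove test at log size {log_size}");
    }
}
```

Under this reading, lowering `MAX_LOG` for the GPU command is what keeps the run below the size at which the table reports OOM.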
- For SIMD Benchmark:
LOG_N_INSTANCES=22 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench poseidon --features parallel -- --nocapture
- For GPU Benchmark:
LOG_N_INSTANCES=22 RAYON_NUM_THREADS=16 cargo bench --bench poseidon_cuda --features parallel -- --nocapture
Log(Size) | SIMD | 4090 GPU | Speedup |
---|---|---|---|
16 | 174 | 1269 | 7.30x |
17 | 290 | 2192 | 7.56x |
18 | 391 | 3595 | 9.19x |
19 | 453 | 4984 | 11.00x |
20 | 537 | 6323 | 11.77x |
21 | 364 | 7299 | 20.05x |
22 | 342 | 7884 | 23.06x |
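In the benchmark table above, the Speedup column matches the GPU figure divided by the SIMD figure (for example, 1269 / 174 ≈ 7.3), which suggests that these columns report throughput, where higher is better, rather than elapsed time. The unit is not stated. A minimal sketch of that ratio follows; the name `throughput_speedup` is hypothetical and the throughput interpretation is an inference from the numbers, not something the table confirms.

```rust
/// Ratio of GPU throughput to SIMD throughput, assuming both figures are
/// throughputs in the same (unspecified) unit. Returns `None` for a
/// non-positive SIMD figure to avoid a meaningless ratio.
fn throughput_speedup(simd: f64, gpu: f64) -> Option<f64> {
    if simd > 0.0 {
        Some(gpu / simd)
    } else {
        None
    }
}

fn main() {
    // (log size, SIMD figure, GPU figure) rows copied from the table above.
    let rows = [(16u32, 174.0, 1269.0), (19, 453.0, 4984.0), (22, 342.0, 7884.0)];

    for (log_size, simd, gpu) in rows {
        if let Some(s) = throughput_speedup(simd, gpu) {
            // Recomputed values can differ from the table in the last
            // digit, since the table inputs are already rounded.
            println!("{log_size}: {s:.2}x");
        }
    }
}
```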
- For SIMD Benchmark:
LOG_N_INSTANCES=23 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci --features parallel -- --nocapture
- For GPU Benchmark:
LOG_N_INSTANCES=23 RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_cuda --features parallel -- --nocapture
Log(Size) | SIMD | 4090 GPU | Speedup |
---|---|---|---|
16 | 466 | 2266 | 4.86x |
17 | 597 | 4038 | 6.76x |
18 | 841 | 6694 | 7.96x |
19 | 976 | 10595 | 10.85x |
20 | 1133 | 12558 | 11.08x |
21 | 1148 | 15647 | 13.63x |
22 | 927 | 16940 | 18.28x |
23 | 818 | 18014 | 22.02x |
We would like to acknowledge the following project:
- stwo-gpu: The M31 field arithmetic, extended field operations, FRI operations, and quotient accumulator are inspired by stwo-gpu.
This project is licensed under the Apache 2.0 license.
See LICENSE for more information.