AntChainOpenLabs/NitrooZK-stwo

Stwo

🌟 About

Stwo is a next-generation implementation of a Circle STARK (CSTARK) prover and verifier, written in Rust 🦀.

Stwo is a work in progress.

It is not recommended to use it in a production setting yet.

🚀 Key Features

  • Circle STARKs: Based on the latest cryptographic research and innovations in the ZK field.
  • High performance: Stwo is designed to be extremely fast and efficient.
  • Flexible: Adaptable for various validity proof applications.

📊 Benchmarks

Run poseidon_benchmark.sh for a single-threaded Poseidon2 hash proof benchmark.

Further benchmarks can be run using cargo bench.

Visual representation of benchmarks can be found here.

GPU Performance

Reference Machine

  • GPU: 1 × NVIDIA GeForce RTX 4090
  • CPU: AMD EPYC 9224 with 16 cores
  • Memory: 94 GB

Wide-Fibonacci Test

  • For SIMD Prove Test: RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=24 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_wide_fib_prove_with_blake_simd --release --features parallel -- --nocapture
  • For GPU Prove Test: MIN_LOG=16 MAX_LOG=23 RAYON_NUM_THREADS=16 cargo test --release test_wide_fib_prove_with_blake_cuda --features parallel -- --nocapture
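
The MIN_LOG and MAX_LOG variables in the commands above appear to bound the log2 sizes that a test sweeps, which would match the Log(Size) column of the results. How the tests actually read these variables is not shown in this README, so the following is only a minimal Rust sketch under that assumption. The helper names `parse_log`, `log_size_range`, and `log_size_range_from_env`, as well as the 16..=24 default, are hypothetical and not taken from the repository.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: parses one optional variable value as a log2 size,
/// falling back to `fallback` when the variable is unset.
fn parse_log(name: &str, value: Option<&str>, fallback: u32) -> Result<u32, String> {
    match value {
        Some(text) => text
            .trim()
            .parse::<u32>()
            .map_err(|err| format!("{}={:?} is not a valid log size: {}", name, text, err)),
        None => Ok(fallback),
    }
}

/// Hypothetical helper: builds the inclusive range of log2 sizes to sweep.
/// Rejects ranges where the lower bound exceeds the upper bound.
fn log_size_range(
    min_log: Option<&str>,
    max_log: Option<&str>,
    default: RangeInclusive<u32>,
) -> Result<RangeInclusive<u32>, String> {
    let min = parse_log("MIN_LOG", min_log, *default.start())?;
    let max = parse_log("MAX_LOG", max_log, *default.end())?;
    if min > max {
        return Err(format!("MIN_LOG ({}) must not exceed MAX_LOG ({})", min, max));
    }
    Ok(min..=max)
}

/// Reads MIN_LOG and MAX_LOG from the process environment.
/// The 16..=24 default is an assumption chosen to mirror the tables in this README.
fn log_size_range_from_env() -> Result<RangeInclusive<u32>, String> {
    let min = std::env::var("MIN_LOG").ok();
    let max = std::env::var("MAX_LOG").ok();
    log_size_range(min.as_deref(), max.as_deref(), 16..=24)
}
```

Under this reading, MIN_LOG=16 MAX_LOG=23 would sweep eight sizes, one per row of a results table from 16 to 23.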

Wide-Fibonacci Prove Time (ms)

Log(Size)   SIMD     4090 GPU   Speedup
16          92       19         4.84x
17          138      20         6.90x
18          237      23         10.30x
19          398      29         13.72x
20          756      37         20.43x
21          1429     56         25.52x
22          2923     91         32.12x
23          6132     164        37.41x
24          12142    OOM        n/a
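
The Speedup column in the prove-time tables is consistent with dividing the SIMD time by the GPU time for the same size. The sketch below illustrates that arithmetic. The function names are illustrative and not part of the repository, and rendering a missing GPU timing as "n/a" is a choice made for this example.

```rust
/// Speedup of the GPU prover over the SIMD prover for one table row:
/// SIMD time divided by GPU time, both in milliseconds.
/// Returns `None` when the GPU run produced no timing (for example, it ran
/// out of memory), since no speedup is defined for such a row.
fn time_speedup(simd_ms: f64, gpu_ms: Option<f64>) -> Option<f64> {
    gpu_ms.filter(|&gpu| gpu > 0.0).map(|gpu| simd_ms / gpu)
}

/// Formats a speedup with two decimals and an "x" suffix, e.g. "4.84x".
fn format_speedup(speedup: Option<f64>) -> String {
    match speedup {
        Some(value) => format!("{:.2}x", value),
        None => "n/a".to_string(),
    }
}
```

For the first row, `format_speedup(time_speedup(92.0, Some(19.0)))` yields "4.84x". Some published values differ from this quotient in the last digit, presumably because the original speedups were computed from unrounded timings.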

Poseidon Test

  • For SIMD Prove Test: RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=23 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_simd_poseidon_prove --release --features parallel -- --nocapture
  • For GPU Prove Test: MIN_LOG=16 MAX_LOG=22 RAYON_NUM_THREADS=16 cargo test --release test_poseidon_prove_with_blake_cuda --features parallel -- --nocapture

Poseidon Prove Time (ms)

Log(Size)   SIMD     4090 GPU   Speedup
16          279      183        1.52x
17          356      196        1.82x
18          567      217        2.61x
19          1233     248        4.97x
20          1789     302        5.92x
21          4086     394        10.37x
22          8100     561        14.44x
23          17480    OOM        n/a

  • For SIMD Benchmark: LOG_N_INSTANCES=22 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench poseidon --features parallel -- --nocapture
  • For GPU Benchmark: LOG_N_INSTANCES=22 RAYON_NUM_THREADS=16 cargo bench --bench poseidon_cuda --features parallel -- --nocapture

Poseidon Throughput (Kelem/s)

Log(Size)   SIMD     4090 GPU   Speedup
16          174      1269       7.30x
17          290      2192       7.56x
18          391      3595       9.19x
19          453      4984       11.00x
20          537      6323       11.77x
21          364      7299       20.05x
22          342      7884       23.06x

  • For SIMD Benchmark: LOG_N_INSTANCES=23 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci --features parallel -- --nocapture
  • For GPU Benchmark: LOG_N_INSTANCES=23 RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_cuda --features parallel -- --nocapture

Wide-Fibonacci Throughput (Kelem/s)

Log(Size)   SIMD     4090 GPU   Speedup
16          466      2266       4.86x
17          597      4038       6.76x
18          841      6694       7.96x
19          976      10595      10.85x
20          1133     12558      11.08x
21          1148     15647      13.63x
22          927      16940      18.28x
23          818      18014      22.02x
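
For throughput, higher is better, so the speedup here is the GPU value divided by the SIMD value, the reverse of the prove-time tables. The sketch below copies the Wide-Fibonacci throughput figures from the table above and shows one way to compute that ratio and to find the size at which each backend peaks. The type and function names are illustrative and not part of the repository.

```rust
/// One row of the Wide-Fibonacci throughput table (values in Kelem/s).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ThroughputRow {
    log_size: u32,
    simd: u32,
    gpu: u32,
}

/// Figures copied from the Wide-Fibonacci throughput table in this README.
const WIDE_FIB_THROUGHPUT: [ThroughputRow; 8] = [
    ThroughputRow { log_size: 16, simd: 466, gpu: 2266 },
    ThroughputRow { log_size: 17, simd: 597, gpu: 4038 },
    ThroughputRow { log_size: 18, simd: 841, gpu: 6694 },
    ThroughputRow { log_size: 19, simd: 976, gpu: 10595 },
    ThroughputRow { log_size: 20, simd: 1133, gpu: 12558 },
    ThroughputRow { log_size: 21, simd: 1148, gpu: 15647 },
    ThroughputRow { log_size: 22, simd: 927, gpu: 16940 },
    ThroughputRow { log_size: 23, simd: 818, gpu: 18014 },
];

/// Throughput speedup: GPU throughput divided by SIMD throughput.
fn throughput_speedup(row: &ThroughputRow) -> f64 {
    f64::from(row.gpu) / f64::from(row.simd)
}

/// Log size at which the backend selected by `pick` reaches its highest
/// throughput, or `None` for an empty slice.
fn peak_log_size<F>(rows: &[ThroughputRow], pick: F) -> Option<u32>
where
    F: Fn(&ThroughputRow) -> u32,
{
    rows.iter().max_by_key(|row| pick(*row)).map(|row| row.log_size)
}
```

On these figures, `peak_log_size(&WIDE_FIB_THROUGHPUT, |row| row.simd)` returns `Some(21)`, while the GPU column is still rising at the largest measured size, 23.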

🥳 Acknowledgements

We would like to acknowledge the following project.

  • stwo-gpu: The M31 field arithmetic, extension field operations, FRI operations, and quotient accumulator are inspired by stwo-gpu.

📜 License

This project is licensed under the Apache 2.0 license.

See LICENSE for more information.

About

A GPU-accelerated Stwo prover by AntChain OpenLabs.
