High-performance C++/CUDA GPU-accelerated STARK prover for Triton VM.
This project provides:
- High-performance C++/CUDA implementation of the Triton VM STARK prover
- GPU-accelerated proof generation with zero-copy memory management
- Functional verification against test data from the Rust reference implementation
- Hybrid CPU/GPU and full GPU execution modes for optimal performance
triton-vm-prover/
├── CMakeLists.txt              # Build configuration
├── include/                    # Header files
│   ├── types/                  # Core data types
│   │   ├── b_field_element.hpp
│   │   ├── x_field_element.hpp
│   │   └── digest.hpp
│   ├── table/                  # AIR tables
│   │   └── master_table.hpp
│   ├── fri/                    # FRI protocol
│   │   └── fri.hpp
│   ├── proof_stream/           # Fiat-Shamir
│   │   └── proof_stream.hpp
│   ├── stark.hpp               # Main STARK prover
│   └── test_data_loader.hpp
├── src/                        # Implementation files
│   ├── types/
│   ├── table/
│   ├── fri/
│   ├── proof_stream/
│   └── stark.cpp
└── tests/                      # Google Test unit tests
    ├── test_b_field_element.cpp
    ├── test_x_field_element.cpp
    ├── test_data_loader.cpp
    └── test_stark.cpp
- BFieldElement - Goldilocks prime field (2^64 - 2^32 + 1)
- XFieldElement - Degree-3 extension field
- Digest - 5-element hash digest
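
As a rough illustration of the base-field arithmetic behind BFieldElement, the sketch below implements addition, multiplication, exponentiation, and inversion modulo the Goldilocks prime. The `bfe_*` function names are hypothetical and are not taken from `b_field_element.hpp`; the real class very likely uses a faster specialised reduction instead of the generic 128-bit `%` used here. It assumes a compiler that provides `unsigned __int128` (GCC or Clang).

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension (assumption)

// Goldilocks prime p = 2^64 - 2^32 + 1.
constexpr u64 kP = 0xFFFFFFFF00000001ULL;

// All functions expect canonical inputs in [0, kP) and return canonical outputs.
// Hypothetical helper names, for illustration only.
u64 bfe_add(u64 a, u64 b) {
    // Widen to 128 bits so the sum cannot overflow before reduction.
    return static_cast<u64>((static_cast<u128>(a) + b) % kP);
}

u64 bfe_mul(u64 a, u64 b) {
    // The 128-bit product is reduced generically; production code would
    // exploit the special shape of the prime instead.
    return static_cast<u64>((static_cast<u128>(a) * b) % kP);
}

u64 bfe_pow(u64 base, u64 exp) {
    // Square-and-multiply.
    u64 result = 1;
    while (exp != 0) {
        if (exp & 1) result = bfe_mul(result, base);
        base = bfe_mul(base, base);
        exp >>= 1;
    }
    return result;
}

u64 bfe_inv(u64 a) {
    // Fermat's little theorem: a^(p-2) is the inverse of a for a != 0.
    assert(a % kP != 0);
    return bfe_pow(a, kP - 2);
}
```

For example, `bfe_mul(kP - 1, kP - 1)` returns 1, because `kP - 1` represents -1 in the field.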
- Trace execution and VM integration
- Main table creation, padding, and degree lowering
- Low-degree extension (LDE) with GPU acceleration
- Fiat-Shamir challenge sampling
- Auxiliary table creation and extension
- Quotient computation and evaluation
- Out-of-domain evaluation
- FRI protocol implementation
- Merkle tree construction and authentication paths
- Proof encoding and serialization
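
To make the Merkle tree step more concrete, here is a minimal sketch of root construction, authentication path extraction, and path verification. It makes two loud simplifications: a digest is a single 64-bit word rather than the 5-element Digest, and `hash_pair` is a non-cryptographic stand-in mixer, not the hash function the prover uses. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Stand-in mixing function (SplitMix64 finalizer). NOT cryptographic;
// it only plays the role of a hash so the tree logic can be shown.
u64 mix(u64 z) {
    z ^= z >> 30; z *= 0xBF58476D1CE4E5B9ULL;
    z ^= z >> 27; z *= 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return z;
}

// Toy two-to-one compression of a left and a right child.
u64 hash_pair(u64 left, u64 right) {
    return mix(left ^ mix(right + 0x9E3779B97F4A7C15ULL));
}

// Heap layout: nodes[1] is the root, nodes[n + i] is leaf i, nodes[0] is unused.
std::vector<u64> build_tree(const std::vector<u64>& leaves) {
    const std::size_t n = leaves.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // leaf count must be a power of two
    std::vector<u64> nodes(2 * n, 0);
    for (std::size_t i = 0; i < n; ++i) nodes[n + i] = leaves[i];
    for (std::size_t i = n - 1; i >= 1; --i)
        nodes[i] = hash_pair(nodes[2 * i], nodes[2 * i + 1]);
    return nodes;
}

// Collects the sibling of every node on the way from the leaf up to the root.
std::vector<u64> authentication_path(const std::vector<u64>& nodes,
                                     std::size_t leaf_index) {
    const std::size_t n = nodes.size() / 2;
    assert(leaf_index < n);
    std::vector<u64> path;
    for (std::size_t i = n + leaf_index; i > 1; i /= 2)
        path.push_back(nodes[i ^ 1]);
    return path;
}

// Recomputes the root from a leaf and its path and compares it to `root`.
bool verify_path(u64 root, std::size_t num_leaves, std::size_t leaf_index,
                 u64 leaf, const std::vector<u64>& path) {
    u64 acc = leaf;
    std::size_t i = num_leaves + leaf_index;
    for (u64 sibling : path) {
        // Even node indices are left children, odd ones are right children.
        acc = (i % 2 == 0) ? hash_pair(acc, sibling) : hash_pair(sibling, acc);
        i /= 2;
    }
    return i == 1 && acc == root;
}
```

A caller would build the tree once with `build_tree`, publish `nodes[1]` as the commitment, and later hand out `authentication_path(nodes, i)` for each queried leaf.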
- CUDA kernels for field arithmetic, NTT, and hash functions
- GPU-accelerated LDE and Merkle tree construction
- Zero-copy memory management for minimal host-device transfers
- Hybrid CPU/GPU and full GPU execution modes
- Multi-GPU support for large proofs
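
The NTT and LDE steps listed above can be sketched on the CPU as follows. This is not the project's CUDA code: it is a plain recursive radix-2 NTT over the Goldilocks field, with hypothetical names throughout, and it evaluates on a plain subgroup rather than a coset so that the result is easy to check. It relies on 7 being a quadratic non-residue modulo the prime, so that `7^((p-1)/n)` is a primitive n-th root of unity for any power of two n up to 2^32, and it assumes a compiler with `unsigned __int128`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension (assumption)

constexpr u64 kP = 0xFFFFFFFF00000001ULL;  // 2^64 - 2^32 + 1

// Field helpers on canonical values in [0, kP).
u64 add(u64 a, u64 b) { return static_cast<u64>((static_cast<u128>(a) + b) % kP); }
u64 sub(u64 a, u64 b) { return static_cast<u64>((static_cast<u128>(a) + kP - b) % kP); }
u64 mul(u64 a, u64 b) { return static_cast<u64>((static_cast<u128>(a) * b) % kP); }

u64 pow_mod(u64 base, u64 exp) {
    u64 result = 1;
    for (; exp != 0; exp >>= 1) {
        if (exp & 1) result = mul(result, base);
        base = mul(base, base);
    }
    return result;
}

u64 inv(u64 a) { return pow_mod(a, kP - 2); }  // requires a != 0

bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Primitive n-th root of unity for a power of two n <= 2^32.
u64 root_of_unity(u64 n) {
    assert(is_pow2(n) && n <= (1ULL << 32));
    return pow_mod(7, (kP - 1) / n);
}

// Evaluates the polynomial with the given coefficients at 1, w, ..., w^(n-1),
// where w is a primitive n-th root of unity and n = coeffs.size().
std::vector<u64> ntt(const std::vector<u64>& coeffs, u64 w) {
    const std::size_t n = coeffs.size();
    if (n == 1) return coeffs;
    std::vector<u64> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = coeffs[2 * i];
        odd[i] = coeffs[2 * i + 1];
    }
    const u64 w2 = mul(w, w);
    const std::vector<u64> e = ntt(even, w2);
    const std::vector<u64> o = ntt(odd, w2);
    std::vector<u64> out(n);
    u64 wk = 1;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const u64 t = mul(wk, o[k]);
        out[k] = add(e[k], t);          // f(w^k)
        out[k + n / 2] = sub(e[k], t);  // f(w^(k + n/2)), since w^(n/2) = -1
        wk = mul(wk, w);
    }
    return out;
}

// Interpolates `column` over a domain of size n, then re-evaluates the
// resulting polynomial over a domain of size n * expansion.
std::vector<u64> low_degree_extend(const std::vector<u64>& column,
                                   std::size_t expansion) {
    const std::size_t n = column.size();
    assert(is_pow2(n) && is_pow2(expansion));
    // Inverse NTT: forward NTT with w^-1, then scale by n^-1.
    std::vector<u64> coeffs = ntt(column, inv(root_of_unity(n)));
    const u64 n_inv = inv(static_cast<u64>(n));
    for (u64& c : coeffs) c = mul(c, n_inv);
    // Zero-pad the coefficients and evaluate on the larger domain.
    coeffs.resize(n * expansion, 0);
    return ntt(coeffs, root_of_unity(n * expansion));
}
```

Because the larger domain contains the smaller one in this simplified setting, every `expansion`-th entry of the output equals the corresponding input value, which gives a quick sanity check.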
The project includes a build script that automatically configures and builds the prover. To build and run:

    ./run_gpu_prover.sh spin_input21.tasm 19 --multi-gpu --cpu-aux --gpu-count=2

To clean and rebuild from scratch:

    ./run_gpu_prover.sh spin_input21.tasm 19 --multi-gpu --cpu-aux --gpu-count=2 --clean

The script will automatically:
- Configure CMake with CUDA support
- Build all necessary components (C++ library, Rust FFI libraries, GPU prover)
- Run the prover with the specified program and input
- Verify the proof against the Rust reference implementation
To run xnt-core with GPU-accelerated proof generation, simply run:

    ./run_xnt_core_with_gpu.sh

This script automatically:
- Configures all GPU and performance optimization settings
- Sets up the direct GPU prover integration (no separate server needed)
- Launches xnt-core with optimal settings for GPU acceleration
The script uses sensible defaults for most settings, but you can customize them by setting environment variables before running the script. See run_xnt_core_with_gpu.sh for all available configuration options.
For inputs with a large padded height (>= 2^22), set the following environment variable for proof-upgrader:

    export TRITON_GPU_LDE_FRUGAL=1

Apache 2.0 (matching Triton VM)