chore(cuda): use new backend with bn254 engine #2522
jonathanpwang merged 8 commits into develop-v2.0.0-beta
PR Summary:

| Crate | Nature of change |
|---|---|
| continuations-v2 | New RootGpuProver type alias; RootTraceGen impl retargeted to Bn254 GPU backend |
| recursion | commit_child_vk_gpu generalized; VerifierTraceGen impl made generic over GpuHashScheme |
| sdk-v2 | Switch CUDA root prover from CPU engine to BabyBearBn254Poseidon2GpuEngine |
| Root Cargo.toml | Advance stark-backend pin from chore/bn254 → develop-v2 (post-merge) |

A minor fix is also included: a method-call disambiguation in the VM CUDA program chip.
Cargo.toml — stark-backend branch update
All six stark-backend workspace dependencies (openvm-stark-backend, openvm-codec-derive,
openvm-stark-sdk, openvm-cuda-backend, openvm-cuda-builder, openvm-cuda-common) are
updated to branch develop-v2, which now contains GenericGpuBackend, GpuHashScheme,
BabyBearBn254Poseidon2HashScheme, and BabyBearBn254Poseidon2GpuEngine after the upstream
stark-backend PR was merged.
crates/sdk-v2 — wire GPU root prover to Bn254 engine
File: crates/sdk-v2/src/prover/root.rs
Under #[cfg(feature = "cuda")], two type aliases are swapped:
- RootInnerProver: RootCpuProver → RootGpuProver
- E (the engine for the root circuit): BabyBearBn254Poseidon2CpuEngine → BabyBearBn254Poseidon2GpuEngine

ChildE (the child circuit engine) stays BabyBearPoseidon2GpuEngine — unchanged.
File: crates/sdk-v2/Cargo.toml
openvm-cuda-backend now explicitly enables features = ["baby-bear-bn254-poseidon2"] (was
declared without features).
crates/continuations-v2 — new RootGpuProver type alias and trace gen impl
src/prover/mod.rs
The previously commented-out RootGpuProver type alias is now active:
pub type RootGpuProver = RootProver<
    <BabyBearBn254Poseidon2GpuEngine as openvm_stark_backend::StarkEngine>::PB,
    VerifierSubCircuit<1>,
    RootTraceGenImpl,
>;

This resolves to RootProver<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, ...>.
src/circuit/root/trace.rs
The existing RootTraceGen<GpuBackend> impl is retargeted to
RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>>. Both methods
(generate_pre_verifier_subcircuit_ctx and generate_other_proving_ctxs) delegate to the
CPU impl and then upload each AirProvingContext<CpuBackend> directly to the Bn254 GPU backend
via transport_air_proving_ctx_to_device::<BabyBearBn254Poseidon2HashScheme> — no intermediate
coercion through GpuBackend is needed because transport_air_proving_ctx_to_device is already
generic over HS: GpuHashScheme in the upstream.
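
The delegate-then-upload pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `CpuCtx`, `DeviceCtx`, `Bn254Scheme`, `transport_to_device`, and both `generate_*` functions are hypothetical stand-ins, and a `Vec<u32>` plays the role of a device-resident matrix.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; none of these are the real stark-backend definitions.
pub trait GpuHashScheme {}

pub struct Bn254Scheme;
impl GpuHashScheme for Bn254Scheme {}

/// Stand-in for a proving context that lives in host memory.
pub struct CpuCtx {
    pub trace: Vec<u32>,
}

/// Stand-in for a proving context tagged with the GPU backend's hash scheme.
pub struct DeviceCtx<HS: GpuHashScheme> {
    pub trace: Vec<u32>, // stands in for a device-resident matrix
    _scheme: PhantomData<HS>,
}

/// Generic over the hash scheme, so one upload routine serves every GPU backend.
pub fn transport_to_device<HS: GpuHashScheme>(ctx: CpuCtx) -> DeviceCtx<HS> {
    DeviceCtx {
        trace: ctx.trace,
        _scheme: PhantomData,
    }
}

/// CPU trace generation (placeholder logic only).
pub fn generate_cpu_ctxs(n: u32) -> Vec<CpuCtx> {
    (0..n).map(|i| CpuCtx { trace: vec![i, i + 1] }).collect()
}

/// GPU path: delegate to the CPU path, then upload each context directly
/// to the target backend, with no intermediate backend type in between.
pub fn generate_gpu_ctxs<HS: GpuHashScheme>(n: u32) -> Vec<DeviceCtx<HS>> {
    generate_cpu_ctxs(n)
        .into_iter()
        .map(transport_to_device::<HS>)
        .collect()
}
```

Because the upload function carries the scheme as a type parameter, the caller picks the backend at the call site, for example `generate_gpu_ctxs::<Bn254Scheme>(3)`.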
File: crates/continuations-v2/Cargo.toml
openvm-cuda-backend dependency gets features = ["baby-bear-bn254-poseidon2"].
crates/recursion — generalized GPU backend in BatchConstraintModule and VerifierTraceGen
src/batch_constraint/mod.rs — generalize commit_child_vk_gpu
Previously, commit_child_vk_gpu was bound to E: StarkEngine<SC = BabyBearPoseidon2Config, PB = GpuBackend>
and returned CommittedTraceData<GpuBackend>. The signature is generalized to accept any engine
whose prover backend has Val = F and Matrix = DeviceMatrix<F>:
pub fn commit_child_vk_gpu<E>(
    &self,
    engine: &E,
    child_vk: &MultiStarkVerifyingKey<BabyBearPoseidon2Config>,
) -> CommittedTraceData<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,

This allows it to be called with either BabyBearPoseidon2GpuEngine or
BabyBearBn254Poseidon2GpuEngine without any specialization.
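
To illustrate why constraining only the backend's associated types is enough, here is a small self-contained sketch. Every type in it is a simplified, hypothetical stand-in (the real traits have more items), and `commit_child_vk` is an illustrative free function rather than the actual method.

```rust
// Hypothetical, simplified stand-ins for the traits named in the PR.
pub trait ProverBackend {
    type Val;
    type Matrix;
}

pub trait StarkEngine {
    type PB: ProverBackend;
}

/// Stand-in for a matrix held in device memory.
pub struct DeviceMatrix(pub Vec<u32>);

pub struct BabyBearGpuBackend;
pub struct Bn254GpuBackend;

impl ProverBackend for BabyBearGpuBackend {
    type Val = u32;
    type Matrix = DeviceMatrix;
}

impl ProverBackend for Bn254GpuBackend {
    type Val = u32;
    type Matrix = DeviceMatrix;
}

pub struct BabyBearGpuEngine;
pub struct Bn254GpuEngine;

impl StarkEngine for BabyBearGpuEngine {
    type PB = BabyBearGpuBackend;
}

impl StarkEngine for Bn254GpuEngine {
    type PB = Bn254GpuBackend;
}

/// Result type parameterized by whichever backend the engine selects.
pub struct Committed<PB: ProverBackend> {
    pub matrix: PB::Matrix,
}

/// Accepts any engine whose backend uses the shared value and matrix types;
/// the concrete backend is never named in the signature.
pub fn commit_child_vk<E>(_engine: &E, data: Vec<u32>) -> Committed<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = u32, Matrix = DeviceMatrix>,
{
    Committed {
        matrix: DeviceMatrix(data),
    }
}
```

Both `commit_child_vk(&BabyBearGpuEngine, ..)` and `commit_child_vk(&Bn254GpuEngine, ..)` type-check against the same function body, which is the property the generalized signature relies on.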
src/system/mod.rs — blanket VerifierTraceGen impl over GpuHashScheme
The existing concrete impl VerifierTraceGen<GpuBackend, SC> is replaced by a blanket impl
that covers all GPU hash schemes at once:
impl<HS: GpuHashScheme, const MAX_NUM_PROOFS: usize>
    VerifierTraceGen<GenericGpuBackend<HS>, HS::SC> for VerifierSubCircuit<MAX_NUM_PROOFS>

The inner proving machinery (generate_gpu_ctxs and friends) remains bound to GpuBackend
(the existing BabyBear Poseidon2 GPU backend). To bridge backends, a local helper is introduced:
fn coerce_gpu_ctx<HS: GpuHashScheme>(
    ctx: AirProvingContext<GpuBackend>,
) -> AirProvingContext<GenericGpuBackend<HS>> { ... }

Inside generate_proving_ctxs, a VerifierExternalData<GpuBackend> reborrow is constructed
from the caller-supplied VerifierExternalData<GenericGpuBackend<HS>>, all module GPU contexts
are generated as AirProvingContext<GpuBackend>, and then coerced in bulk via coerce_gpu_ctx::<HS>.
The coercion is safe because GpuBackend and GenericGpuBackend<HS> share Val = BabyBear and
Matrix = DeviceMatrix<F>; the helper panics in debug builds if cached_mains is non-empty
(commitments differ between backends).
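
The shape of such a coercion helper might look like the sketch below. It assumes a simplified context with three fields; `ProvingCtx`, its field layout, and `coerce_ctx` are hypothetical and only mirror the idea that scheme-independent data moves across unchanged while scheme-dependent commitments must be absent.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; the real context type has a different layout.
pub trait GpuHashScheme {}

pub struct BabyBearScheme;
pub struct Bn254Scheme;
impl GpuHashScheme for BabyBearScheme {}
impl GpuHashScheme for Bn254Scheme {}

pub struct ProvingCtx<HS: GpuHashScheme> {
    /// Scheme-independent trace data (stands in for a device matrix).
    pub common_main: Vec<u32>,
    pub public_values: Vec<u32>,
    /// Commitments to cached traces; their value depends on the hash scheme.
    pub cached_mains: Vec<[u8; 32]>,
    _scheme: PhantomData<HS>,
}

impl<HS: GpuHashScheme> ProvingCtx<HS> {
    pub fn new(common_main: Vec<u32>, public_values: Vec<u32>) -> Self {
        Self {
            common_main,
            public_values,
            cached_mains: Vec::new(),
            _scheme: PhantomData,
        }
    }
}

/// Re-tags a context from one hash scheme to another by moving its fields.
/// Debug builds panic if any scheme-dependent commitment is present,
/// since it would be invalid under the destination scheme.
pub fn coerce_ctx<Src: GpuHashScheme, Dst: GpuHashScheme>(
    ctx: ProvingCtx<Src>,
) -> ProvingCtx<Dst> {
    debug_assert!(
        ctx.cached_mains.is_empty(),
        "cached commitments cannot be carried across hash schemes"
    );
    ProvingCtx {
        common_main: ctx.common_main,
        public_values: ctx.public_values,
        cached_mains: Vec::new(),
        _scheme: PhantomData,
    }
}
```

Moving fields one by one, rather than reinterpreting memory, keeps the helper in safe Rust; whether the PR's helper does the same is not stated in the summary.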
File: crates/recursion/Cargo.toml
The cuda feature now additionally enables openvm-cuda-backend/baby-bear-bn254-poseidon2 and
openvm-stark-sdk/baby-bear-bn254-poseidon2.
Minor fix
crates/vm/src/system/cuda/program.rs
device.commit(...) is replaced by TraceCommitter::<GpuBackend>::commit(device, ...) to
disambiguate the method call now that multiple backends' commit methods may be in scope.
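
For readers unfamiliar with this kind of ambiguity, the sketch below reproduces it in miniature. It assumes a trait parameterized by backend and implemented twice for the same device type; the trait shape, `Device`, and the string return values are illustrative only and are not taken from the real crates.

```rust
// Hypothetical stand-ins: one device type implementing a committer trait
// once per backend.
pub trait TraceCommitter<B> {
    fn commit(&self, trace: &[u32]) -> String;
}

pub struct BabyBearBackend;
pub struct Bn254Backend;
pub struct Device;

impl TraceCommitter<BabyBearBackend> for Device {
    fn commit(&self, trace: &[u32]) -> String {
        format!("babybear:{}", trace.len())
    }
}

impl TraceCommitter<Bn254Backend> for Device {
    fn commit(&self, trace: &[u32]) -> String {
        format!("bn254:{}", trace.len())
    }
}

pub fn commit_with_babybear(device: &Device, trace: &[u32]) -> String {
    // `device.commit(trace)` would be rejected here: with two impls in scope
    // the compiler cannot infer the backend parameter. Naming the trait
    // instantiation explicitly selects one impl.
    TraceCommitter::<BabyBearBackend>::commit(device, trace)
}
```

The same call form with `Bn254Backend` as the type argument selects the other impl.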
stephenh-axiom-xyz left a comment:
Instead of using coerce_gpu_to_bn254_ctx, I think we can implement RootTraceGen and VerifierTraceGen for a trait generic PB with Val = F and Matrix = DeviceMatrix<F>. Consider making GenericGpuBackend generic in StarkProtocolConfig
@stephenh-axiom-xyz But a generic trait in |
///
/// Safe because all GPU backends share `Val = BabyBear` and `Matrix = DeviceMatrix<F>`.
/// Panics in debug builds if `cached_mains` is non-empty (commitments differ by hash scheme).
fn coerce_gpu_ctx<HS: GpuHashScheme>(
transmute might be a slightly more descriptive name, but I'll leave it
This reverts commit b11e2f8.
Closes INT-6475
Relies on openvm-org/stark-backend#288
Wire BabyBearBn254Poseidon2GpuEngine as the root prover engine for CUDA builds

Previously the root proving step used BabyBearPoseidon2GpuEngine (BabyBear Poseidon2 hash) even in GPU builds. This PR switches the root prover to BabyBearBn254Poseidon2GpuEngine (BN254 Poseidon2 hash) when the cuda feature is enabled, enabling native BN254 Merkle hashing in the root circuit.

Changes:

- Cargo.toml: Updated all stark-backend git dependency branches from chore/generic-gpuengine → chore/bn254, which provides BabyBearBn254Poseidon2GpuEngine, GenericGpuBackend, and related types.
- crates/recursion: Added baby-bear-bn254-poseidon2 sub-features to the cuda feature flag. Added a new VerifierTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, BabyBearBn254Poseidon2Config> impl for VerifierSubCircuit<N>. It runs the GPU preflight on the BabyBear backend, then coerces contexts to the BN254 backend (safe: both share Val = BabyBear, Matrix = DeviceMatrix; cached_mains are always empty at this point). Fixed the commit_child_vk_gpu where clause to use openvm_stark_backend::prover::ProverBackend.
- crates/continuations-v2: Added a RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>> impl using the same coercion pattern. Switched the RootGpuProver type alias to use BabyBearBn254Poseidon2GpuEngine::PB.
- crates/sdk-v2: cfg_if selects BabyBearBn254Poseidon2GpuEngine for CUDA root proving, and the CPU engine otherwise.