
chore(cuda): use new backend with bn254 engine #2522

Merged
jonathanpwang merged 8 commits into develop-v2.0.0-beta from test/generic-gpu-engine
Mar 10, 2026


gaxiom commented Mar 9, 2026

Closes INT-6475
Relies on openvm-org/stark-backend#288

Wire BabyBearBn254Poseidon2GpuEngine as the root prover engine for CUDA builds

Previously the root proving step used BabyBearPoseidon2GpuEngine (BabyBear Poseidon2 hash) even in GPU builds. This PR switches the root prover to BabyBearBn254Poseidon2GpuEngine (BN254 Poseidon2 hash) when the cuda feature is enabled, enabling native BN254 merkle hashing in the root circuit.

Changes:

  • Cargo.toml: Updated all stark-backend git dependency branches from chore/generic-gpuengine → chore/bn254, which provides BabyBearBn254Poseidon2GpuEngine, GenericGpuBackend, and related types.
  • crates/recursion: Added baby-bear-bn254-poseidon2 sub-features to the cuda feature flag. Added a new VerifierTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, BabyBearBn254Poseidon2Config> impl for VerifierSubCircuit<N> — it runs the GPU preflight on the BabyBear backend then coerces contexts to the BN254 backend (safe: both share Val = BabyBear, Matrix = DeviceMatrix; cached_mains are always empty at this point). Fixed commit_child_vk_gpu where clause to use openvm_stark_backend::prover::ProverBackend.
  • crates/continuations-v2: Added RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>> impl using the same coercion pattern. Switched RootGpuProver type alias to use BabyBearBn254Poseidon2GpuEngine::PB.
  • crates/sdk-v2: cfg_if selects BabyBearBn254Poseidon2GpuEngine for CUDA root proving, CPU engine otherwise.


gaxiom commented Mar 9, 2026

PR Summary: test/generic-gpu-engine

Overview

This PR wires up the GPU-accelerated root prover to use the BabyBear × Bn254 Poseidon2 hash
scheme (via GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>) instead of the CPU fallback.
Previously the RootGpuProver type alias was commented out and the CUDA path in sdk-v2 was
falling back to RootCpuProver; after this PR the GPU path is fully wired end-to-end.

The changes span four crates:

| Crate | Nature of change |
| --- | --- |
| continuations-v2 | New RootGpuProver type alias; RootTraceGen impl retargeted to Bn254 GPU backend |
| recursion | commit_child_vk_gpu generalized; VerifierTraceGen impl made generic over GpuHashScheme |
| sdk-v2 | Switch CUDA root prover from CPU engine to BabyBearBn254Poseidon2GpuEngine |
| Root Cargo.toml | Advance stark-backend pin from chore/bn254 → develop-v2 (post-merge) |

A minor fix is also included: a method-call disambiguation in the VM CUDA program chip.


Cargo.toml — stark-backend branch update

All six stark-backend workspace dependencies (openvm-stark-backend, openvm-codec-derive,
openvm-stark-sdk, openvm-cuda-backend, openvm-cuda-builder, openvm-cuda-common) are
updated to branch develop-v2, which now contains GenericGpuBackend, GpuHashScheme,
BabyBearBn254Poseidon2HashScheme, and BabyBearBn254Poseidon2GpuEngine after the upstream
stark-backend PR was merged.


crates/sdk-v2 — wire GPU root prover to Bn254 engine

File: crates/sdk-v2/src/prover/root.rs

Under #[cfg(feature = "cuda")], two type aliases are swapped:

  • RootInnerProver: RootCpuProver → RootGpuProver
  • E (the engine for the root circuit): BabyBearBn254Poseidon2CpuEngine → BabyBearBn254Poseidon2GpuEngine

ChildE (the child circuit engine) stays BabyBearPoseidon2GpuEngine — unchanged.
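
As a rough illustration of the alias swap described above, the sketch below gates a type alias on a `cuda` feature. The `CpuEngine`, `GpuEngine`, `Engine`, and `RootEngine` names are stand-ins invented for this example; the real engine types live in the stark-backend crates and are not reproduced here.

```rust
// Stand-in engine types (hypothetical); the real engines come from stark-backend.
pub struct CpuEngine;
pub struct GpuEngine;

// Hypothetical trait used only to observe which engine the alias selected.
pub trait Engine {
    const NAME: &'static str;
}

impl Engine for CpuEngine {
    const NAME: &'static str = "cpu";
}

impl Engine for GpuEngine {
    const NAME: &'static str = "gpu";
}

// With the `cuda` feature enabled the alias points at the GPU engine,
// otherwise at the CPU engine. Downstream code only ever names `RootEngine`.
#[cfg(feature = "cuda")]
pub type RootEngine = GpuEngine;
#[cfg(not(feature = "cuda"))]
pub type RootEngine = CpuEngine;

/// Reports which engine the build selected.
pub fn root_engine_name() -> &'static str {
    <RootEngine as Engine>::NAME
}
```

Because the choice is made at compile time, code that uses `RootEngine` needs no runtime branch and no changes when the feature is toggled.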

File: crates/sdk-v2/Cargo.toml

openvm-cuda-backend now explicitly enables features = ["baby-bear-bn254-poseidon2"] (was
declared without features).


crates/continuations-v2 — new RootGpuProver type alias and trace gen impl

src/prover/mod.rs

The previously commented-out RootGpuProver type alias is now active:

pub type RootGpuProver = RootProver<
    <BabyBearBn254Poseidon2GpuEngine as openvm_stark_backend::StarkEngine>::PB,
    VerifierSubCircuit<1>,
    RootTraceGenImpl,
>;

This resolves to RootProver<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, ...>.
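
The projection through the engine's associated type can be sketched with simplified stand-ins. In this example `StarkEngine`, `GenericGpuBackend`, `Bn254HashScheme`, `Bn254GpuEngine`, and `RootProver` are reduced mock definitions (the real `RootProver` takes three parameters), and `same_type` is a helper added only to check that the alias resolves as expected.

```rust
use std::any::TypeId;
use std::marker::PhantomData;

// Mock of an engine trait with an associated prover backend (assumed shape).
pub trait StarkEngine {
    type PB;
}

// Stand-in hash scheme and a backend that is generic over it.
pub struct Bn254HashScheme;
pub struct GenericGpuBackend<HS>(PhantomData<HS>);

// Stand-in engine whose prover backend is the BN254-flavoured GPU backend.
pub struct Bn254GpuEngine;
impl StarkEngine for Bn254GpuEngine {
    type PB = GenericGpuBackend<Bn254HashScheme>;
}

// Simplified prover with a single backend parameter.
pub struct RootProver<PB>(PhantomData<PB>);

// The alias names the backend indirectly, through the engine.
pub type RootGpuProver = RootProver<<Bn254GpuEngine as StarkEngine>::PB>;

/// Helper for this example: true when `A` and `B` are the same type.
pub fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

Naming the backend via the engine keeps the alias in sync if the engine's backend type changes upstream.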

src/circuit/root/trace.rs

The existing RootTraceGen<GpuBackend> impl is retargeted to
RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>>. Both methods
(generate_pre_verifier_subcircuit_ctx and generate_other_proving_ctxs) delegate to the
CPU impl and then upload each AirProvingContext<CpuBackend> directly to the Bn254 GPU backend
via transport_air_proving_ctx_to_device::<BabyBearBn254Poseidon2HashScheme> — no intermediate
coercion through GpuBackend is needed because transport_air_proving_ctx_to_device is already
generic over HS: GpuHashScheme in the upstream.
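
The "generate on CPU, then upload with the hash scheme as a type parameter" pattern might look roughly like the sketch below. All types here (`CpuCtx`, `DeviceCtx`, `Bn254Scheme`, and so on) are simplified stand-ins, and `transport_to_device` only mimics the role of the upstream `transport_air_proving_ctx_to_device`; its real signature is not shown in this PR and is not assumed here.

```rust
use std::marker::PhantomData;

// Marker trait standing in for the upstream GpuHashScheme.
pub trait GpuHashScheme {}

pub struct BabyBearScheme;
pub struct Bn254Scheme;
impl GpuHashScheme for BabyBearScheme {}
impl GpuHashScheme for Bn254Scheme {}

// Host-side context produced by the CPU trace generator (stand-in).
pub struct CpuCtx {
    pub trace: Vec<u32>,
}

// Device-side context, tagged with the hash scheme of its backend (stand-in).
pub struct DeviceCtx<HS: GpuHashScheme> {
    pub trace: Vec<u32>,
    _scheme: PhantomData<HS>,
}

/// Hypothetical upload step: generic over the hash scheme, so no detour
/// through a fixed backend type is required.
pub fn transport_to_device<HS: GpuHashScheme>(ctx: CpuCtx) -> DeviceCtx<HS> {
    DeviceCtx { trace: ctx.trace, _scheme: PhantomData }
}

/// Delegates to a (mock) CPU generator, then uploads every context.
pub fn generate_root_ctxs<HS: GpuHashScheme>() -> Vec<DeviceCtx<HS>> {
    let cpu_ctxs = vec![CpuCtx { trace: vec![1, 2] }, CpuCtx { trace: vec![3] }];
    cpu_ctxs.into_iter().map(transport_to_device::<HS>).collect()
}
```

The caller picks the target backend with a turbofish, for example `generate_root_ctxs::<Bn254Scheme>()`.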

File: crates/continuations-v2/Cargo.toml

openvm-cuda-backend dependency gets features = ["baby-bear-bn254-poseidon2"].


crates/recursion — generalized GPU backend in BatchConstraintModule and VerifierTraceGen

src/batch_constraint/mod.rs — generalize commit_child_vk_gpu

Previously, commit_child_vk_gpu was bound to E: StarkEngine<SC = BabyBearPoseidon2Config, PB = GpuBackend>
and returned CommittedTraceData<GpuBackend>. The signature is generalized to accept any engine
whose prover backend has Val = F and Matrix = DeviceMatrix<F>:

pub fn commit_child_vk_gpu<E>(
    &self,
    engine: &E,
    child_vk: &MultiStarkVerifyingKey<BabyBearPoseidon2Config>,
) -> CommittedTraceData<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,

This allows it to be called with either BabyBearPoseidon2GpuEngine or
BabyBearBn254Poseidon2GpuEngine without any specialization.
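
To see why the generalized where clause admits both engines, consider this self-contained sketch. The traits and types are reduced mocks that reuse the names from the signature above; `Bn254GpuBackend` and the `commit_child_vk` function (which just wraps its input) are hypothetical, and `F` is modelled as `u32` purely for illustration.

```rust
// Field element stand-in (assumption: modelled as u32 for this sketch).
pub type F = u32;

pub struct DeviceMatrix<T> {
    pub values: Vec<T>,
}

// Reduced mocks of the backend and engine traits.
pub trait ProverBackend {
    type Val;
    type Matrix;
}

pub trait StarkEngine {
    type PB: ProverBackend;
}

// Two backends that differ in hashing but share Val and Matrix.
pub struct GpuBackend;
pub struct Bn254GpuBackend;

impl ProverBackend for GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}

impl ProverBackend for Bn254GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}

pub struct GpuEngine;
pub struct Bn254GpuEngine;

impl StarkEngine for GpuEngine {
    type PB = GpuBackend;
}

impl StarkEngine for Bn254GpuEngine {
    type PB = Bn254GpuBackend;
}

pub struct CommittedTraceData<PB: ProverBackend> {
    pub trace: PB::Matrix,
}

/// Accepts any engine whose backend uses `F` values and device matrices.
pub fn commit_child_vk<E>(_engine: &E, rows: Vec<F>) -> CommittedTraceData<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,
{
    // The equality bound lets the compiler treat `PB::Matrix` as `DeviceMatrix<F>`.
    CommittedTraceData { trace: DeviceMatrix { values: rows } }
}
```

The associated-type equality bounds are what let the function body build a `DeviceMatrix<F>` directly while staying generic over the engine.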

src/system/mod.rs — blanket VerifierTraceGen impl over GpuHashScheme

The existing concrete impl VerifierTraceGen<GpuBackend, SC> is replaced by a blanket impl
that covers all GPU hash schemes at once:

impl<HS: GpuHashScheme, const MAX_NUM_PROOFS: usize>
    VerifierTraceGen<GenericGpuBackend<HS>, HS::SC> for VerifierSubCircuit<MAX_NUM_PROOFS>

The inner proving machinery (generate_gpu_ctxs and friends) remains bound to GpuBackend
(the existing BabyBear Poseidon2 GPU backend). To bridge backends, a local helper is introduced:

fn coerce_gpu_ctx<HS: GpuHashScheme>(
    ctx: AirProvingContext<GpuBackend>,
) -> AirProvingContext<GenericGpuBackend<HS>> { ... }

Inside generate_proving_ctxs, a VerifierExternalData<GpuBackend> reborrow is constructed
from the caller-supplied VerifierExternalData<GenericGpuBackend<HS>>, all module GPU contexts
are generated as AirProvingContext<GpuBackend>, and then coerced in bulk via coerce_gpu_ctx::<HS>.
The coercion is safe because GpuBackend and GenericGpuBackend<HS> share Val = BabyBear and
Matrix = DeviceMatrix<F>; the helper panics in debug builds if cached_mains is non-empty
(commitments differ between backends).
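
A minimal sketch of the coercion idea, under stated assumptions: the context is modelled as a struct tagged with its hash scheme, the field names other than `cached_mains` are guesses for illustration, and the helper here takes both source and destination schemes as type parameters, which differs from the PR's single-parameter `coerce_gpu_ctx`. The point is only that the device data moves across unchanged while commitment-bearing entries must be absent.

```rust
use std::marker::PhantomData;

// Marker trait standing in for the upstream GpuHashScheme.
pub trait GpuHashScheme {}

pub struct BabyBearPoseidon2;
pub struct Bn254Poseidon2;
impl GpuHashScheme for BabyBearPoseidon2 {}
impl GpuHashScheme for Bn254Poseidon2 {}

// Device matrix stand-in: identical layout regardless of hash scheme.
pub struct DeviceMatrix {
    pub values: Vec<u32>,
}

// Simplified proving context. `common_main` and `public_values` are assumed
// field names; each cached main carries a scheme-specific commitment.
pub struct AirProvingContext<HS: GpuHashScheme> {
    pub cached_mains: Vec<([u8; 32], DeviceMatrix)>,
    pub common_main: DeviceMatrix,
    pub public_values: Vec<u32>,
    pub scheme: PhantomData<HS>,
}

/// Re-tags a context for another hash scheme by moving its fields.
/// Cached mains hold commitments that depend on the hash, so they must be
/// empty; this is checked in debug builds only.
pub fn coerce_gpu_ctx<Src: GpuHashScheme, Dst: GpuHashScheme>(
    ctx: AirProvingContext<Src>,
) -> AirProvingContext<Dst> {
    debug_assert!(
        ctx.cached_mains.is_empty(),
        "cached mains carry scheme-specific commitments"
    );
    AirProvingContext {
        cached_mains: Vec::new(),
        common_main: ctx.common_main,
        public_values: ctx.public_values,
        scheme: PhantomData,
    }
}
```

Moving fields rather than reinterpreting memory keeps the conversion in safe Rust; it relies only on the matrix and value types being the same on both sides.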

File: crates/recursion/Cargo.toml

The cuda feature now additionally enables openvm-cuda-backend/baby-bear-bn254-poseidon2 and
openvm-stark-sdk/baby-bear-bn254-poseidon2.


Minor fix

crates/vm/src/system/cuda/program.rs

device.commit(...) is replaced by TraceCommitter::<GpuBackend>::commit(device, ...) to
disambiguate the method call now that multiple backends' commit methods may be in scope.
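
One way such an ambiguity can arise, sketched with mock types: if a committer trait is generic over the backend and a device type implements it for more than one backend, plain method-call syntax can no longer infer which impl is meant. The trait shape, the `Device` type, and the string return values below are assumptions made for this example, not the upstream definitions.

```rust
pub struct GpuBackend;
pub struct Bn254GpuBackend;

// Mock committer trait, generic over the backend it commits for.
pub trait TraceCommitter<B> {
    fn commit(&self, rows: &[u32]) -> String;
}

pub struct Device;

impl TraceCommitter<GpuBackend> for Device {
    fn commit(&self, rows: &[u32]) -> String {
        format!("babybear-poseidon2:{}", rows.len())
    }
}

impl TraceCommitter<Bn254GpuBackend> for Device {
    fn commit(&self, rows: &[u32]) -> String {
        format!("bn254-poseidon2:{}", rows.len())
    }
}

/// Commits with the BabyBear Poseidon2 backend.
pub fn commit_program(device: &Device, rows: &[u32]) -> String {
    // `device.commit(rows)` would fail to compile here with a
    // "type annotations needed" error, because both impls apply.
    TraceCommitter::<GpuBackend>::commit(device, rows)
}
```

Spelling out the trait's type argument selects one impl explicitly, so the call keeps compiling when further backends gain an impl.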


stephenh-axiom-xyz left a comment


Instead of using coerce_gpu_to_bn254_ctx, I think we can implement RootTraceGen and VerifierTraceGen for a trait generic PB with Val = F and Matrix = DeviceMatrix<F>. Consider making GenericGpuBackend generic in StarkProtocolConfig


jonathanpwang commented Mar 9, 2026

Consider making GenericGpuBackend generic in StarkProtocolConfig

@stephenh-axiom-xyz
This is too big a change and specifically something we did not want to do. The GPU is not generic in F, EF at all.

But a generic trait in PB with where clauses for the present situation may still be possible.

///
/// Safe because all GPU backends share `Val = BabyBear` and `Matrix = DeviceMatrix<F>`.
/// Panics in debug builds if `cached_mains` is non-empty (commitments differ by hash scheme).
fn coerce_gpu_ctx<HS: GpuHashScheme>(

transmute might be a slightly more descriptive name, but I'll leave it

jonathanpwang merged commit 38a6574 into develop-v2.0.0-beta Mar 10, 2026
9 checks passed
jonathanpwang deleted the test/generic-gpu-engine branch March 10, 2026 01:48