chore(cuda): use new backend with bn254 engine by gaxiom · Pull Request #2522 · openvm-org/openvm

gaxiom · 2026-03-09T19:29:48Z

Closes INT-6475
Relies on openvm-org/stark-backend#288

Wire BabyBearBn254Poseidon2GpuEngine as the root prover engine for CUDA builds

Previously the root proving step used BabyBearPoseidon2GpuEngine (BabyBear Poseidon2 hash) even in GPU builds. This PR switches the root prover to BabyBearBn254Poseidon2GpuEngine (BN254 Poseidon2 hash) when the cuda feature is enabled, enabling native BN254 merkle hashing in the root circuit.

Changes:

- Cargo.toml: Updated all stark-backend git dependency branches from chore/generic-gpuengine → chore/bn254, which provides BabyBearBn254Poseidon2GpuEngine, GenericGpuBackend, and related types.
- crates/recursion: Added baby-bear-bn254-poseidon2 sub-features to the cuda feature flag. Added a new VerifierTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, BabyBearBn254Poseidon2Config> impl for VerifierSubCircuit<N>; it runs the GPU preflight on the BabyBear backend, then coerces contexts to the BN254 backend (safe: both share Val = BabyBear, Matrix = DeviceMatrix; cached_mains are always empty at this point). Fixed the commit_child_vk_gpu where clause to use openvm_stark_backend::prover::ProverBackend.
- crates/continuations-v2: Added a RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>> impl using the same coercion pattern. Switched the RootGpuProver type alias to use BabyBearBn254Poseidon2GpuEngine::PB.
- crates/sdk-v2: cfg_if selects BabyBearBn254Poseidon2GpuEngine for CUDA root proving, and the CPU engine otherwise.

gaxiom · 2026-03-09T19:39:46Z

PR Summary: `test/generic-gpu-engine`

Overview

This PR wires up the GPU-accelerated root prover to use the BabyBear × Bn254 Poseidon2 hash
scheme (via GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>) instead of the CPU fallback.
Previously the RootGpuProver type alias was commented out and the CUDA path in sdk-v2 was
falling back to RootCpuProver; after this PR the GPU path is fully wired end-to-end.

The changes span four crates:

| Crate | Nature of change |
| --- | --- |
| `continuations-v2` | New `RootGpuProver` type alias; `RootTraceGen` impl retargeted to Bn254 GPU backend |
| `recursion` | `commit_child_vk_gpu` generalized; `VerifierTraceGen` impl made generic over `GpuHashScheme` |
| `sdk-v2` | Switch CUDA root prover from CPU engine to `BabyBearBn254Poseidon2GpuEngine` |
| Root `Cargo.toml` | Advance stark-backend pin from `chore/bn254` → `develop-v2` (post-merge) |

A minor fix is also included: a method-call disambiguation in the VM CUDA program chip.

`Cargo.toml` — stark-backend branch update

All six stark-backend workspace dependencies (openvm-stark-backend, openvm-codec-derive,
openvm-stark-sdk, openvm-cuda-backend, openvm-cuda-builder, openvm-cuda-common) are
updated to branch develop-v2, which now contains GenericGpuBackend, GpuHashScheme,
BabyBearBn254Poseidon2HashScheme, and BabyBearBn254Poseidon2GpuEngine after the upstream
stark-backend PR was merged.

`crates/sdk-v2` — wire GPU root prover to Bn254 engine

File: crates/sdk-v2/src/prover/root.rs

Under #[cfg(feature = "cuda")], two type aliases are swapped:

RootInnerProver: RootCpuProver → RootGpuProver
E (the engine for the root circuit): BabyBearBn254Poseidon2CpuEngine → BabyBearBn254Poseidon2GpuEngine

ChildE (the child circuit engine) stays BabyBearPoseidon2GpuEngine — unchanged.
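
The alias swap can be sketched with plain `#[cfg]` attributes. This is a minimal illustration rather than the contents of root.rs: the unit structs are stand-ins for the real engine types, the non-CUDA choice for `ChildE` is an assumption, and the PR description says the real code uses cfg_if instead of bare attributes.

```rust
// Stand-in marker types; the real engines come from the stark-backend crates.
pub struct BabyBearBn254Poseidon2CpuEngine;
pub struct BabyBearBn254Poseidon2GpuEngine;
pub struct BabyBearPoseidon2CpuEngine;
pub struct BabyBearPoseidon2GpuEngine;

// Root-circuit engine: the BN254 Poseidon2 GPU engine when `cuda` is enabled,
// the BN254 Poseidon2 CPU engine otherwise.
#[cfg(feature = "cuda")]
pub type E = BabyBearBn254Poseidon2GpuEngine;
#[cfg(not(feature = "cuda"))]
pub type E = BabyBearBn254Poseidon2CpuEngine;

// Child-circuit engine: stays on the BabyBear Poseidon2 hash.
// The non-CUDA branch is an assumption made for this sketch.
#[cfg(feature = "cuda")]
pub type ChildE = BabyBearPoseidon2GpuEngine;
#[cfg(not(feature = "cuda"))]
pub type ChildE = BabyBearPoseidon2CpuEngine;

/// Returns the type name of whichever root engine was selected at compile time.
pub fn root_engine_name() -> &'static str {
    std::any::type_name::<E>()
}

/// Returns the type name of whichever child engine was selected at compile time.
pub fn child_engine_name() -> &'static str {
    std::any::type_name::<ChildE>()
}
```

In both configurations the root engine uses the BN254 hash scheme and the child engine does not; only the CPU/GPU half of the choice depends on the feature flag.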

File: crates/sdk-v2/Cargo.toml

openvm-cuda-backend now explicitly enables features = ["baby-bear-bn254-poseidon2"] (was
declared without features).

`crates/continuations-v2` — new `RootGpuProver` type alias and trace gen impl

`src/prover/mod.rs`

The previously commented-out RootGpuProver type alias is now active:

pub type RootGpuProver = RootProver<
    <BabyBearBn254Poseidon2GpuEngine as openvm_stark_backend::StarkEngine>::PB,
    VerifierSubCircuit<1>,
    RootTraceGenImpl,
>;

This resolves to RootProver<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>, ...>.
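
To illustrate how the associated-type projection resolves, here is a self-contained sketch. Every type in it is a simplified stand-in (the real `StarkEngine` trait has more items), and `RootGpuProverExplicit` and `same_type` are hypothetical names introduced only for the comparison.

```rust
use std::any::TypeId;
use std::marker::PhantomData;

// Simplified stand-in: the real trait has more associated items.
pub trait StarkEngine {
    type PB;
}

pub struct BabyBearBn254Poseidon2HashScheme;
pub struct GenericGpuBackend<HS>(PhantomData<HS>);
pub struct BabyBearBn254Poseidon2GpuEngine;

impl StarkEngine for BabyBearBn254Poseidon2GpuEngine {
    type PB = GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>;
}

pub struct VerifierSubCircuit<const N: usize>;
pub struct RootTraceGenImpl;
pub struct RootProver<PB, S, T>(PhantomData<(PB, S, T)>);

// Alias written through the engine's associated type, as in the PR.
pub type RootGpuProver = RootProver<
    <BabyBearBn254Poseidon2GpuEngine as StarkEngine>::PB,
    VerifierSubCircuit<1>,
    RootTraceGenImpl,
>;

// The same type spelled out explicitly (hypothetical alias for comparison).
pub type RootGpuProverExplicit = RootProver<
    GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>,
    VerifierSubCircuit<1>,
    RootTraceGenImpl,
>;

/// True when `A` and `B` are the same type after normalization.
pub fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

Writing the alias through `E::PB` keeps it in sync with the engine if the engine's backend type changes upstream.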

`src/circuit/root/trace.rs`

The existing RootTraceGen<GpuBackend> impl is retargeted to
RootTraceGen<GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>>. Both methods
(generate_pre_verifier_subcircuit_ctx and generate_other_proving_ctxs) delegate to the
CPU impl and then upload each AirProvingContext<CpuBackend> directly to the Bn254 GPU backend
via transport_air_proving_ctx_to_device::<BabyBearBn254Poseidon2HashScheme> — no intermediate
coercion through GpuBackend is needed because transport_air_proving_ctx_to_device is already
generic over HS: GpuHashScheme in the upstream.
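
The delegate-then-upload pattern might look roughly like the sketch below. It assumes nothing about the real signatures: `CpuCtx`, `DeviceCtx`, `transport_to_device`, and the two generator functions are hypothetical stand-ins, and the "device" buffer is an ordinary `Vec` so the example runs without a GPU.

```rust
use std::marker::PhantomData;

pub trait GpuHashScheme {}
pub struct BabyBearBn254Poseidon2HashScheme;
impl GpuHashScheme for BabyBearBn254Poseidon2HashScheme {}

/// Host-side proving context (stand-in): a flat trace plus its width.
pub struct CpuCtx {
    pub trace: Vec<u32>,
    pub width: usize,
}

/// Device-side proving context (stand-in), tagged with the hash scheme.
pub struct DeviceCtx<HS: GpuHashScheme> {
    pub trace: Vec<u32>, // would be a device buffer in real code
    pub width: usize,
    _hs: PhantomData<HS>,
}

/// Hypothetical analogue of a transport function that is generic over the
/// hash scheme, so no intermediate backend type is needed.
pub fn transport_to_device<HS: GpuHashScheme>(ctx: CpuCtx) -> DeviceCtx<HS> {
    DeviceCtx { trace: ctx.trace, width: ctx.width, _hs: PhantomData }
}

/// Stand-in for the CPU trace generator that the GPU impl delegates to.
pub fn generate_cpu_ctxs(num_airs: usize) -> Vec<CpuCtx> {
    (0..num_airs)
        .map(|i| CpuCtx { trace: vec![i as u32; 4], width: 2 })
        .collect()
}

/// GPU path: run the CPU generator, then upload each context directly to the
/// target backend.
pub fn generate_gpu_ctxs<HS: GpuHashScheme>(num_airs: usize) -> Vec<DeviceCtx<HS>> {
    generate_cpu_ctxs(num_airs)
        .into_iter()
        .map(transport_to_device::<HS>)
        .collect()
}
```

Because the upload step takes the hash scheme as a type parameter, the caller picks the destination backend with a turbofish, for example `generate_gpu_ctxs::<BabyBearBn254Poseidon2HashScheme>(3)`.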

File: crates/continuations-v2/Cargo.toml

openvm-cuda-backend dependency gets features = ["baby-bear-bn254-poseidon2"].

`crates/recursion` — generalized GPU backend in `BatchConstraintModule` and `VerifierTraceGen`

`src/batch_constraint/mod.rs` — generalize `commit_child_vk_gpu`

Previously, commit_child_vk_gpu was bound to E: StarkEngine<SC = BabyBearPoseidon2Config, PB = GpuBackend>
and returned CommittedTraceData<GpuBackend>. The signature is generalized to accept any engine
whose prover backend has Val = F and Matrix = DeviceMatrix<F>:

pub fn commit_child_vk_gpu<E>(
    &self,
    engine: &E,
    child_vk: &MultiStarkVerifyingKey<BabyBearPoseidon2Config>,
) -> CommittedTraceData<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,

This allows it to be called with either BabyBearPoseidon2GpuEngine or
BabyBearBn254Poseidon2GpuEngine without any specialization.
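
The effect of bounding only the backend's associated types can be seen in a small stand-alone sketch. The traits here are reduced to the two associated types named in the where clause, `F` is a `u32` placeholder for BabyBear, `Bn254GpuBackend` stands in for `GenericGpuBackend<BabyBearBn254Poseidon2HashScheme>`, and `commit_trace` is a hypothetical function that skips the actual commitment step.

```rust
use std::marker::PhantomData;

pub type F = u32; // placeholder for the BabyBear field element type

pub struct DeviceMatrix<T> {
    pub data: Vec<T>,
    pub width: usize,
}

pub trait ProverBackend {
    type Val;
    type Matrix;
}

pub trait StarkEngine {
    type PB: ProverBackend;
}

// Two backends that differ in hash scheme but share Val and Matrix.
pub struct GpuBackend;
impl ProverBackend for GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}
pub struct Bn254GpuBackend;
impl ProverBackend for Bn254GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}

pub struct BabyBearPoseidon2GpuEngine;
impl StarkEngine for BabyBearPoseidon2GpuEngine {
    type PB = GpuBackend;
}
pub struct BabyBearBn254Poseidon2GpuEngine;
impl StarkEngine for BabyBearBn254Poseidon2GpuEngine {
    type PB = Bn254GpuBackend;
}

pub struct CommittedTraceData<PB: ProverBackend> {
    pub trace: PB::Matrix,
    _pb: PhantomData<PB>,
}

/// Accepts any engine whose backend stores BabyBear values in a DeviceMatrix.
/// A real implementation would also compute a commitment; this sketch does not.
pub fn commit_trace<E>(_engine: &E, rows: Vec<F>, width: usize) -> CommittedTraceData<E::PB>
where
    E: StarkEngine,
    E::PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,
{
    CommittedTraceData { trace: DeviceMatrix { data: rows, width }, _pb: PhantomData }
}
```

The same function body type-checks for both engines because the where clause pins `E::PB::Matrix` to `DeviceMatrix<F>`, whichever hash scheme the backend uses.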

`src/system/mod.rs` — blanket `VerifierTraceGen` impl over `GpuHashScheme`

The existing concrete impl VerifierTraceGen<GpuBackend, SC> is replaced by a blanket impl
that covers all GPU hash schemes at once:

impl<HS: GpuHashScheme, const MAX_NUM_PROOFS: usize>
    VerifierTraceGen<GenericGpuBackend<HS>, HS::SC> for VerifierSubCircuit<MAX_NUM_PROOFS>

The inner proving machinery (generate_gpu_ctxs and friends) remains bound to GpuBackend
(the existing BabyBear Poseidon2 GPU backend). To bridge backends, a local helper is introduced:

fn coerce_gpu_ctx<HS: GpuHashScheme>(
    ctx: AirProvingContext<GpuBackend>,
) -> AirProvingContext<GenericGpuBackend<HS>> { ... }

Inside generate_proving_ctxs, a VerifierExternalData<GpuBackend> reborrow is constructed
from the caller-supplied VerifierExternalData<GenericGpuBackend<HS>>, all module GPU contexts
are generated as AirProvingContext<GpuBackend>, and then coerced in bulk via coerce_gpu_ctx::<HS>.
The coercion is safe because GpuBackend and GenericGpuBackend<HS> share Val = BabyBear and
Matrix = DeviceMatrix<F>; the helper panics in debug builds if cached_mains is non-empty
(commitments differ between backends).
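
A minimal sketch of the coercion idea follows. It is not the helper from the PR: the context struct is reduced to three illustrative fields, `F` is a `u32` placeholder, and the commitment types are invented so that the two backends visibly differ only in their commitments.

```rust
use std::marker::PhantomData;

pub type F = u32; // placeholder for BabyBear

#[derive(Debug, PartialEq)]
pub struct DeviceMatrix<T> {
    pub data: Vec<T>,
    pub width: usize,
}

pub trait ProverBackend {
    type Val;
    type Matrix;
    type Commitment;
}

pub struct GpuBackend;
impl ProverBackend for GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
    type Commitment = [u32; 8]; // illustrative digest type
}

pub trait GpuHashScheme {
    type Commitment;
}
pub struct GenericGpuBackend<HS>(PhantomData<HS>);
impl<HS: GpuHashScheme> ProverBackend for GenericGpuBackend<HS> {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
    type Commitment = HS::Commitment;
}

pub struct Bn254Scheme;
impl GpuHashScheme for Bn254Scheme {
    type Commitment = [u8; 32]; // illustrative digest type
}

/// Simplified stand-in for a per-AIR proving context.
pub struct AirProvingContext<PB: ProverBackend> {
    pub cached_mains: Vec<(PB::Commitment, PB::Matrix)>,
    pub common_main: PB::Matrix,
    pub public_values: Vec<PB::Val>,
}

/// Moves the fields whose types are identical across backends. Cached mains
/// carry backend-specific commitments, so they must be empty here.
pub fn coerce_gpu_ctx<HS: GpuHashScheme>(
    ctx: AirProvingContext<GpuBackend>,
) -> AirProvingContext<GenericGpuBackend<HS>> {
    debug_assert!(
        ctx.cached_mains.is_empty(),
        "cached mains cannot be carried across hash schemes"
    );
    AirProvingContext {
        cached_mains: Vec::new(),
        common_main: ctx.common_main,
        public_values: ctx.public_values,
    }
}
```

No data is copied or reinterpreted in this version; the matrix and public values are moved field by field, which the compiler accepts only because both backends resolve `Val` and `Matrix` to the same concrete types.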

File: crates/recursion/Cargo.toml

The cuda feature now additionally enables openvm-cuda-backend/baby-bear-bn254-poseidon2 and
openvm-stark-sdk/baby-bear-bn254-poseidon2.

Minor fix

`crates/vm/src/system/cuda/program.rs`

device.commit(...) is replaced by TraceCommitter::<GpuBackend>::commit(device, ...) to
disambiguate the method call now that multiple backends' commit methods may be in scope.
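
The disambiguation relies on Rust's qualified call syntax. The sketch below assumes, for illustration only, that one device type implements a backend-parameterized `TraceCommitter` trait for two backends; the trait shape, the `Device` type, and the returned labels are all hypothetical.

```rust
pub struct GpuBackend;
pub struct Bn254GpuBackend;

/// Stand-in for a committer trait that is generic over the prover backend.
pub trait TraceCommitter<PB> {
    /// Returns a label naming the hash scheme that would be used.
    fn commit(&self, trace: &[u32]) -> &'static str;
}

pub struct Device;

impl TraceCommitter<GpuBackend> for Device {
    fn commit(&self, _trace: &[u32]) -> &'static str {
        "babybear-poseidon2"
    }
}

impl TraceCommitter<Bn254GpuBackend> for Device {
    fn commit(&self, _trace: &[u32]) -> &'static str {
        "bn254-poseidon2"
    }
}

pub fn commit_program(device: &Device, trace: &[u32]) -> &'static str {
    // `device.commit(trace)` would not compile here: with two applicable
    // impls, the compiler cannot infer which backend parameter is meant.
    // Naming the trait instantiation in the path resolves it.
    TraceCommitter::<GpuBackend>::commit(device, trace)
}
```

The receiver becomes the first explicit argument, and the turbofish on the trait selects the impl, so the call site keeps its meaning even if further backends are added later.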

stephenh-axiom-xyz

Instead of using coerce_gpu_to_bn254_ctx, I think we can implement RootTraceGen and VerifierTraceGen for a generic PB with Val = F and Matrix = DeviceMatrix<F>. Consider making GenericGpuBackend generic in StarkProtocolConfig.

jonathanpwang · 2026-03-09T20:43:13Z

> Consider making GenericGpuBackend generic in StarkProtocolConfig

@stephenh-axiom-xyz
This is too big a change and specifically something we did not want to do. The GPU is not generic in F, EF at all.

But a generic trait in PB with where clauses for the present situation may still be possible.
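
For readers following the thread, a trait impl that is generic in PB and constrained by where clauses could look something like this. It is only a sketch of the idea under discussion, not code from the PR: the trait is cut down to one hypothetical method, `F` is a `u32` placeholder, and `Bn254GpuBackend` stands in for the BN254 GPU backend.

```rust
pub type F = u32; // placeholder for BabyBear

pub struct DeviceMatrix<T> {
    pub data: Vec<T>,
    pub width: usize,
}

pub trait ProverBackend {
    type Val;
    type Matrix;
}

pub struct GpuBackend;
impl ProverBackend for GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}

pub struct Bn254GpuBackend;
impl ProverBackend for Bn254GpuBackend {
    type Val = F;
    type Matrix = DeviceMatrix<F>;
}

/// Reduced stand-in for a trace-generation trait parameterized by backend.
pub trait RootTraceGen<PB: ProverBackend> {
    fn generate_trace(&self, rows: Vec<F>) -> PB::Matrix;
}

pub struct RootTraceGenImpl;

// One impl covers every backend whose Val and Matrix match the GPU layout,
// so no per-backend coercion helper is needed.
impl<PB> RootTraceGen<PB> for RootTraceGenImpl
where
    PB: ProverBackend<Val = F, Matrix = DeviceMatrix<F>>,
{
    fn generate_trace(&self, rows: Vec<F>) -> PB::Matrix {
        DeviceMatrix { data: rows, width: 1 }
    }
}
```

The field and extension-field types stay fixed in this shape; only the backend type varies, which matches the constraint that the GPU code is not generic in F or EF.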

jonathanpwang · 2026-03-10T00:43:10Z

crates/recursion/src/system/mod.rs

+    ///
+    /// Safe because all GPU backends share `Val = BabyBear` and `Matrix = DeviceMatrix<F>`.
+    /// Panics in debug builds if `cached_mains` is non-empty (commitments differ by hash scheme).
+    fn coerce_gpu_ctx<HS: GpuHashScheme>(


transmute might be a slightly more descriptive name, but I'll leave it

gaxiom added 3 commits March 6, 2026 20:20

stark-backend ref to

51b7abc

fix program

81ffe08

support bn254 cuda

a65ea79

gaxiom requested review from jonathanpwang and stephenh-axiom-xyz March 9, 2026 19:39

stephenh-axiom-xyz reviewed Mar 9, 2026

jonathanpwang reviewed Mar 9, 2026

crates/continuations-v2/src/circuit/root/trace.rs (outdated, resolved)

removed upstream concrete

571bc4a

jonathanpwang reviewed Mar 9, 2026

crates/continuations-v2/src/circuit/root/trace.rs (resolved)

chore: rename to type alias

b11e2f8

jonathanpwang reviewed Mar 10, 2026

crates/recursion/src/system/mod.rs (outdated, resolved)

remove unnecessary generic from VerifierExternalData

c7e300e

jonathanpwang approved these changes Mar 10, 2026

remove unnecessary bn254 feature from recursion

777372f

jonathanpwang reviewed Mar 10, 2026

Revert "chore: rename to type alias"

0af724c

This reverts commit b11e2f8.

jonathanpwang merged commit 38a6574 into develop-v2.0.0-beta Mar 10, 2026
9 checks passed

jonathanpwang deleted the test/generic-gpu-engine branch March 10, 2026 01:48

jonathanpwang merged 8 commits into develop-v2.0.0-beta from test/generic-gpu-engine
