
feat: make ShaVmAir's degree 3 #2422

Draft
Golovanov399 wants to merge 268 commits into develop-v2.0.0-beta from feat/sha-lower-degree

Conversation

@Golovanov399
Contributor

This will resolve INT-6189

stephenh-axiom-xyz and others added 30 commits February 11, 2026 05:23
- replace `SegmentationLimits` with `SegmentationConfig` in `SystemConfig`
- add an `interaction_cell_weight` parameter to `SegmentationConfig` that specifies how many cells an interaction contributes at each row
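A minimal sketch of how such a weight could enter a per-chip cell estimate. This is illustrative only: the struct layout, field names other than `interaction_cell_weight`, and the estimation formula are assumptions, not the actual `SegmentationConfig` API.

```rust
// Hypothetical sketch: each interaction is charged a fixed number of cells
// per row, on top of the plain main-trace cells.
struct SegmentationConfig {
    max_cells: usize,
    /// How many cells each interaction is counted as, per row.
    interaction_cell_weight: usize,
}

impl SegmentationConfig {
    /// Estimate a chip's cell cost: main-trace cells plus a weighted
    /// charge for every interaction on every row.
    fn estimated_cells(&self, width: usize, height: usize, num_interactions: usize) -> usize {
        height * (width + num_interactions * self.interaction_cell_weight)
    }

    fn should_segment(&self, total_cells: usize) -> bool {
        total_cells > self.max_cells
    }
}
```

With a weight of 2, a 100-row trace of width 10 with 3 interactions is charged `100 * (10 + 3 * 2) = 1600` cells.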
`main_cells_used` is inaccurate when CUDA tracegen is enabled, because the trace height calculations are not exact in that mode. To avoid confusion, these metrics are no longer emitted.
For large metrics, the mermaid output is too much text.

Also switched to outputting detailed metrics in a separate markdown
file.
In the benchmark CI, we still cat the detailed metrics back into the
main markdown file.
The SVG chart is uploaded to public S3, similar to the flamegraphs, so it can be viewed from the markdown.

For later:
- I feel like we could store the detailed metrics in a SQLite file. That way they can be downloaded and processed more easily for complex metrics. For now I just split them into a separate markdown file for simplicity.
segment_ctx.rs:
- DEFAULT_MAX_CELLS → DEFAULT_MAX_MEMORY = 15 GB
- max_cells → max_memory in SegmentationLimits
- set_max_cells → set_max_memory

ctx.rs:
- with_max_cells → with_max_memory

metered_cost.rs:
- updated import to use DEFAULT_MAX_MEMORY

cli/src/commands/prove.rs:
- updated import to DEFAULT_MAX_MEMORY
- segment_max_cells → segment_max_memory
- with_max_cells → with_max_memory

benchmarks/prove/src/bin/async_regex.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory

benchmarks/prove/src/util.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory
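The renames above replace a cell-count cap with a byte cap. A minimal sketch of what the renamed limit could look like; the field and method names mirror the rename list, but the struct shape, the byte-based check, and reading the commit's "15gb" as 15 GiB are all assumptions.

```rust
/// Hypothetical stand-in for the renamed constant. The commit sets it to
/// "15gb"; 15 GiB is assumed here (GB vs GiB is not specified).
const DEFAULT_MAX_MEMORY: usize = 15 * (1 << 30);

struct SegmentationLimits {
    max_memory: usize, // was max_cells
}

impl SegmentationLimits {
    fn set_max_memory(&mut self, bytes: usize) {
        // was set_max_cells
        self.max_memory = bytes;
    }

    /// A cell-count check becomes a byte check once the size of a cell
    /// is known.
    fn exceeds(&self, num_cells: usize, bytes_per_cell: usize) -> bool {
        num_cells.saturating_mul(bytes_per_cell) > self.max_memory
    }
}
```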

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Due to the difference in error types, switched to using `should_panic`
for now. We should go through later and switch everything back to the
precise error types.

closes INT-5904
Also shows delta for "Parallel Proof Time (N provers)" column
…and ECC (#2372)

- **Replace expensive BigUint computation during preflight with fast
native field arithmetic** (halo2curves/blstrs) for all known field types
(K256, P256, BN254, BLS12-381) and ECC curve operations. The trace
filler already re-executes with BigUint for constraint generation, so
preflight only needs to compute outputs for memory writes.
- **Cache modulus constants** with `once_cell::Lazy<BigUint>` to
eliminate repeated hex string parsing in
`get_field_type()`/`get_fp2_field_type()` and `get_curve_type()`
(previously called on every instruction).
- **Cache `FieldType`/`CurveType` on executor structs** at construction
time, eliminating per-instruction BigUint comparisons in preflight.
- **Remove `DynArray` heap allocations** in preflight by using
stack-allocated typed arrays directly from adapter read/write, with
`as_flattened()` for zero-cost conversions.
- **Add `adapter()` accessor** to `FieldExpressionExecutor` for use by
custom `PreflightExecutor` implementations. SETUP operations and unknown
field types fall back to `run_field_expression_precomputed`.
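The modulus-caching pattern from the second bullet can be sketched as follows. To stay self-contained this uses `std::sync::LazyLock` instead of `once_cell::Lazy` and a `u128` stand-in instead of `BigUint`; the constant's value is a truncated stand-in, and the enum/function shapes are illustrative, not the actual OpenVM API.

```rust
use std::sync::LazyLock;

// Parse the modulus once, at first use, instead of re-parsing the hex
// string on every instruction. Stand-in value (low 128 bits only) for
// illustration; the real code caches the full `BigUint` modulus.
static SECP256K1_MODULUS: LazyLock<u128> = LazyLock::new(|| {
    u128::from_str_radix("fffffffffffffffffffffffefffffc2f", 16).unwrap()
});

#[derive(Debug, PartialEq, Clone, Copy)]
enum FieldType {
    K256,
    Unknown,
}

/// Classify a modulus by comparing against the cached constant, avoiding
/// the hex parse that previously happened on every call.
fn get_field_type(modulus: u128) -> FieldType {
    if modulus == *SECP256K1_MODULUS {
        FieldType::K256
    } else {
        FieldType::Unknown
    }
}
```

Caching the resulting `FieldType` on the executor struct at construction time then removes even this comparison from the per-instruction path.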

- [x] `cargo nextest run -p openvm-algebra-circuit` — all 18
non-pre-existing-failure tests pass (8 modular addsub/muldiv, 2 is_equal
positive, 8 fp2_chip)
- [x] `cargo nextest run -p openvm-ecc-circuit` — all 8 tests pass (3
add_ne, 5 double including nonzero_a)
- [x] `cargo clippy -p openvm-algebra-circuit -p openvm-ecc-circuit
--all-targets` — no new warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
keccak (`p3_inner_tracegen`): force `__noinline__`, -92% stack size
`BigUintGPU::mod_div`: force `__noinline__`, -85% stack size
sha256 (first and second pass): -76% and -90%:
- `generate_block_trace`, `generate_missing_cells` → `__noinline__`
- `generate_carry_ae`, `generate_intermed_4`, `generate_intermed_12` → compute on the fly

The goal: reduce the peak memory usage (and get close to the mem tracker report).
Was +2 GB, reduced to +0.9 GB.
Should be tested on various blocks (so far tested on 21M).
stephenh-axiom-xyz and others added 28 commits February 15, 2026 22:59
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
This PR tunes the leaf aggregation parameters by changing `n_stack` in
`default_leaf_params` from 18 to 19.

**Configuration:**
- `l_skip = 2` (unchanged)
- `n_stack = 19` (changed from 18)
- **Total: l_skip + n_stack = 21**

This PR is part of a series testing different values for l_skip +
n_stack.

Related to #419

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zach Langley <zlangley@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
An include was using an internal Thrust file.
The Docker build failed.
…486)

## Summary
- Replace rayon iteration with `std::thread::scope` for preflight in the
verifier sub-circuit
- With only 3-4 proofs max (`MAX_NUM_CHILDREN_LEAF=4`,
`MAX_NUM_CHILDREN_INTERNAL=3`), this avoids Rayon's thread pool overhead
(wake-up, work stealing, synchronization) while still getting
parallelism with minimal overhead from direct thread spawning
- CPU module tracegen still uses `par_iter` for parallelism across
modules
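The scoped-thread pattern described above can be sketched with only the standard library. The function and the per-proof work are illustrative stand-ins, not the actual verifier sub-circuit API; the point is the `std::thread::scope` shape that replaces the Rayon iteration.

```rust
use std::thread;

/// Spawn one scoped thread per proof instead of going through a Rayon
/// pool; with only 3-4 proofs the pool's wake-up and work-stealing
/// overhead is not worth it.
fn run_preflight_all(proofs: &[u32]) -> Vec<u32> {
    let mut results = vec![0u32; proofs.len()];
    thread::scope(|s| {
        for (slot, proof) in results.iter_mut().zip(proofs) {
            // Scoped threads may borrow from the enclosing stack frame,
            // so no Arc/clone is needed; the scope joins every thread
            // before returning.
            s.spawn(move || {
                *slot = proof.wrapping_mul(2); // stand-in for preflight work
            });
        }
    });
    results
}
```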

## Test plan
- [x] Existing tests pass
- [ ] Verify aggregation performance is unchanged or improved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…#489)

Summary metrics for `run_preflight` had disappeared; this adds them back.
Adds a second internal-recursive layer before the compression layer. We found that on occasion the last internal-recursive layer may have more than 131000 Poseidon2 trace rows, which contributes significantly to the `Proof` size. We mitigate this risk by adding an additional layer.

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Before merging, I will force push to rebase `openvm:develop-v2` and
`stark-backend:develop-v2` with the current `develop-v2-new` branches.

Updated reth-bench to use `openvm-eth.git` which updates reth to v1.10
so there are slight guest program changes (improvements).

Closes INT-6109

## Claude Summary

Upgrade Plonky3 from git revision to version 0.4.1 and update
stark-backend to `develop-v2-new` branch.

This PR applies all necessary API renames and fixes to make `crates/`
compile with the new Plonky3 0.4.1 release.

## Changes

### Dependency Updates
- Plonky3: git rev `539bbc8` → version `0.4.1` (crates.io)
- stark-backend: branch `develop-v2` → `develop-v2-new`
- Removed `nightly-features` feature flag from sdk

### Plonky3 API Renames

**Trait renames:**
- `FieldAlgebra` → `PrimeCharacteristicRing`
- `FieldExtensionAlgebra` → `BasedVectorSpace`

**Associated type renames:**
- `PrimeCharacteristicRing::F` → `PrimeCharacteristicRing::PrimeSubfield`

**Method renames:**

| Old | New |
|-----|-----|
| `from_canonical_u8/u32/usize` | `from_u8/u32/usize` |
| `from_wrapped_u32/u64` | `from_u32/u64` |
| `from_base_slice` | `from_basis_coefficients_slice` |
| `from_base_iter` | `from_basis_coefficients_iter` |
| `from_base_fn` | `from_basis_coefficients_fn` |
| `as_base_slice` | `as_basis_coefficients_slice` |
| `sample_ext_element` | `sample_algebra_element` |
| `from_f` | `from` |
| `Bn254Fr` | `Bn254` |

### API Signature Changes

**Methods now return `Option`:**
- `from_basis_coefficients_slice` → added `.unwrap()` calls
- `from_basis_coefficients_iter` → added `.unwrap()` calls
- `row_slice(n)` → added `.expect("window should have two elements")` calls
- `ith_basis_element(n)` → added `.expect("basis element index out of bounds")` calls

**New conversion method:**
- Use `from_prime_subfield()` instead of `from()` when converting from the `PrimeSubfield` type

### Additional Trait Bounds
- Added `InjectiveMonomial<BABY_BEAR_POSEIDON2_SBOX_DEGREE>` bound where `Poseidon2SubChip` is used

## Test Plan
- [x] `cargo build` compiles all default crates
- [x] `cargo build --features cuda` compiles with CUDA support
- [ ] Run test suite

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This resolves INT-6035.

All changes are about using the flattened opening claims properly.
Merge openvm-org/stark-backend#246 first and
then update stark-backend branch before merging this PR.

closes INT-5862 INT-5824

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zach Langley <zach@axiom.xyz>
Update paths and code usage for sdk-v2, recursion-v2, continuations-v2
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
- Comment out v1 crates (continuations, verify_stark) from workspace
- Remove openvm-native-compiler dev-dep from openvm-circuit, replace v1
native opcodes with rv32im equivalents in program tests
- Fix sdk-v2 cuda feature missing dep:openvm-cuda-backend
- Migrate extensions/ecc/tests from openvm-sdk to sdk-v2
- Fix FriParameters -> SystemParams in pairing tests
- Fix air_test_impl type inference in ruint tests
@Golovanov399 changed the base branch from main to develop-v2.0.0-beta on February 18, 2026 13:39
@jpw-axiom force-pushed the develop-v2.0.0-beta branch from 1952632 to c9f04db on March 5, 2026 22:24