
feat: make ShaVmAir's degree 3 #2422

Draft
Golovanov399 wants to merge 268 commits into develop-v2.0.0-beta from feat/sha-lower-degree

Conversation

@Golovanov399
Contributor

This will resolve INT-6189

stephenh-axiom-xyz and others added 30 commits February 11, 2026 05:23
- replace `SegmentationLimits` with `SegmentationConfig` in `SystemConfig`
- add an `interaction_cell_weight` parameter to `SegmentationConfig` that specifies how many cells an interaction contributes at each row
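A minimal sketch of how such a weight could enter a per-chip cell estimate. This is illustrative only: the struct layout, field names other than `interaction_cell_weight`, and the estimation formula are assumptions, not the actual `SegmentationConfig` API.

```rust
// Hypothetical sketch: each interaction is charged a fixed number of cells
// per row, on top of the plain main-trace cells.
struct SegmentationConfig {
    max_cells: usize,
    /// How many cells each interaction is counted as, per row.
    interaction_cell_weight: usize,
}

impl SegmentationConfig {
    /// Estimate a chip's cell cost: main-trace cells plus a weighted
    /// charge for every interaction on every row.
    fn estimated_cells(&self, width: usize, height: usize, num_interactions: usize) -> usize {
        height * (width + num_interactions * self.interaction_cell_weight)
    }

    fn should_segment(&self, total_cells: usize) -> bool {
        total_cells > self.max_cells
    }
}
```

With a weight of 2, a 100-row trace of width 10 with 3 interactions is charged `100 * (10 + 3 * 2) = 1600` cells.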
`main_cells_used` is inaccurate when CUDA tracegen is enabled, because the trace height calculations are not exact in that mode. To avoid confusion, these metrics are no longer emitted.
For large metrics, the mermaid output is too much text.

Also switched to outputting detailed metrics in a separate markdown
file.
In the benchmark CI, we still cat the detailed metrics back into the
main markdown file.
The SVG chart is uploaded to public S3, similar to the flamegraphs, so it can be viewed from the markdown.

For later:
- I feel like we could store the detailed metrics in a SQLite file. That way they can be downloaded and processed more easily for complex metrics. For now I just split them into a separate markdown file for simplicity.
segment_ctx.rs:
- DEFAULT_MAX_CELLS → DEFAULT_MAX_MEMORY = 15 GB
- max_cells → max_memory in SegmentationLimits
- set_max_cells → set_max_memory

ctx.rs:
- with_max_cells → with_max_memory

metered_cost.rs:
- updated import to use DEFAULT_MAX_MEMORY

cli/src/commands/prove.rs:
- updated import to DEFAULT_MAX_MEMORY
- segment_max_cells → segment_max_memory
- with_max_cells → with_max_memory

benchmarks/prove/src/bin/async_regex.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory

benchmarks/prove/src/util.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory
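The renames above replace a cell-count cap with a byte cap. A minimal sketch of what the renamed limit could look like; the field and method names mirror the rename list, but the struct shape, the byte-based check, and reading the commit's "15gb" as 15 GiB are all assumptions.

```rust
/// Hypothetical stand-in for the renamed constant. The commit sets it to
/// "15gb"; 15 GiB is assumed here (GB vs GiB is not specified).
const DEFAULT_MAX_MEMORY: usize = 15 * (1 << 30);

struct SegmentationLimits {
    max_memory: usize, // was max_cells
}

impl SegmentationLimits {
    fn set_max_memory(&mut self, bytes: usize) {
        // was set_max_cells
        self.max_memory = bytes;
    }

    /// A cell-count check becomes a byte check once the size of a cell
    /// is known.
    fn exceeds(&self, num_cells: usize, bytes_per_cell: usize) -> bool {
        num_cells.saturating_mul(bytes_per_cell) > self.max_memory
    }
}
```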

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Due to the difference in error types, switched to using `should_panic`
for now. We should go through later and switch everything back to the
precise error types.

closes INT-5904
Also shows delta for "Parallel Proof Time (N provers)" column
…and ECC (#2372)

- **Replace expensive BigUint computation during preflight with fast
native field arithmetic** (halo2curves/blstrs) for all known field types
(K256, P256, BN254, BLS12-381) and ECC curve operations. The trace
filler already re-executes with BigUint for constraint generation, so
preflight only needs to compute outputs for memory writes.
- **Cache modulus constants** with `once_cell::Lazy<BigUint>` to
eliminate repeated hex string parsing in
`get_field_type()`/`get_fp2_field_type()` and `get_curve_type()`
(previously called on every instruction).
- **Cache `FieldType`/`CurveType` on executor structs** at construction
time, eliminating per-instruction BigUint comparisons in preflight.
- **Remove `DynArray` heap allocations** in preflight by using
stack-allocated typed arrays directly from adapter read/write, with
`as_flattened()` for zero-cost conversions.
- **Add `adapter()` accessor** to `FieldExpressionExecutor` for use by
custom `PreflightExecutor` implementations. SETUP operations and unknown
field types fall back to `run_field_expression_precomputed`.
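The modulus-caching pattern from the second bullet can be sketched as follows. To stay self-contained this uses `std::sync::LazyLock` instead of `once_cell::Lazy` and a `u128` stand-in instead of `BigUint`; the constant's value is a truncated stand-in, and the enum/function shapes are illustrative, not the actual OpenVM API.

```rust
use std::sync::LazyLock;

// Parse the modulus once, at first use, instead of re-parsing the hex
// string on every instruction. Stand-in value (low 128 bits only) for
// illustration; the real code caches the full `BigUint` modulus.
static SECP256K1_MODULUS: LazyLock<u128> = LazyLock::new(|| {
    u128::from_str_radix("fffffffffffffffffffffffefffffc2f", 16).unwrap()
});

#[derive(Debug, PartialEq, Clone, Copy)]
enum FieldType {
    K256,
    Unknown,
}

/// Classify a modulus by comparing against the cached constant, avoiding
/// the hex parse that previously happened on every call.
fn get_field_type(modulus: u128) -> FieldType {
    if modulus == *SECP256K1_MODULUS {
        FieldType::K256
    } else {
        FieldType::Unknown
    }
}
```

Caching the resulting `FieldType` on the executor struct at construction time then removes even this comparison from the per-instruction path.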

- [x] `cargo nextest run -p openvm-algebra-circuit` — all 18
non-pre-existing-failure tests pass (8 modular addsub/muldiv, 2 is_equal
positive, 8 fp2_chip)
- [x] `cargo nextest run -p openvm-ecc-circuit` — all 8 tests pass (3
add_ne, 5 double including nonzero_a)
- [x] `cargo clippy -p openvm-algebra-circuit -p openvm-ecc-circuit
--all-targets` — no new warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
keccak (`p3_inner_tracegen`): force `__noinline__`, -92% stack size
`BigUintGPU::mod_div`: force `__noinline__`, -85% stack size
sha256 (first and second pass): -76% and -90%:
- `generate_block_trace`, `generate_missing_cells` → `__noinline__`
- `generate_carry_ae`, `generate_intermed_4`, `generate_intermed_12` → compute on the fly

The goal: reduce the peak memory usage (and get close to the mem tracker report).
Was +2 GB, reduced to +0.9 GB.
Should be tested on various blocks (so far tested on 21M).
stephenh-axiom-xyz and others added 28 commits February 15, 2026 22:59
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
This PR tunes the leaf aggregation parameters by changing `n_stack` in
`default_leaf_params` from 18 to 19.

**Configuration:**
- `l_skip = 2` (unchanged)
- `n_stack = 19` (changed from 18)
- **Total: l_skip + n_stack = 21**

This PR is part of a series testing different values for l_skip +
n_stack.

Related to #419

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zach Langley <zlangley@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
An include was using an internal Thrust file.
The Docker build failed.
…486)

## Summary
- Replace rayon iteration with `std::thread::scope` for preflight in the
verifier sub-circuit
- With only 3-4 proofs max (`MAX_NUM_CHILDREN_LEAF=4`,
`MAX_NUM_CHILDREN_INTERNAL=3`), this avoids Rayon's thread pool overhead
(wake-up, work stealing, synchronization) while still getting
parallelism with minimal overhead from direct thread spawning
- CPU module tracegen still uses `par_iter` for parallelism across
modules
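The scoped-thread pattern described above can be sketched with only the standard library. The function and the per-proof work are illustrative stand-ins, not the actual verifier sub-circuit API; the point is the `std::thread::scope` shape that replaces the Rayon iteration.

```rust
use std::thread;

/// Spawn one scoped thread per proof instead of going through a Rayon
/// pool; with only 3-4 proofs the pool's wake-up and work-stealing
/// overhead is not worth it.
fn run_preflight_all(proofs: &[u32]) -> Vec<u32> {
    let mut results = vec![0u32; proofs.len()];
    thread::scope(|s| {
        for (slot, proof) in results.iter_mut().zip(proofs) {
            // Scoped threads may borrow from the enclosing stack frame,
            // so no Arc/clone is needed; the scope joins every thread
            // before returning.
            s.spawn(move || {
                *slot = proof.wrapping_mul(2); // stand-in for preflight work
            });
        }
    });
    results
}
```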

## Test plan
- [x] Existing tests pass
- [ ] Verify aggregation performance is unchanged or improved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…#489)

Summary metrics for `run_preflight` had disappeared; this adds them back.
Adds a second internal-recursive layer before the compression layer. We found that on occasion the last internal-recursive layer may have more than 131000 Poseidon2 trace rows, which contributes significantly to the `Proof` size. We mitigate this risk by adding an additional layer.

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Before merging, I will force push to rebase `openvm:develop-v2` and
`stark-backend:develop-v2` with the current `develop-v2-new` branches.

Updated reth-bench to use `openvm-eth.git` which updates reth to v1.10
so there are slight guest program changes (improvements).

Closes INT-6109

## Claude Summary

Upgrade Plonky3 from git revision to version 0.4.1 and update
stark-backend to `develop-v2-new` branch.

This PR applies all necessary API renames and fixes to make `crates/`
compile with the new Plonky3 0.4.1 release.

## Changes

### Dependency Updates
- Plonky3: git rev `539bbc8` → version `0.4.1` (crates.io)
- stark-backend: branch `develop-v2` → `develop-v2-new`
- Removed `nightly-features` feature flag from sdk

### Plonky3 API Renames

**Trait renames:**
- `FieldAlgebra` → `PrimeCharacteristicRing`
- `FieldExtensionAlgebra` → `BasedVectorSpace`

**Associated type renames:**
- `PrimeCharacteristicRing::F` → `PrimeCharacteristicRing::PrimeSubfield`

**Method renames:**

| Old | New |
|-----|-----|
| `from_canonical_u8/u32/usize` | `from_u8/u32/usize` |
| `from_wrapped_u32/u64` | `from_u32/u64` |
| `from_base_slice` | `from_basis_coefficients_slice` |
| `from_base_iter` | `from_basis_coefficients_iter` |
| `from_base_fn` | `from_basis_coefficients_fn` |
| `as_base_slice` | `as_basis_coefficients_slice` |
| `sample_ext_element` | `sample_algebra_element` |
| `from_f` | `from` |
| `Bn254Fr` | `Bn254` |

### API Signature Changes

**Methods now return `Option`:**
- `from_basis_coefficients_slice` → added `.unwrap()` calls
- `from_basis_coefficients_iter` → added `.unwrap()` calls
- `row_slice(n)` → added `.expect("window should have two elements")` calls
- `ith_basis_element(n)` → added `.expect("basis element index out of bounds")` calls

**New conversion method:**
- Use `from_prime_subfield()` instead of `from()` when converting from the `PrimeSubfield` type

### Additional Trait Bounds
- Added `InjectiveMonomial<BABY_BEAR_POSEIDON2_SBOX_DEGREE>` bound where `Poseidon2SubChip` is used

## Test Plan
- [x] `cargo build` compiles all default crates
- [x] `cargo build --features cuda` compiles with CUDA support
- [ ] Run test suite

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This resolves INT-6035.

All changes are about using the flattened opening claims properly.
Merge openvm-org/stark-backend#246 first and
then update stark-backend branch before merging this PR.

closes INT-5862 INT-5824

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zach Langley <zach@axiom.xyz>
Update paths and code usage for sdk-v2, recursion-v2, continuations-v2
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
- Comment out v1 crates (continuations, verify_stark) from workspace
- Remove openvm-native-compiler dev-dep from openvm-circuit, replace v1
native opcodes with rv32im equivalents in program tests
- Fix sdk-v2 cuda feature missing dep:openvm-cuda-backend
- Migrate extensions/ecc/tests from openvm-sdk to sdk-v2
- Fix FriParameters -> SystemParams in pairing tests
- Fix air_test_impl type inference in ruint tests
@Golovanov399 changed the base branch from main to develop-v2.0.0-beta on February 18, 2026 13:39
@jpw-axiom force-pushed the develop-v2.0.0-beta branch from 1952632 to c9f04db on March 5, 2026 22:24