Skip to content

Parallelism, CRT-NTT generics, extension field arithmetic, sumcheck refactoring#3

Open
quangvdao wants to merge 4 commits intomainfrom
dev
Open

Parallelism, CRT-NTT generics, extension field arithmetic, sumcheck refactoring#3
quangvdao wants to merge 4 commits intomainfrom
dev

Conversation

@quangvdao
Copy link

Summary

Four commits bringing Hachi from single-threaded base-field-only to parallel execution with full extension field support:

1. Rayon parallelism (10e53dd)

  • Add parallel feature flag (enabled by default) with cfg_iter!/cfg_into_iter!/cfg_chunks! dispatch macros
  • Parallelize protocol hot paths: ring polynomial division, w_evals construction, M_alpha evaluation, quadratic equation folding, sumcheck round computation

2. Generic commitment config + e2e benchmark (0ed2736)

  • Make HachiCommitmentScheme generic over <const D, Cfg> so different configs can be used without code duplication
  • Remove hardcoded DefaultCommitmentConfig::D from ring_switch; flow D generically
  • Add benches/hachi_e2e.rs sweeping nv=10,14,18,20

3. CRT-NTT backend refactor (96a9ccd)

  • Generalize NTT primitives (NttPrime, NttTwiddles, MontCoeff, CyclotomicCrtNtt) over PrimeWidth trait (i16/i32) instead of hardcoding i16
  • Replace monolithic QData struct with separate GarnerData + per-prime NttPrime arrays
  • Add Q128 parameter set (5 × i32 NTT primes, D ≤ 1024) alongside existing Q32
  • Simplify ScalarBackend by removing const-generic limb count from to_ring_with_backend

4. Extension field arithmetic + sumcheck refactoring (e4f9836)

  • Trait split: CanonicalFieldFromSmallInt (from_{u,i}{8,16,32,64}, implemented by all fields including extensions) + CanonicalField (u128 repr, base fields only)
  • Extension fields: FromSmallInt, Eq, Debug for Fp2/Fp4; ExtField<F> trait with EXT_DEGREE and from_base_slice; concrete configs TwoNr, NegOneNr, UnitNr; type aliases Ext2<F>, Ext4<F>
  • Optimized arithmetic: Karatsuba multiplication (3 base muls instead of 4), specialized squaring (2 base muls for Fp2), IS_NEG_ONE non-residue specialization
  • Packed extension fields: transpose-based PackedFp2/PackedFp4 for SIMD acceleration (Plonky3's approach)
  • Sumcheck bounds relaxed: E: CanonicalFieldE: FromSmallInt (or E: FieldCore where spurious); sample_ext_challenge transcript helper
  • Extension field sumcheck tests included

Stats

44 files changed, +2179, −789

Test plan

  • cargo test --lib — 82 tests pass
  • cargo test — 56 integration tests pass (3 known Q128 Garner reconstruction failures — pre-existing, tracked separately)
  • cargo clippy --all-targets -- -D warnings — zero warnings
  • cargo fmt — clean

Made with Cursor

…ult)

- New src/parallel.rs with cfg_iter!/cfg_into_iter!/cfg_chunks! macros
  that dispatch to rayon parallel iterators when `parallel` is enabled
- Parallelize protocol hot paths: ring polynomial division, w_evals
  construction, M_alpha evaluation, ring vector evaluation, packed ring
  poly evaluation, coefficients-to-ring reduction, quadratic equation
  folding, and sumcheck round polynomial computation
- All 174 tests pass with and without the parallel feature

Made-with: Cursor
- Make HachiCommitmentScheme generic over <const D, Cfg> so different
  configs (and thus num_vars) can be used without code duplication.
- Remove hardcoded DefaultCommitmentConfig::D from ring_switch.rs;
  WCommitmentConfig and commit_w now flow D generically.
- Add benches/hachi_e2e.rs with configs sweeping nv=10,14,18,20.

Made-with: Cursor
Make NTT primitives (NttPrime, NttTwiddles, MontCoeff, CyclotomicCrtNtt)
generic over PrimeWidth (i16/i32) instead of hardcoding i16. Replace the
monolithic QData struct with separate GarnerData and per-prime NttPrime
arrays. Add Q128 parameter set (5 × i32 primes, D ≤ 1024) alongside the
existing Q32 set. Simplify ScalarBackend by removing the const-generic
limb count from to_ring_with_backend.

Made-with: Cursor
Split CanonicalField into FromSmallInt (from_{u,i}{8,16,32,64} for all
fields) and CanonicalField (u128 repr, base fields only). Implement
FromSmallInt, Eq, Debug for Fp2/Fp4. Add ExtField<F> trait with
EXT_DEGREE and from_base_slice.

Optimize extension field arithmetic: Karatsuba multiplication for Fp2
and Fp4 (3 base muls instead of 4), specialized squaring (2 base muls
for Fp2), non-residue IS_NEG_ONE specialization. Add concrete configs
(TwoNr, NegOneNr, UnitNr) and type aliases Ext2<F>, Ext4<F>.

Add transpose-based packed extension fields (PackedFp2, PackedFp4)
for SIMD acceleration, following Plonky3's approach.

Relax sumcheck bounds from E: CanonicalField to E: FromSmallInt (or
E: FieldCore where spurious). Add sample_ext_challenge transcript
helper. Includes tests for extension field sumcheck execution.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant