Comparing against the numbers in #2869 for the recursive verifier (verifying a program executing in 2^20 cycles), this is a 4x improvement in the above case.
da140ac to 77773d9
```rust
let w1: AB::Expr = local.chiplets[MEMORY_WORD_ADDR_HI_COL_IDX - CHIPLETS_OFFSET].clone().into();
let w1_mul4: AB::Expr = w1.clone() * AB::Expr::from_u16(4);
// ...
let den0: AB::ExprEF = alpha.clone() + Into::<AB::ExprEF>::into(w0);
```
Shouldn't this add protocol-level domain separation before `v_wiring` can safely carry ACE wires, raw memory range-check values, and the new hasher perm-link messages together? Right now the memory side uses plain `alpha + w0/w1/4*w1`, ACE uses `encode([clk, ctx, id, ...])`, and the perm-link uses `encode([0|1, h0..h11])` on the same LogUp column.
If any of those encodings were to alias, one subsystem could cancel another on the shared sum. #1614 explicitly called out adding an op-label when reusing the wiring bus, and I don't see that namespace implemented here yet.
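For concreteness, one way to rule out aliasing is to reserve the constant (alpha^0) slot of every wiring-bus message for a per-subsystem op-label. A minimal sketch over a toy prime field (the field, `encode_message`, and the labels here are all illustrative, not the codebase's API):

```rust
// Toy sketch of op-label domain separation on a shared LogUp bus.
// Arithmetic is over a Goldilocks-style prime, purely for illustration.
const P: u128 = (1 << 64) - (1 << 32) + 1;

fn fmul(a: u128, b: u128) -> u128 { (a * b) % P }
fn fadd(a: u128, b: u128) -> u128 { (a + b) % P }

/// encode(label, [m0, m1, ...]) = label + sum_i alpha^{i+1} * m_i.
/// The op-label occupies the alpha^0 slot, so messages from different
/// subsystems cannot alias as long as their labels differ.
fn encode_message(op_label: u128, msg: &[u128], alpha: u128) -> u128 {
    let alpha = alpha % P;
    let mut acc = op_label % P;
    let mut pow = alpha;
    for &m in msg {
        acc = fadd(acc, fmul(pow, m % P));
        pow = fmul(pow, alpha);
    }
    acc
}
```

With distinct labels, two subsystems emitting the same payload still produce distinct bus messages, so they cannot cancel each other on the shared sum.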
Nashtare left a comment
I would need to do another pass because this is pretty dense, but I left a couple of comments while familiarizing myself with it.
```rust
let hs0 = main_trace.chiplet_selector_1(row);
let hs1 = main_trace.chiplet_selector_2(row);
```
nit: the indexing shift is a bit confusing when looking below; maybe rename to

```diff
-let hs0 = main_trace.chiplet_selector_1(row);
-let hs1 = main_trace.chiplet_selector_2(row);
+let hs1 = main_trace.chiplet_selector_1(row);
+let hs2 = main_trace.chiplet_selector_2(row);
```

with corresponding updates later in the code? That would be clearer.
```rust
/// TODO: These naive labels (0 and 1) risk collisions with other messages on the shared
/// v_wiring column (ACE wiring and memory range checks). Revisit when refactoring the buses.
const LABEL_IN: Felt = Felt::ZERO;
const LABEL_OUT: Felt = Felt::ONE;
```
See my comment below in this file, but I was wondering whether we should address this now rather than deferring it?
```rust
    }
} else {
    // Permutation segment.
    // This works because the hasher is always the first chiplet (rows start at 0)
```
minor: do we have an invariant check that this is actually the case? Just out of precaution
I’m pretty sure we always have 1 hash, and the chiplet selectors make sure that it comes before any other chiplet
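A cheap guard in this spirit could be a debug-time check that the hasher's start row is zero. A sketch with hypothetical names (this is not the codebase's API):

```rust
/// Hypothetical invariant check: the hasher chiplet must occupy the
/// first trace rows, i.e. it is first in layout order with start row 0.
/// `chiplet_start_rows` lists (name, first_row) in layout order.
fn hasher_is_first(chiplet_start_rows: &[(&str, usize)]) -> bool {
    matches!(chiplet_start_rows.first(), Some(&("hasher", 0)))
}
```

A `debug_assert!` on this predicate in the trace builder would document the assumption without any release-mode cost.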
```rust
/// Maps input state -> multiplicity for permutation deduplication.
/// During finalize_trace(), one 16-row perm cycle is emitted per entry.
perm_request_map: BTreeMap<StateKey, u64>,
```
nit: does ordering matter? I think, perf-wise, a HashMap may be more efficient here.
I think this is mainly because of no-std, but I don't think the ordering matters.
HashMap can be pulled from alloc / hashbrown though (if perf matters here)
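For reference, `hashbrown::HashMap` mirrors the `std::collections::HashMap` API, so the swap is mechanical if iteration order truly doesn't matter. A sketch of the dedup map (using std's `HashMap` here so it compiles without the crate; `StateKey` is a stand-in type):

```rust
use std::collections::HashMap; // in a no-std crate: `use hashbrown::HashMap;` (same API)

type StateKey = [u64; 12]; // stand-in for the real hasher state key

/// Count multiplicities of permutation requests. Iteration order is
/// unspecified (unlike BTreeMap), which is fine as long as trace
/// emission does not depend on key order.
fn dedup_requests(requests: &[StateKey]) -> HashMap<StateKey, u64> {
    let mut map = HashMap::new();
    for key in requests {
        *map.entry(*key).or_insert(0) += 1;
    }
    map
}
```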
This PR bundles several closely related changes in the hasher / chiplets area:

- a redesigned two-region hasher that deduplicates repeated permutation requests,
- a new `mrupdate_id` column that domain-separates sibling-table entries, preventing cross-operation sibling swapping,
- new `w0`/`w1` address-limb columns, with 16-bit range-check lookups routed through the wiring bus.

These changes all touch the chiplets trace layout, bus plumbing, and AIR structure, so landing them together keeps the transition coherent.
Why
1. Deduplicate repeated permutations
The old monolithic hasher consumed 32 rows per permutation request, even if the same input state appeared repeatedly.
With the new design:

- the controller emits one compact row pair per `(input, output)` pair,
- the permutation segment emits one packed 16-row cycle per unique input state.

For `M` requests with `U` unique input states, the rough cost changes from `32M` to `2M + pad_to_16 + 16U`. This is a clear win whenever states repeat (Merkle workloads, identical MAST roots, ...).
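The rough cost model above can be sketched as follows (the padding rule is an assumption for illustration; the real trace builder may pad differently):

```rust
/// Rows consumed by the old monolithic hasher: 32 per request.
fn old_rows(m: u64) -> u64 {
    32 * m
}

/// Rows consumed by the new two-region design (illustrative):
/// 2 controller rows per request, padded to a 16-row boundary,
/// plus one 16-row permutation cycle per unique input state.
fn new_rows(m: u64, u: u64) -> u64 {
    let controller = 2 * m;
    let padded = (controller + 15) / 16 * 16; // pad_to_16 (assumed rule)
    padded + 16 * u
}
```

For example, 1000 requests over 40 unique states drop from 32000 rows to roughly 2640.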
2. Fix sibling-table soundness
The old sibling-table encoding was vulnerable to cross-operation sibling reuse. Adding `mrupdate_id` domain-separates entries so sibling-table balance is enforced per MRUPDATE instance, not globally across unrelated operations.

3. Add memory address decomposition checks
The memory chiplet now decomposes word addresses into two 16-bit limbs and proves the decomposition using range-check lookups. This closes an important missing piece in memory soundness while reusing the existing wiring-bus infrastructure.
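The intended relation is `addr = w0 + 2^16 * w1`, with both limbs range-checked to 16 bits. A sketch of the decomposition (column names from the PR; the helper functions themselves are hypothetical, and in the real design the 16-bit range checks travel over the wiring bus rather than being asserted locally):

```rust
/// Split a 32-bit word address into its two 16-bit limbs.
/// The u16 return types stand in for the range checks here.
fn decompose_addr(addr: u32) -> (u16, u16) {
    let w0 = (addr & 0xFFFF) as u16; // low 16 bits
    let w1 = (addr >> 16) as u16;    // high 16 bits
    (w0, w1)
}

/// The relation the AIR enforces, given range-checked limbs:
/// addr = w0 + 2^16 * w1.
fn recompose(w0: u16, w1: u16) -> u32 {
    w0 as u32 + (1u32 << 16) * w1 as u32
}
```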
Design
Hasher: two-region trace
The hasher trace is split into two contiguous regions:
Controller (`perm_seg = 0`): compact input/output row pairs, one pair per permutation request.

Permutation segment (`perm_seg = 1`): one packed 16-row Poseidon2 cycle per unique input state.

A LogUp permutation-link on the shared `V_WIRING` auxiliary column ties controller requests to the corresponding permutation cycles.

Packed 16-row Poseidon2 schedule
The 31-step Poseidon2 schedule is packed as:
- init + ext1
- ext2..ext4
- 7 × (3 packed internal rounds)
- int22 + ext5
- ext6..ext8

Packed internal rows use `s0`/`s1`/`s2` as witness columns on permutation rows in order to keep the constraint degree bounded. Unused witness slots are explicitly zero-constrained (out of caution), though this could be relaxed.

Column layout
Hasher: 16 -> 20
New / newly significant columns:
- `mrupdate_id` -- domain separator for sibling-table entries
- `is_boundary` -- marks first controller input / last controller output
- `direction_bit` -- propagated Merkle routing bit on controller rows
- `perm_seg` -- explicit controller vs permutation-region flag

Memory: 15 -> 17
Two new columns:
- `w0`
- `w1`

These decompose the word address into 16-bit limbs. The wiring bus carries the corresponding range-check lookups.
Constraints
Hasher constraints now total 100.
Constraint group breakdown
- `s0`, `s1`, `s2` binary on controller rows
- `perm_seg` confinement, booleanity, monotonicity, cycle alignment
- `is_boundary` / `direction_bit` tied to valid row types
- `node_index` rule

Trace width impact
The new main trace width is 72.

No new auxiliary columns were added: the permutation-link and range-check messages reuse the existing `V_WIRING` column.
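To make the shared-column mechanics concrete, a toy LogUp balance can be sketched: requests add `1/(alpha + m)` to the running sum, responses subtract it with a multiplicity, and the column balances to zero exactly when the two multisets match. (Toy field and function names only; the codebase's types differ.)

```rust
const P: u64 = 2147483647; // toy field: Mersenne prime 2^31 - 1

fn fpow(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}
fn finv(a: u64) -> u64 { fpow(a, P - 2) } // Fermat inverse

/// Running LogUp sum on one shared column: each request contributes
/// +1/(alpha + m); each (message, multiplicity) response contributes
/// -mult/(alpha + m). A zero result means the multisets match.
fn logup_balance(alpha: u64, requests: &[u64], responses: &[(u64, u64)]) -> u64 {
    let mut sum = 0u64;
    for &m in requests {
        sum = (sum + finv((alpha % P + m % P) % P)) % P;
    }
    for &(m, mult) in responses {
        let term = mult % P * finv((alpha % P + m % P) % P) % P;
        sum = (sum + P - term) % P;
    }
    sum
}
```

This is why a single auxiliary column can serve several subsystems at once: each contributes terms to the same sum, and (given proper domain separation of messages) each subsystem's requests and responses must cancel independently.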