
Hasher chiplet redesign #2927

Open

Al-Kindi-0 wants to merge 12 commits into next from al-hasher-chiplet-redesign

Conversation

@Al-Kindi-0
Contributor

This PR bundles several closely related changes in the hasher / chiplets area:

  • Hasher controller/permutation split -- the hasher trace is split into a compact controller region and a separate permutation segment, enabling permutation deduplication.
  • Packed 16-row Poseidon2 permutation segment -- the 31-step Poseidon2 schedule is packed from 32 rows down to 16 rows per unique permutation.
  • Sibling table soundness fix (#2220) -- a new mrupdate_id column domain-separates sibling-table entries, preventing cross-operation sibling swapping.
  • Memory address range checks (#1614) -- the memory chiplet gets w0/w1 address-limb columns, with 16-bit range-check lookups routed through the wiring bus.

These changes all touch the chiplets trace layout, bus plumbing, and AIR structure, so landing them together keeps the transition coherent.

Why

1. Deduplicate repeated permutations

The old monolithic hasher consumed 32 rows per permutation request, even if the same input state appeared repeatedly.

With the new design:

  • the controller records each request as a 2-row (input, output) pair,
  • the permutation segment executes one packed 16-row cycle per unique input state,
  • a multiplicity counter records how many controller pairs map to the same cycle.

For M requests with U unique input states, the rough cost changes from:

  • old: 32M
  • new: 2M + pad_to_16 + 16U

This is a clear win whenever states repeat (Merkle workloads, identical MAST roots, ...).
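The cost comparison above can be sketched as a small model. This is illustrative only: `pad_to_16` is interpreted here as padding the controller region up to a multiple of 16 so the permutation segment starts cycle-aligned, which is an assumption about the layout, not confirmed by the PR text.

```rust
/// Rough row-cost model from the PR description (illustrative only).
/// `m` = total permutation requests, `u` = unique input states.
fn old_rows(m: u64) -> u64 {
    // Old monolithic hasher: 32 rows per request, repeats included.
    32 * m
}

fn new_rows(m: u64, u: u64) -> u64 {
    // Controller: one 2-row (input, output) pair per request.
    let controller = 2 * m;
    // Assumption: controller region padded to a 16-row boundary so the
    // permutation segment starts cycle-aligned.
    let padded = controller.next_multiple_of(16);
    // Permutation segment: one packed 16-row cycle per unique state.
    padded + 16 * u
}

fn main() {
    // Merkle-heavy workload: many repeated sibling states.
    let (m, u) = (1_000, 100);
    assert!(new_rows(m, u) < old_rows(m));
    println!("old = {}, new = {}", old_rows(m), new_rows(m, u));
}
```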

2. Fix sibling-table soundness

The old sibling-table encoding was vulnerable to cross-operation sibling reuse. Adding mrupdate_id domain-separates entries so sibling-table balance is enforced per MRUPDATE instance, not globally across unrelated operations.

3. Add memory address decomposition checks

The memory chiplet now decomposes word addresses into two 16-bit limbs and proves the decomposition using range-check lookups. This closes an important missing piece in memory soundness while reusing the existing wiring-bus infrastructure.
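The limb decomposition itself is straightforward; a minimal sketch (function names are illustrative, not the actual trace-building code):

```rust
/// Split a word address into two 16-bit limbs, as the new w0/w1
/// columns do. In the AIR, the fact that each limb fits in 16 bits is
/// what the range-check lookups on the wiring bus enforce.
fn decompose(addr: u32) -> (u16, u16) {
    let w0 = (addr & 0xFFFF) as u16; // low 16 bits
    let w1 = (addr >> 16) as u16;    // high 16 bits
    (w0, w1)
}

fn recompose(w0: u16, w1: u16) -> u32 {
    (w0 as u32) | ((w1 as u32) << 16)
}

fn main() {
    let addr = 0xDEAD_BEEF;
    let (w0, w1) = decompose(addr);
    // Both limbs are u16 by construction; recomposition is exact.
    assert_eq!(recompose(w0, w1), addr);
}
```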

Design

Hasher: two-region trace

The hasher trace is split into two contiguous regions:

  • Controller (perm_seg = 0)
    Compact input/output row pairs, one pair per permutation request.

  • Permutation segment (perm_seg = 1)
    One packed 16-row Poseidon2 cycle per unique input state.

A LogUp permutation-link on the shared V_WIRING auxiliary column ties controller requests to the corresponding permutation cycles.
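The balance the LogUp link enforces can be illustrated with a toy example. Everything here is a stand-in: the field (2^61 - 1 is not Miden's field), the challenge, and the scalar "fingerprints" (the real bus encodes full 12-element states under random challenges).

```rust
// Toy LogUp balance check: repeated controller requests on one side,
// unique states weighted by multiplicity on the other, accumulated as
// sums of mult / (alpha - fingerprint) over F_p.
const P: u128 = (1 << 61) - 1; // a Mersenne prime (illustrative field)

fn pow_mod(mut b: u128, mut e: u128) -> u128 {
    b %= P;
    let mut r = 1;
    while e > 0 {
        if e & 1 == 1 {
            r = r * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    r
}

// Modular inverse via Fermat's little theorem (P is prime).
fn inv(x: u128) -> u128 {
    pow_mod(x, P - 2)
}

fn main() {
    let alpha: u128 = 123_456_789; // stand-in for a random verifier challenge
    // Controller side: one fingerprint per request, repeats included.
    let requests = [7u128, 7, 42, 7, 42];
    // Permutation side: unique fingerprints with their multiplicities.
    let unique = [(7u128, 3u128), (42, 2)];

    let lhs = requests
        .iter()
        .fold(0u128, |acc, &s| (acc + inv((alpha + P - s) % P)) % P);
    let rhs = unique
        .iter()
        .fold(0u128, |acc, &(s, m)| (acc + m * inv((alpha + P - s) % P) % P) % P);

    // Equality of the two running sums is what the shared auxiliary
    // column enforces at the last row.
    assert_eq!(lhs, rhs);
}
```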

Packed 16-row Poseidon2 schedule

The 31-step Poseidon2 schedule is packed as:

  • row 0: init + ext1
  • rows 1-3: ext2..ext4
  • rows 4-10: 7 × (3 packed internal rounds)
  • row 11: int22 + ext5
  • rows 12-14: ext6..ext8
  • row 15: boundary / final state
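The schedule above can be sanity-checked by counting steps per row (counting init as one step, with row 15 carrying none); the per-row counts sum to the 31-step total:

```rust
/// Steps executed on each of the 16 packed rows, per the schedule above.
fn steps_on_row(row: usize) -> usize {
    match row {
        0 => 2,      // init + ext1
        1..=3 => 1,  // ext2..ext4, one external round each
        4..=10 => 3, // three packed internal rounds each
        11 => 2,     // int22 + ext5
        12..=14 => 1, // ext6..ext8
        15 => 0,     // boundary / final state, no rounds
        _ => unreachable!(),
    }
}

fn main() {
    let total: usize = (0..16).map(steps_on_row).sum();
    // init + 8 external rounds + 22 internal rounds = 31 steps
    assert_eq!(total, 31);
}
```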

Packed internal rows reuse s0/s1/s2 as witness columns to keep the constraint degree bounded. Unused witness slots are explicitly zero-constrained (out of caution), though this could be relaxed.

Column layout

Hasher: 16 -> 20

s0 s1 s2 | h0..h11 | node_index | mrupdate_id | is_boundary | direction_bit | perm_seg
   3          12          1             1             1              1             1      = 20

New / newly significant columns:

  • mrupdate_id -- domain separator for sibling-table entries
  • is_boundary -- marks first controller input / last controller output
  • direction_bit -- propagated Merkle routing bit on controller rows
  • perm_seg -- explicit controller vs permutation-region flag
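One way to read the 20-column layout is as cumulative index constants. These names and groupings are hypothetical; the real code's constants may differ.

```rust
// Hypothetical column-index constants for the 20-column hasher layout
// described above: 3 selectors, 12 state columns, then the scalar columns.
const SELECTOR_COLS: usize = 3; // s0, s1, s2
const STATE_COLS: usize = 12; // h0..h11
const NODE_INDEX_COL: usize = SELECTOR_COLS + STATE_COLS; // 15
const MRUPDATE_ID_COL: usize = NODE_INDEX_COL + 1; // 16
const IS_BOUNDARY_COL: usize = MRUPDATE_ID_COL + 1; // 17
const DIRECTION_BIT_COL: usize = IS_BOUNDARY_COL + 1; // 18
const PERM_SEG_COL: usize = DIRECTION_BIT_COL + 1; // 19
const HASHER_TRACE_WIDTH: usize = PERM_SEG_COL + 1; // 20

fn main() {
    // The widths in the ASCII layout above sum to the stated trace width.
    assert_eq!(HASHER_TRACE_WIDTH, 20);
}
```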

Memory: 15 -> 17

Two new columns:

  • w0
  • w1

These decompose the word address into 16-bit limbs. The wiring bus carries the corresponding range-check lookups.

Constraints

Hasher constraints now total 100.

Constraint group breakdown
| Group | Count | Purpose |
| --- | --- | --- |
| Selector booleanity | 3 | s0, s1, s2 binary on controller rows |
| Perm segment | 7 | perm_seg confinement, booleanity, monotonicity, cycle alignment |
| Structural | 7 | Confine is_boundary / direction_bit to valid row types |
| Lifecycle | 2 | Operation lifecycle invariants |
| Controller adjacency | 2 | Input row must be followed by output row |
| Controller pairing | 4 | First-row constraint, output non-adjacency, padding stability |
| Perm witness-shape | 3 | Zero witness slots when unused |
| Perm init+ext | 12 | Row 0 packed transition |
| Perm external | 12 | External-round transitions |
| Perm packed internal | 15 | 3 witness checks + 12 next-state constraints |
| Perm int+ext | 13 | 1 witness check + 12 next-state constraints |
| MRUPDATE ID | 2 | Increment / zero-on-perm rules |
| Sponge capacity | 4 | Preserve capacity across continuations |
| Output index | 1 | Output-row node_index rule |
| Merkle index | 4 | Index decomposition / continuity / direction bit |
| Merkle input state | 4 | Zero capacity on Merkle input rows |
| Merkle routing | 5 | Route digest into correct rate half |
| **Total** | **100** | |

Trace width impact

| Chiplet | Before | After | Delta |
| --- | --- | --- | --- |
| Hasher | 16 | 20 | +4 |
| Memory | 15 | 17 | +2 |
| Net main trace impact | | | +1 |

The new main trace width is 72.

No new auxiliary columns were added:

  • the permutation-link bus shares V_WIRING
  • memory address range checks also use the existing wiring-bus path

@Al-Kindi-0
Contributor Author

To compare against the numbers in #2869 for the recursive verifier (verifying a program executing in 2^20 cycles):

  ┌────────────────────────────┬──────────────┬──────────────┬───────────┬──────────┐
  │         Component          │   Old        │   New        │  Change   │ Savings  │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Core trace (decoder+stack) │ 41,652       │ 41,516       │ -136      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Range checker              │ 5,129        │ 5,217        │ +88       │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Chiplets total             │ 273,769      │ 118,657      │ -155,112  │ -57%     │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Hasher                   │ 250,816      │ 96,256       │ -154,560  │ -62%     │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Bitwise                  │ 3,104        │ 3,104        │ 0         │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Memory                   │ 13,758       │ 13,406       │ -352      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - ACE                      │ 6,090        │ 5,890        │ -200      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Kernel ROM               │ 0            │ 0            │ 0         │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Padded trace length        │524,288 (2^19)│131,072 (2^17)│           │ -4x      │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Padding                    │ 47%          │ 9%           │           │          │
  └────────────────────────────┴──────────────┴──────────────┴───────────┴──────────┘

This is a 4x improvement in padded trace length for the above case.
(Note that the change in the decoder+stack row count is due to a constraint change that affects ACE circuit loading.)

@Al-Kindi-0 Al-Kindi-0 force-pushed the al-hasher-chiplet-redesign branch from da140ac to 77773d9 Compare March 27, 2026 15:19
@Al-Kindi-0 Al-Kindi-0 changed the title Al hasher chiplet redesign Hasher chiplet redesign Mar 27, 2026
@adr1anh adr1anh self-requested a review March 28, 2026 09:28
let w1: AB::Expr = local.chiplets[MEMORY_WORD_ADDR_HI_COL_IDX - CHIPLETS_OFFSET].clone().into();
let w1_mul4: AB::Expr = w1.clone() * AB::Expr::from_u16(4);

let den0: AB::ExprEF = alpha.clone() + Into::<AB::ExprEF>::into(w0);
Contributor


Shouldn't this add protocol-level domain separation before v_wiring can safely carry ACE wires, raw memory range-check values, and the new hasher perm-link messages together? Right now the memory side uses plain alpha + w0/w1/4*w1, ACE uses encode([clk, ctx, id, ...]), and the perm-link uses encode([0|1, h0..h11]) on the same LogUp column.

If any of those encodings were to alias, could one subsystem cancel another on the shared sum? #1614 explicitly called out adding an op-label when reusing the wiring bus, and I don't see that namespace implemented here yet.

Contributor

@Nashtare Nashtare left a comment


I would need to do another pass because this is pretty dense, but I left a couple of comments while familiarizing myself with it.

Comment on lines +45 to +46
let hs0 = main_trace.chiplet_selector_1(row);
let hs1 = main_trace.chiplet_selector_2(row);
Contributor


nit: the indexing shift is a bit confusing when looking below; maybe rename to

Suggested change
- let hs0 = main_trace.chiplet_selector_1(row);
- let hs1 = main_trace.chiplet_selector_2(row);
+ let hs1 = main_trace.chiplet_selector_1(row);
+ let hs2 = main_trace.chiplet_selector_2(row);

with corresponding updates later in the code. Would that be clearer?

Comment on lines +20 to +23
/// TODO: These naive labels (0 and 1) risk collisions with other messages on the shared
/// v_wiring column (ACE wiring and memory range checks). Revisit when refactoring the buses.
const LABEL_IN: Felt = Felt::ZERO;
const LABEL_OUT: Felt = Felt::ONE;
Contributor


See my comment below in this file, but I was just wondering if we should not treat this now rather than deferring it?

}
} else {
// Permutation segment.
// This works because the hasher is always the first chiplet (rows start at 0)
Contributor


minor: do we have an invariant check that this is actually the case? Just out of precaution

Contributor


I’m pretty sure we always have 1 hash, and the chiplet selectors make sure that it comes before any other chiplet

Comment on lines +70 to +72
/// Maps input state -> multiplicity for permutation deduplication.
/// During finalize_trace(), one 16-row perm cycle is emitted per entry.
perm_request_map: BTreeMap<StateKey, u64>,
Contributor


nit: does ordering matter? Perf-wise, I think a HashMap may be more efficient here.

Contributor


I think this is mainly because of no-std, but I don't think the ordering matters.

Contributor


HashMap can be pulled from alloc / hashbrown though (if perf matters here)
