Fall back to software after trace generation #3449

Schaeff · 2025-11-19T12:09:39Z

Enable the prover to fall back to software after apc trace generation for rows which do not satisfy the constraints.

Detect rejected rows: this seems to require checking all constraints on all rows, which is expensive. Ideally, the APC comes with a simple expression which can be evaluated on the final row to decide whether the row should be rejected. This expression could be the conjunction of the specialization constraints added during APC generation.
Decide if we are happy with the changes to stark-backend: it is clean now, as program tracegen happens last, based on the final frequencies, instead of hacking into the final trace.
Support GPU.
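
A minimal sketch of the "conjunction of specialization constraints" idea from the first item. It assumes every specialization constraint has the simple form "column equals constant"; the SpecializationConstraint type and the is_rejected helper are hypothetical and not part of this PR.

```rust
/// BabyBear prime, used here only to compare values in the field.
const P: u64 = 2013265921;

/// Hypothetical specialization constraint: `row[column] == value` (mod P).
struct SpecializationConstraint {
    column: usize,
    value: u64,
}

/// A row is accepted only if every specialization constraint holds,
/// i.e. the acceptance condition is the conjunction of all of them.
fn is_rejected(row: &[u64], constraints: &[SpecializationConstraint]) -> bool {
    !constraints
        .iter()
        .all(|c| row[c.column] % P == c.value % P)
}
```

With this shape, the check costs one pass over the specialization constraints per row instead of one pass over all constraints of the APC.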

Schaeff · 2025-11-20T13:12:10Z

autoprecompiles/src/trace_handler.rs

-pub struct OriginalRowReference<'a, D> {
+pub struct OriginalRowReference<'a, D, I> {
+    pub air_id: &'a I,
+    pub row_index: usize,


Added this because when an APC row fails tracegen, we need to know which original rows to add to the software tables.

Original rows, i.e. the rejected rows of the APC dummy traces?

Software tables, i.e. the non-APC traces?

Schaeff · 2025-11-20T13:12:23Z

autoprecompiles/src/trace_handler.rs

+    pub air_id: &'a I,
+    pub row_index: usize,
    pub data: &'a D,
-    pub start: usize,


Removed start because it can be derived from the other fields.
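
A sketch of how the offset could be derived, assuming data is a flat row-major buffer and length is the width of one row. The struct below is a simplified stand-in for the one in the diff (the data type is fixed to a slice here), and the formula row_index * length is an assumption, not taken from the PR.

```rust
/// Simplified stand-in for the struct in the diff; `data` is assumed to be a
/// flat row-major buffer and `length` the width of one row.
struct OriginalRowReference<'a, I> {
    air_id: &'a I,
    row_index: usize,
    data: &'a [u32],
    length: usize,
}

impl<'a, I> OriginalRowReference<'a, I> {
    /// Offset of the row in `data`, derived instead of stored (assumed formula).
    fn start(&self) -> usize {
        self.row_index * self.length
    }

    /// The cells of the referenced row.
    fn row(&self) -> &'a [u32] {
        &self.data[self.start()..self.start() + self.length]
    }
}
```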

Schaeff · 2025-11-20T13:13:03Z

autoprecompiles/src/trace_handler.rs

                .zip_eq(original_instruction_table_offsets.iter())
                .map(|(air_id, dummy_table_offset)| {
-                    let trace = air_id_to_dummy_trace.get(air_id).unwrap();
+                    let (air_id, trace) = air_id_to_dummy_trace.get_key_value(air_id).unwrap();


Interesting case here: the air_id returned by get_key_value has the same value as the lookup key, but a different lifetime. We need the one with the longer lifetime, which is the key borrowed from the map.
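
The lifetime point can be reproduced in isolation with std's HashMap::get_key_value. In this sketch the function name and the map contents are made up; the returned key references borrow from the map, so they stay valid after the lookup keys are gone.

```rust
use std::collections::HashMap;

/// Returns keys borrowed from `map`, so they outlive the short-lived `lookups`.
fn keys_with_map_lifetime<'m>(
    map: &'m HashMap<String, Vec<u32>>,
    lookups: &[String],
) -> Vec<(&'m String, &'m Vec<u32>)> {
    lookups
        .iter()
        // `get` would only return the value; `get_key_value` also returns
        // the key stored in the map, which carries the map's lifetime `'m`.
        .map(|id| map.get_key_value(id).unwrap())
        .collect()
}
```

Returning the lookup key itself instead would tie the result to the lifetime of lookups and fail to compile with this signature.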

Schaeff · 2025-11-20T13:14:54Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

-    ) -> DenseMatrix<BabyBear> {
+    ) -> (
+        DenseMatrix<BabyBear>,
+        HashMap<String, (Vec<usize>, Arc<DenseMatrix<BabyBear>>)>,


For each AIR name, a trace and the indices of the rows of that trace which were rejected.

Very nice :)

qwang98 · 2025-11-28T10:28:28Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

-            return RowMajorMatrix::new(vec![], width);
+            return AirProvingContext::simple_no_pis(Arc::new(RowMajorMatrix::new(vec![], width)))
+                .into();


Default case with empty Rejected.

qwang98 · 2025-11-28T10:34:45Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+            // Just for testing, reject the first call of each apc
+            .zip(once(false).chain(repeat(true)))


In the actual case, once optimistic APC is ready, how would we inject or compute the information about which APC calls are valid?

TBD. You could check all constraints, but that seems like a lot. Hopefully we can check only the specialization constraints, i.e. the ones that are added by optimistic APC and remove completeness.
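
For reference, the testing shortcut in the diff can be isolated as follows. This sketch assumes the zipped flag means "accepted", so false on the first element rejects the first call; mark_first_rejected is a hypothetical helper name.

```rust
use std::iter::{once, repeat};

/// Pairs every call with an "accepted" flag: `false` for the first call,
/// `true` for all later ones. This mirrors the testing shortcut in the diff.
fn mark_first_rejected<T>(calls: Vec<T>) -> Vec<(T, bool)> {
    calls
        .into_iter()
        .zip(once(false).chain(repeat(true)))
        .collect()
}
```

Once real rejection detection exists, the flag iterator would be replaced by the result of evaluating the rejection condition on each generated row.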

qwang98 · 2025-11-28T10:38:38Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

-                    .map(|r| &r.data[r.start..r.start + r.length])
+                    .map(|r| &r.data[r.start()..r.start() + r.length])


Length is the width of the original AIR?

qwang98 · 2025-11-28T10:41:13Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                    // set the whole row to zero
+                    // TODO: this generates a gap in the table. Instead, reuse the row in the next iteration.
+                    for cell in row_slice {
+                        *cell = BabyBear::ZERO;
+                    }


I wonder why this still passes the test?

Is it because it's essentially like the padding row of zeros, just that it appears in the middle?

Works via setting all columns, including is_valid, to zero?

So proving is just slower than it should be for now, but it is at least correct?

Zero rows are always a valid witness in OVM. We could also just set is_valid to zero here. Ideally we pack the rows together and truncate the table if we allocated too much. This can wait, IMO.
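
A sketch of the "pack the rows together and truncate" alternative, assuming the table is a flat row-major Vec and that we already know which rows were accepted. The helper name is hypothetical, and padding the table back to whatever height the backend requires is left out.

```rust
/// Packs accepted rows to the front of a flat row-major table and truncates
/// the rest. `accepted[i]` says whether row `i` is kept.
/// Returns the number of kept rows.
fn pack_accepted_rows(values: &mut Vec<u32>, width: usize, accepted: &[bool]) -> usize {
    assert_eq!(values.len(), width * accepted.len());
    let mut kept = 0;
    for (row, &keep) in accepted.iter().enumerate() {
        if keep {
            // Move row `row` into slot `kept` (a no-op if they coincide).
            values.copy_within(row * width..(row + 1) * width, kept * width);
            kept += 1;
        }
    }
    values.truncate(kept * width);
    kept
}
```

In the actual generator the same effect could be had without a second pass by reusing the slot of a rejected row in the next iteration, as the TODO in the diff suggests.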

qwang98 · 2025-11-28T10:47:14Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                        // replay the side effects of this row on the real periphery
+                        self.periphery.replay_bus_interactions(machine, &evaluator);


OK, so because the APC row is rejected, the dummy trace becomes the real trace, and we need to replay (or really put back) its bus interactions on the real periphery.

This makes me wonder whether it is possible to selectively play the side effects on the dummy or the real periphery, depending on whether the APC row is rejected. The open question is when and where this information will be injected.

AFAIK, we might already know this information before running APC trace gen, because empirical constraints are computed from original traces run during program compilation and are applied to the APC before optimizations, so technically at APC compile time.

However, we might only have this PGO information for the training-data Ethereum blocks, so on actual APC execution we would run trace gen only once (instead of once at APC compilation and once at actual trace gen)?

Yeah, in the general case we only know if a row is rejected after full row tracegen, as the specialization can add a constraint on any of the columns. I don't see how to do better than this here.

qwang98 · 2025-11-28T10:58:02Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

        // go through the final table and fill in the values
        values
            // a record is `width` values
            // TODO: optimize by parallelizing on chunks of rows, currently fails because `dyn AnyChip<MatrixRecordArena<Val<SC>>>` is not `Send`


Not related to this PR, but should we fix this eventually, or do we care more about GPU now and keep this as a PoC?

I tried fixing it, but something was still failing. We should fix it, yes, but it is unrelated.

qwang98 · 2025-11-28T11:01:18Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                        // find the concrete value of the received pc
+                        rejected_pcs.push(
+                            machine
+                                .bus_interactions
+                                .iter()
+                                .find_map(|interaction| {
+                                    let ConcreteBusInteraction { id, mut args, .. } =
+                                        evaluator.eval_bus_interaction(interaction);
+                                    (id == 2).then(|| args.next().unwrap())
+                                })
+                                .unwrap()
+                                .as_canonical_u32(),


Probably an optimization for a follow-up PR, but it looks like we evaluate all bus interactions here again, after already doing so in replay_bus_interactions. I wonder if we can create a helper that takes the evaluated bus interactions and reuses them here.

Alternatively, can we somehow get this PC:

From the APC, because I'm assuming it's just the start_idx? OR

From the dummy trace of the first original instruction. I think it should always be at a fixed column index, either 0 or 1, as I remember Collect empirical constraints #3461 using a similar "hack"?

On second thought, because we are still within the per-APC-row loop here, and PCs shouldn't change across APC executions, does it mean that we technically know this set of rejected_pcs at APC compile time, which is basically (0..apc.statements.len()).map(|idx| apc.start_idx + idx * 4)?

Yes, I had the same thought, we could cache some stuff here. We shouldn't assume contiguous PCs though, as we may merge some blocks. But we could cache the bus interaction which receives the PC instead of going through all interactions.
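
A sketch of the caching idea from the reply: find the PC-receiving interaction once, then evaluate only that one per rejected row. Everything here is simplified and hypothetical: the BusInteraction type has a static id and column-index arguments, whereas the real code evaluates symbolic interactions, and reading the id == 2 check in the diff as "program bus" is an assumption.

```rust
/// Hypothetical, simplified bus interaction: a bus id and arguments
/// given as column indices into the row.
struct BusInteraction {
    id: u64,
    arg_columns: Vec<usize>,
}

/// Bus id matched in the diff (`id == 2`), assumed to be the program bus.
const PROGRAM_BUS_ID: u64 = 2;

/// Computed once: position of the interaction which receives the pc.
fn pc_interaction_index(interactions: &[BusInteraction]) -> Option<usize> {
    interactions.iter().position(|i| i.id == PROGRAM_BUS_ID)
}

/// Per rejected row: read only the first argument of the cached interaction.
fn received_pc(interactions: &[BusInteraction], cached: usize, row: &[u32]) -> u32 {
    row[interactions[cached].arg_columns[0]]
}
```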

qwang98 · 2025-11-28T11:13:00Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                        // replay the side effects of this row on the real periphery
+                        self.periphery.replay_bus_interactions(machine, &evaluator);
+
+                        // find the concrete value of the received pc


Typo. received -> rejected?

received as in received from the program bus by this software row

qwang98 · 2025-11-28T11:16:14Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                        rejected_rows_per_air
+                            .get_mut(original_row_reference.air_id)
+                            .unwrap()
+                            .push(original_row_reference.row_index);


Probably for a follow-up PR, but if we care about speed, this can technically also be computed outside of the APC row loop, because:

row_index is originally computed as let row_index = trace_row * occurrences_per_record + dummy_table_offset;

We know the vector of rejected APC indices, which are the trace_row values here.

occurrences_per_record and dummy_table_offset are also already computed by the trace generator.

Right, I think we should benchmark before going into more things like this, as the rejected path is not supposed to happen too often, so maybe it's fine for it not to be too fast.
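
For reference, the suggestion amounts to something like the following sketch, which applies the quoted formula to a list of rejected APC rows for one original instruction. The function name is hypothetical.

```rust
/// Maps rejected APC rows to row indices in one original (dummy) trace,
/// using `trace_row * occurrences_per_record + dummy_table_offset`.
fn rejected_original_rows(
    rejected_apc_rows: &[usize],
    occurrences_per_record: usize,
    dummy_table_offset: usize,
) -> Vec<usize> {
    rejected_apc_rows
        .iter()
        .map(|&trace_row| trace_row * occurrences_per_record + dummy_table_offset)
        .collect()
}
```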

qwang98 · 2025-11-28T11:17:37Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

-        RowMajorMatrix::new(values, width)
+        // merge the rejected indices with the traces
+        let rejected = Rejected {
+            pcs: rejected_pcs,


This is basically all PCs of the APC?

Of the original instructions which were rejected, and how many times for each.
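
One way to read "how many times for each" is that the list of PCs contains one entry per rejection, so the count per PC falls out by grouping. A minimal sketch under that assumption (how the consumer actually uses pcs is not shown in this thread, and count_rejected_pcs is a hypothetical helper):

```rust
use std::collections::BTreeMap;

/// Groups a list of rejected pcs into (pc, number of rejections) pairs.
fn count_rejected_pcs(rejected_pcs: &[u32]) -> BTreeMap<u32, usize> {
    let mut counts = BTreeMap::new();
    for &pc in rejected_pcs {
        *counts.entry(pc).or_insert(0) += 1;
    }
    counts
}
```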

qwang98 · 2025-11-28T11:18:44Z

openvm/src/powdr_extension/trace_generator/cpu/mod.rs

+                    let original_trace = dummy_trace_by_air_name.remove(&name).unwrap().matrix;
+                    (name, (original_trace, indices))
+                })
+                .collect(),


Looks like these will be shipped to the real trace! :)

yep this gets shipped to software
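
A sketch of what the software side could do with an (original trace, rejected indices) pair, assuming a flat row-major trace: copy out only the rejected rows so they can be appended to the non-APC table of the same AIR. The select_rows helper is hypothetical and the merge itself is not shown.

```rust
/// Copies the rows at `indices` out of a flat row-major trace of the given width.
fn select_rows(values: &[u32], width: usize, indices: &[usize]) -> Vec<u32> {
    let mut selected = Vec::with_capacity(indices.len() * width);
    for &i in indices {
        selected.extend_from_slice(&values[i * width..(i + 1) * width]);
    }
    selected
}
```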

qwang98

Very nice! I left some potential optimization ideas, but probably mostly for another PR.

I see how the APC trace with zero rows in the middle PLUS rejected dummy traces are created, so I guess the next step probably happens in OVM, where we merge the rejected dummy traces of the original AIR with the non-APC traces of those AIRs.

I also wonder how rejected PCs are used :)

Schaeff and others added 4 commits November 19, 2025 13:07

wip

3717ee9

wip

0c595a1

add original row indices to rejected sets

3b751da

simplify apc tracegen

483e5a5

Schaeff commented Nov 20, 2025


Schaeff added 3 commits November 25, 2025 20:04

clean up

ab93fee

clean up

f8a14e7

deps

82ddadb

Schaeff force-pushed the fallback-to-software branch from ead7e15 to 82ddadb Compare November 26, 2025 14:59

Schaeff added 3 commits November 26, 2025 16:49

update reth

ff54de8

fix k256

6c0b48d

simplify

5b7ba91

qwang98 reviewed Nov 28, 2025
