Skip to content

Conversation

@Schaeff
Copy link
Collaborator

@Schaeff Schaeff commented Nov 19, 2025

Enable the prover to fall back to software after apc trace generation for rows which do not satisfy the constraints.

Related:

TODO:

  • detect rejected rows: this seems to require checking all constraints on all rows, which seems like a lot. Ideally, the apc comes with a simple expression which can be evaluated on the final row to know if the row should be rejected. This expression could be the conjunction of the specialization constraints added during apc generation.
  • decide if we are happy with the changes to stark-backend: now it's clean, as program tracegen happens last based on the final frequencies instead of hacking into the final trace
  • support GPU

pub struct OriginalRowReference<'a, D> {
pub struct OriginalRowReference<'a, D, I> {
pub air_id: &'a I,
pub row_index: usize,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this because when an apc row fails tracegen, we need to know which original rows to add to the software tables.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original rows aka rejected rows of the APC dummy traces?

Software tables aka non-APC traces?

pub air_id: &'a I,
pub row_index: usize,
pub data: &'a D,
pub start: usize,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this because it can be derived from the rest

.zip_eq(original_instruction_table_offsets.iter())
.map(|(air_id, dummy_table_offset)| {
let trace = air_id_to_dummy_trace.get(air_id).unwrap();
let (air_id, trace) = air_id_to_dummy_trace.get_key_value(air_id).unwrap();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting case here, air_id has the same value, but a different lifetime! We need the one with the longer lifetime.

) -> DenseMatrix<BabyBear> {
) -> (
DenseMatrix<BabyBear>,
HashMap<String, (Vec<usize>, Arc<DenseMatrix<BabyBear>>)>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each air name, a trace and the rows of that trace which were rejected

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice :)

@Schaeff Schaeff force-pushed the fallback-to-software branch from ead7e15 to 82ddadb Compare November 26, 2025 14:59
Comment on lines -105 to +123
return RowMajorMatrix::new(vec![], width);
return AirProvingContext::simple_no_pis(Arc::new(RowMajorMatrix::new(vec![], width)))
.into();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default case with empty Rejected.

Comment on lines +190 to +191
// Just for testing, reject the first call of each apc
.zip(once(false).chain(repeat(true)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the actual case once optimistic APC is ready, how would we inject/compute the data on which APC call is valid?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbd, you could check all constraints but that seems like a lot? hopefully we can check only the specialization constraints, the ones that are added by optimistic apc and remove completeness

Comment on lines -172 to +198
.map(|r| &r.data[r.start..r.start + r.length])
.map(|r| &r.data[r.start()..r.start() + r.length])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length is the width of the original AIR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines +237 to +241
// set the whole row to zero
// TODO: this generates a gap in the table. Instead, reuse the row in the next iteration.
for cell in row_slice {
*cell = BabyBear::ZERO;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this still passes the test?

Is it because it's essentially like the padding row of zeros, just that it appears in the middle?

Works via setting all columns, including is_valid, to zero?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's just slower than it should be in the proving for now but at least is correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero rows are always a valid witness in ovm. we could also just set is_valid to zero here. Ideally we pack the rows together and truncate the table if we allocated too much. This can wait imo.

Comment on lines +253 to +254
// replay the side effects of this row on the real periphery
self.periphery.replay_bus_interactions(machine, &evaluator);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so because APC is rejected, the dummy trace becomes real trace and we need to replay (or really put back) their bus interactions to the real periphery.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which makes me wonder if it's possible to "selectively" play the side effects on the dummy vs real periphery depending on whether the APC row is rejected, so the question remains when and where will this information be injected.

Afaik, we might already know this information before running APC trace gen, because empirical constraints are computed from original traces run during program compilation, and are applied to APC before optimizations, so technically at APC compile time.

However, we might only know these PGO information for training data Ethereum blocks, so on actual APC execution, we only trace gen once (instead of once at APC compilation and once at actual trace gen)?

Copy link
Collaborator Author

@Schaeff Schaeff Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in the general case, we only know if a row is rejected after full row tracegen, as the specialization can add a constraint on any of the columns. I don't see how to do better than this here.

// go through the final table and fill in the values
values
// a record is `width` values
// TODO: optimize by parallelizing on chunks of rows, currently fails because `dyn AnyChip<MatrixRecordArena<Val<SC>>>` is not `Send`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR but should we fix this eventually, or we care more about GPU now and use this as a PoC?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried fixing it but something was still failing. We should fix it yes, but unrelated.

Comment on lines +256 to +267
// find the concrete value of the received pc
rejected_pcs.push(
machine
.bus_interactions
.iter()
.find_map(|interaction| {
let ConcreteBusInteraction { id, mut args, .. } =
evaluator.eval_bus_interaction(interaction);
(id == 2).then(|| args.next().unwrap())
})
.unwrap()
.as_canonical_u32(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a next PR optimization but it looks like here we evaluate all bus interactions again after doing so in replay_bus_interactions, so I wonder if we can create a helper that pass in evaluated bus interactions and then reuse them here.

Copy link
Collaborator

@qwang98 qwang98 Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, can we somehow get this PC:

  • From the APC, because I'm assuming it's just the start_idx? OR
  • From the dummy trace of the first original instruction. I think it should always happen for a fixed column index, either 0 or 1, as I remember Collect empirical constraints #3461 using a similar "hack"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On another thought, because we are now still within the per APC row loop, and PC shouldn't change across APC executions, does it mean that we technically know this set of rejected_pcs at APC compile time, which is basically (0..apc.statements.len()).map(|idx| apc.start_idx + idx * 4)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I had the same thought, we could cache some stuff here. We shouldnt assume contiguous pcs though as we may merge some blocks. But we could cache the bus interaction which receives the pc instead of going through all interactions.

// replay the side effects of this row on the real periphery
self.periphery.replay_bus_interactions(machine, &evaluator);

// find the concrete value of the received pc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo. received -> rejected?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

received as in received from the program bus by this software row

Comment on lines +271 to +274
rejected_rows_per_air
.get_mut(original_row_reference.air_id)
.unwrap()
.push(original_row_reference.row_index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a next PR but if we care about speed, this technically can also be computed outside of the APC row loop, because:

  1. row_index is originally computed as let row_index = trace_row * occurrences_per_record + dummy_table_offset;
  2. We know the vector of rejected APC indices, which are trace_row here.
  3. occurrence_per_record and dummy_table_offset is also already computed by the trace generator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think we should benchmark before going into more things like this, as the rejected path is supposed not to happen too often, so maybe it's fine for it not to be too fast.

RowMajorMatrix::new(values, width)
// merge the rejected indices with the traces
let rejected = Rejected {
pcs: rejected_pcs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically all PCs of the APC?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the original instructions which were rejected, and how many times for each.

Comment on lines +288 to +291
let original_trace = dummy_trace_by_air_name.remove(&name).unwrap().matrix;
(name, (original_trace, indices))
})
.collect(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these will be shipped to the real trace! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep this gets shipped to software

Copy link
Collaborator

@qwang98 qwang98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I left some potential optimization ideas, but probably mostly for another PR.

I see how the APC trace with zero rows in the middle PLUS rejected dummy traces are created, so I guess the next step probably happens in OVM, where we merge the rejected dummy traces of the original AIR with the non-APC traces of those AIRs.

I also wonder how rejected PCs are used :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants