-
Notifications
You must be signed in to change notification settings - Fork 121
Fall back to software after trace generation #3449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| pub struct OriginalRowReference<'a, D> { | ||
| pub struct OriginalRowReference<'a, D, I> { | ||
| pub air_id: &'a I, | ||
| pub row_index: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this because when an apc row fails tracegen, we need to know which original rows to add to the software tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original rows aka rejected rows of the APC dummy traces?
Software tables aka non-APC traces?
| pub air_id: &'a I, | ||
| pub row_index: usize, | ||
| pub data: &'a D, | ||
| pub start: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this because it can be derived from the rest
| .zip_eq(original_instruction_table_offsets.iter()) | ||
| .map(|(air_id, dummy_table_offset)| { | ||
| let trace = air_id_to_dummy_trace.get(air_id).unwrap(); | ||
| let (air_id, trace) = air_id_to_dummy_trace.get_key_value(air_id).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting case here, air_id has the same value, but a different lifetime! We need the one with the longer lifetime.
| ) -> DenseMatrix<BabyBear> { | ||
| ) -> ( | ||
| DenseMatrix<BabyBear>, | ||
| HashMap<String, (Vec<usize>, Arc<DenseMatrix<BabyBear>>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each air name, a trace and the rows of that trace which were rejected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice :)
ead7e15 to
82ddadb
Compare
| return RowMajorMatrix::new(vec![], width); | ||
| return AirProvingContext::simple_no_pis(Arc::new(RowMajorMatrix::new(vec![], width))) | ||
| .into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default case with empty Rejected.
| // Just for testing, reject the first call of each apc | ||
| .zip(once(false).chain(repeat(true))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the actual case once optimistic APC is ready, how would we inject/compute the data on which APC call is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbd, you could check all constraints but that seems like a lot? hopefully we can check only the specialization constraints, the ones that are added by optimistic apc and remove completeness
| .map(|r| &r.data[r.start..r.start + r.length]) | ||
| .map(|r| &r.data[r.start()..r.start() + r.length]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Length is the width of the original AIR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
| // set the whole row to zero | ||
| // TODO: this generates a gap in the table. Instead, reuse the row in the next iteration. | ||
| for cell in row_slice { | ||
| *cell = BabyBear::ZERO; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this still passes the test?
Is it because it's essentially like the padding row of zeros, just that it appears in the middle?
Works via setting all columns, including is_valid, to zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's just slower than it should be in the proving for now but at least is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zero rows are always a valid witness in ovm. we could also just set is_valid to zero here. Ideally we pack the rows together and truncate the table if we allocated too much. This can wait imo.
| // replay the side effects of this row on the real periphery | ||
| self.periphery.replay_bus_interactions(machine, &evaluator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so because APC is rejected, the dummy trace becomes real trace and we need to replay (or really put back) their bus interactions to the real periphery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which makes me wonder if it's possible to "selectively" play the side effects on the dummy vs real periphery depending on whether the APC row is rejected, so the question remains when and where will this information be injected.
Afaik, we might already know this information before running APC trace gen, because empirical constraints are computed from original traces run during program compilation, and are applied to APC before optimizations, so technically at APC compile time.
However, we might only know these PGO information for training data Ethereum blocks, so on actual APC execution, we only trace gen once (instead of once at APC compilation and once at actual trace gen)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in the general case, we only know if a row is rejected after full row tracegen, as the specialization can add a constraint on any of the columns. I don't see how to do better than this here.
| // go through the final table and fill in the values | ||
| values | ||
| // a record is `width` values | ||
| // TODO: optimize by parallelizing on chunks of rows, currently fails because `dyn AnyChip<MatrixRecordArena<Val<SC>>>` is not `Send` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR but should we fix this eventually, or we care more about GPU now and use this as a PoC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried fixing it but something was still failing. We should fix it yes, but unrelated.
| // find the concrete value of the received pc | ||
| rejected_pcs.push( | ||
| machine | ||
| .bus_interactions | ||
| .iter() | ||
| .find_map(|interaction| { | ||
| let ConcreteBusInteraction { id, mut args, .. } = | ||
| evaluator.eval_bus_interaction(interaction); | ||
| (id == 2).then(|| args.next().unwrap()) | ||
| }) | ||
| .unwrap() | ||
| .as_canonical_u32(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a next PR optimization but it looks like here we evaluate all bus interactions again after doing so in replay_bus_interactions, so I wonder if we can create a helper that pass in evaluated bus interactions and then reuse them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, can we somehow get this PC:
- From the APC, because I'm assuming it's just the
start_idx? OR - From the dummy trace of the first original instruction. I think it should always happen for a fixed column index, either 0 or 1, as I remember Collect empirical constraints #3461 using a similar "hack"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On another thought, because we are now still within the per APC row loop, and PC shouldn't change across APC executions, does it mean that we technically know this set of rejected_pcs at APC compile time, which is basically (0..apc.statements.len()).map(|idx| apc.start_idx + idx * 4)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I had the same thought, we could cache some stuff here. We shouldnt assume contiguous pcs though as we may merge some blocks. But we could cache the bus interaction which receives the pc instead of going through all interactions.
| // replay the side effects of this row on the real periphery | ||
| self.periphery.replay_bus_interactions(machine, &evaluator); | ||
|
|
||
| // find the concrete value of the received pc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo. received -> rejected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
received as in received from the program bus by this software row
| rejected_rows_per_air | ||
| .get_mut(original_row_reference.air_id) | ||
| .unwrap() | ||
| .push(original_row_reference.row_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a next PR but if we care about speed, this technically can also be computed outside of the APC row loop, because:
row_indexis originally computed aslet row_index = trace_row * occurrences_per_record + dummy_table_offset;- We know the vector of rejected APC indices, which are
trace_rowhere. occurrence_per_recordanddummy_table_offsetis also already computed by the trace generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think we should benchmark before going into more things like this, as the rejected path is supposed not to happen too often, so maybe it's fine for it not to be too fast.
| RowMajorMatrix::new(values, width) | ||
| // merge the rejected indices with the traces | ||
| let rejected = Rejected { | ||
| pcs: rejected_pcs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically all PCs of the APC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of the original instructions which were rejected, and how many times for each.
| let original_trace = dummy_trace_by_air_name.remove(&name).unwrap().matrix; | ||
| (name, (original_trace, indices)) | ||
| }) | ||
| .collect(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these will be shipped to the real trace! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep this gets shipped to software
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I left some potential optimization ideas, but probably mostly for another PR.
I see how the APC trace with zero rows in the middle PLUS rejected dummy traces are created, so I guess the next step probably happens in OVM, where we merge the rejected dummy traces of the original AIR with the non-APC traces of those AIRs.
I also wonder how rejected PCs are used :)
Enable the prover to fall back to software after apc trace generation for rows which do not satisfy the constraints.
Related:
TODO: