Conversation
@adr1anh I just saw your upstream Plonky3/Plonky3#1452 PR; overall the savings seem much more interesting when focusing on heap allocation, so perhaps it is not worth considering this local PR, and we should instead apply a similar approach to the upstream one.

One of the goals of Plonky3/Plonky3#1452 was indeed to improve memory pressure by avoiding the Vec allocation for each row. However, I think @Al-Kindi-0 came up with this when he started testing the Miden VM constraints. I'll let him chime in with more details, but I think it was either due to the interleaving of extension field constraints, or maybe more because the initial optimized
huitseeker
left a comment
RE: @adr1anh 's comment: this was part of https://github.com/0xMiden/p3-miden/compare/al-optimize-constraints-eval, specifically e990a00, which AFAICT is unmerged.
So while this LGTM, seeing both of you @Nashtare and @Al-Kindi-0 working on it makes me interpret this PR's perf delta as evidence that the Miden VM workload stresses the older design in a bad way, not necessarily as proof that "fold on the fly" is the only right long-term fix.
    base_alpha_powers: &base_alpha_powers,
    ext_alpha_powers: &ext_alpha_powers,
    constraint_index: 0,
    base_acc: Default::default(),
Nit: I think this could start from PE::ZERO instead of Default::default(). PE already comes from Algebra, so zero is part of the contract here, while Default is only a convention on the current packed types.
I would favor the solution in this PR for its simplicity unless the more involved solution with buffers provides a clear and consistent advantage.
Al-Kindi-0
left a comment
LGTM
As mentioned in the other comment, unless we are gaining consistently I would just go with the current simplification.
huitseeker
left a comment
Works for me, thanks @Nashtare
Agreed, I'm still investigating this on the Plonky3 side, and if there's any meaningful change it would be easier to implement here. I would add back this override that was lost along the way, which should improve extension field constraints a bit.
934fb35 to 7329a02
@adr1anh I've added the
    for (j, x) in array.into_iter().enumerate() {
        let val: P = x.into();
        let term = PE::from_basis_coefficients_fn(|d| val * self.base_alpha_powers[d][idx + j]);
        delta += term;
    }
    self.base_acc += delta;
Can we do a packed linear combination here?
Ah right, good point!
I do really like the changes, but I was wondering if we could try benchmarking the VM against this branch. It might be a bit annoying due to the breaking changes to the lmcs, but such a setup would also help us figure out if Plonky3/Plonky3#1452 can be of any use on our side. I was also wondering if we would be able to get some benefit from the current (pre-PR) approach by caching the constraint accumulation vectors, as was recently done in Plonky3. What would you think about keeping this one on ice until we get some kind of benchmarking setup going?

Sure, let's hold off!
Summary
Tweak constraint folding to perform folding on the fly. The `finalize_constraints()` method then basically just adds both accumulators, one for base constraints (lifted to `PE`) and one for extension constraints. This seems to yield a ~4% improvement on my M4 Pro when running the `lifted_miden` example for the `eval_instance` debug span.