[Perf] Only calculate the hash of circuit commitments once for the VK #2964

ljedrz wants to merge 2 commits into ProvableHQ:staging
Conversation
…cuitVerifyingKey Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
    let mut pks = Vec::with_capacity(circuit_batch_size);
    let mut all_circuits = Vec::with_capacity(circuit_batch_size);
    #[allow(clippy::mutable_key_type)]
note: this is needed because CircuitVerifyingKey now has interior mutability; in this case that is perfectly fine, as the new member has no impact on the Ord impl (which only considers the id), and the fact that the key implements Ord is the reason the warning is raised; see the corresponding clippy lint link.
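To illustrate why the lint fires and why the allow is sound here, a minimal self-contained sketch; the `Vk` type and its fields are hypothetical stand-ins, not the real snarkVM types:

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::sync::OnceLock;

// Hypothetical miniature of the situation: a key type with an
// interior-mutability field (the cached hash) that Ord deliberately ignores.
struct Vk {
    id: u32,
    cached_hash: OnceLock<u64>,
}

impl PartialEq for Vk {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
impl Eq for Vk {}
impl PartialOrd for Vk {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Vk {
    // Only the id is compared, so mutating `cached_hash` can never change a
    // key's position in an ordered collection.
    fn cmp(&self, other: &Self) -> Ordering {
        self.id.cmp(&other.id)
    }
}

fn main() {
    // clippy::mutable_key_type fires here because Vk contains a OnceLock,
    // but ordering only depends on the immutable id, so the allow is sound.
    #[allow(clippy::mutable_key_type)]
    let mut set = BTreeSet::new();
    set.insert(Vk { id: 2, cached_hash: OnceLock::new() });
    set.insert(Vk { id: 1, cached_hash: OnceLock::new() });
    let ids: Vec<u32> = set.iter().map(|v| v.id).collect();
    assert_eq!(ids, vec![1, 2]);
}
```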
    /// Commitments to the indexed polynomials.
    pub circuit_commitments: Vec<sonic_pc::Commitment<E>>,
    pub id: CircuitId,
    pub circuit_commitments_hash: OnceLock<E::Fq>,
I think this is expensive enough that we should store it to disk - and it would be great if we can get rid of the OnceLock (which obfuscates when initialization happens, making performance analysis harder). O:)
You correctly observed that we don't have to transmit it over the wire though.
The issue was that at the moment of creation, the fs_params is not available, and at later stages the VKs are immutable - hence the OnceLock.
Storing to the disk would probably work around this, but retrieving it would be expensive, perhaps to the point of offsetting any performance gains from caching it, unless the hashing is really computationally expensive.
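For reference, a minimal sketch of the lazy-caching pattern under discussion, with hypothetical stand-in types and a toy fold in place of the real sponge hash:

```rust
use std::sync::OnceLock;

struct CircuitVerifyingKey {
    circuit_commitments: Vec<u64>, // stand-in for the real commitment type
    circuit_commitments_hash: OnceLock<u64>,
}

impl CircuitVerifyingKey {
    // `&self` suffices: OnceLock provides the interior mutability, which is
    // why it fits here even though the VK is immutable after construction.
    fn commitments_hash(&self) -> u64 {
        *self.circuit_commitments_hash.get_or_init(|| {
            // Stand-in for the real, expensive sponge hash over the commitments.
            self.circuit_commitments
                .iter()
                .fold(0u64, |acc, c| acc.wrapping_mul(31).wrapping_add(*c))
        })
    }
}

fn main() {
    let vk = CircuitVerifyingKey {
        circuit_commitments: vec![1, 2, 3],
        circuit_commitments_hash: OnceLock::new(),
    };
    let first = vk.commitments_hash(); // computed on first access
    let second = vk.commitments_hash(); // served from the cache
    assert_eq!(first, 1026);
    assert_eq!(first, second);
}
```

`get_or_init` is what makes the "immutable at use time, but lazily filled" shape work, at the cost of hiding exactly when the expensive computation runs, which is the reviewer's objection above.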
> Storing to the disk would probably work around this, but retrieving it would be expensive, perhaps to the point of offsetting any performance gains from caching it, unless the hashing is really computationally expensive.
To be clear, we would retrieve it from disk only when we retrieve the VK from disk. And the hashes are very expensive.
However, a big downside I do see is the sheer amount of work to adjust the database logic. So for the first version you can also compute it during construction.
> The issue was that at the moment of creation, the fs_params is not available
Looks to me like it's always available in N::varuna_fs_parameters()? Or is there a scoping issue? Have fun with that :")
> Looks to me it's always available in N::varuna_fs_parameters()?
That's correct; however, there is no notion of the Network - or even snarkvm-console - in algorithms. Would it be acceptable to alter SNARK::circuit_setup to require FSParameters, like the other VarunaSNARK functions do?
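As a rough sketch of what the proposed signature change could look like, threading the Fiat-Shamir parameters through setup so the hash can be computed eagerly; all names and the toy hash are illustrative, not the actual SNARK trait:

```rust
// Hypothetical stand-ins: the real fs_parameters and commitment types live in
// snarkVM's algorithms crate and are far more involved.
struct FsParameters {
    seed: u64,
}

struct VerifyingKey {
    commitments: Vec<u64>,
    commitments_hash: u64, // eager: no OnceLock needed
}

// Proposed shape: circuit_setup takes the FS parameters, as the other
// VarunaSNARK entry points already do, so the hash exists from construction.
fn circuit_setup(fs_parameters: &FsParameters, commitments: Vec<u64>) -> VerifyingKey {
    // Toy stand-in for hashing the commitments under the FS parameters.
    let commitments_hash = commitments
        .iter()
        .fold(fs_parameters.seed, |acc, c| acc.wrapping_mul(31).wrapping_add(*c));
    VerifyingKey { commitments, commitments_hash }
}

fn main() {
    let fs = FsParameters { seed: 7 };
    let vk = circuit_setup(&fs, vec![1, 2, 3]);
    // The hash is available immediately; verification just reads it.
    assert_eq!(vk.commitments_hash, 209563);
    assert_eq!(vk.commitments.len(), 3);
}
```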
> However, a big downside I do see is the sheer amount of work to adjust the database logic. So for the first version you can also compute it during construction.
For my future self reading this: we should just store to disk, but we can write out that logic when we have a definite timeline for landing this feature.
Can you report the performance improvements for the existing
    fs_parameters: &FS::Parameters,
    inputs_and_batch_sizes: &BTreeMap<CircuitId, (usize, &[Vec<E::Fr>])>,
    circuit_commitments: impl Iterator<Item = &'a [crate::polycommit::sonic_pc::Commitment<E>]>,
    circuit_commitments_hashes: Vec<E::Fq>,
Can you make a 3rd VarunaVersion, and guard the changes by it to preserve backwards compatibility? You can peek at where we use VarunaVersion::V2 for inspiration.
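A hedged sketch of what such version gating might look like; the enum variants, function, and toy hash are illustrative, and the real VarunaVersion usage in snarkVM may differ:

```rust
// Hypothetical third protocol version gating the new hashing behaviour,
// mirroring how earlier changes were gated behind VarunaVersion::V2.
enum VarunaVersion {
    V1,
    V2,
    V3,
}

fn absorb_commitments(version: VarunaVersion, commitments: &[u64], cached_hash: u64) -> u64 {
    match version {
        // Old behaviour: hash the raw commitments on every verification.
        VarunaVersion::V1 | VarunaVersion::V2 => commitments
            .iter()
            .fold(0u64, |acc, c| acc.wrapping_mul(31).wrapping_add(*c)),
        // New behaviour: reuse the precomputed commitments hash.
        VarunaVersion::V3 => cached_hash,
    }
}

fn main() {
    let commitments = [1u64, 2, 3];
    let cached = 1026; // precomputed once, e.g. at VK construction
    let old = absorb_commitments(VarunaVersion::V2, &commitments, cached);
    let new = absorb_commitments(VarunaVersion::V3, &commitments, cached);
    // Same transcript input either way: computed every call vs. computed once.
    assert_eq!(old, new);
}
```

Old verifiers keep the per-call hashing path, so proofs produced before the version bump still verify unchanged.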
Thank you for the benchmarks; it looks like I made a mistake... This PR is only consequential when we have a very large number of different circuits - but in the current system, even with dynamic dispatch, we never plan to have more than 32 circuits to batch, and the most common case is just 3 circuits. Can you do one more benchmark with circuit_size: 3, instance_size: 10, before and after this PR? If the difference is inconsequential we can close this PR and stop this line of work.
sure:
Could you maybe run this comparison again? Perhaps even try instance_size 4000? If it is indeed a negligible difference, could you reproduce the single-threaded flamegraph I made to revisit whether init_sponge hashing is really a bottleneck? staging...perf/verify_batch#diff-9ae02456754b316d993bd81a0e311bff5e432156b69d28b867d5c98027d0b79bR137
Done; it does show that
TIL again that dynamic dispatch can potentially create lots of circuits; we can keep this open as a draft.
@ljedrz do you want to build and benchmark another feature in this draft? Currently we hash
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
The most recent commit provides the following benchmark wins compared with the previous one:
This addresses the 1st part of this comment on the linked issue.
CC #2871.