[Perf] Only calculate the hash of circuit commitments once for the VK #2964

Draft

ljedrz wants to merge 2 commits into ProvableHQ:staging from ljedrz:perf/vk_circuit_commitments_hash

Conversation

ljedrz (Collaborator) commented Oct 16, 2025

This addresses the 1st part of this comment on the linked issue.

CC #2871.

…cuitVerifyingKey

Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
ljedrz requested a review from vicsn October 16, 2025 14:13

let mut pks = Vec::with_capacity(circuit_batch_size);
let mut all_circuits = Vec::with_capacity(circuit_batch_size);
#[allow(clippy::mutable_key_type)]
ljedrz (Collaborator, Author) commented Oct 16, 2025

note: this is needed because CircuitVerifyingKey now has interior mutability; however, in this case it is perfectly fine, as the new member has no impact on the Ord impl (which only considers the id), and the fact that the key implements Ord is why the warning is raised; see the corresponding clippy lint (mutable_key_type).

/// Commitments to the indexed polynomials.
pub circuit_commitments: Vec<sonic_pc::Commitment<E>>,
pub id: CircuitId,
pub circuit_commitments_hash: OnceLock<E::Fq>,
Collaborator:

I think this is expensive enough that we should store it to disk - and it would be great if we could get rid of the OnceLock (which obfuscates when initialization happens, making performance analysis harder). O:)

You correctly observed that we don't have to transmit it over the wire though.

ljedrz (Collaborator, Author) commented Oct 16, 2025

The issue was that at the moment of creation, fs_params is not available, and at later stages the VKs are immutable - hence the OnceLock.

Storing to disk would probably work around this, but retrieving it would be expensive, perhaps to the point of offsetting any performance gains from caching it, unless the hashing is really computationally expensive.

Collaborator:

Storing to disk would probably work around this, but retrieving it would be expensive, perhaps to the point of offsetting any performance gains from caching it, unless the hashing is really computationally expensive.

To be clear, we would retrieve it from disk only when we retrieve the VK from disk. And the hashes are very expensive.

However, a big downside I do see is the sheer amount of work to adjust the database logic. So for the first version you can also compute it during construction.

The issue was that at the moment of creation, the fs_params is not available

Looks to me like it's always available in N::varuna_fs_parameters()? Or is there a scoping issue? Have fun with that :")

ljedrz (Collaborator, Author):

Looks to me it's always available in N::varuna_fs_parameters()?

That's correct; however, there is no notion of the Network - or even snarkvm-console - in the algorithms crate. Would it be acceptable to alter SNARK::circuit_setup to require FSParameters, like the other VarunaSNARK functions do?

Collaborator:

However, a big downside I do see is the sheer amount of work to adjust the database logic. So for the first version you can also compute it during construction.

For my future self reading this: we should just store to disk, but we can write out that logic when we have a definite timeline for landing this feature.

vicsn (Collaborator) commented Oct 16, 2025

Can you report the performance improvements for the existing snark_batch_verify benchmark with:

  • circuit_batch_size:1, instance_size:5
  • circuit_batch_size:1, instance_size:1000

fs_parameters: &FS::Parameters,
inputs_and_batch_sizes: &BTreeMap<CircuitId, (usize, &[Vec<E::Fr>])>,
circuit_commitments: impl Iterator<Item = &'a [crate::polycommit::sonic_pc::Commitment<E>]>,
circuit_commitments_hashes: Vec<E::Fq>,
Collaborator:

Can you add a third VarunaVersion and guard the changes behind it to preserve backwards compatibility? You can peek at where we use VarunaVersion::V2 for inspiration.

vicsn marked this pull request as draft October 16, 2025 14:40
ljedrz (Collaborator, Author) commented Oct 17, 2025

circuit_batch_size:1, instance_size:5

snark_batch_verify      time:   [4.6137 ms 4.6239 ms 4.6343 ms]
                        change: [−5.4575% −5.1305% −4.8178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

circuit_batch_size:1, instance_size:1000

snark_batch_verify      time:   [149.98 ms 150.09 ms 150.22 ms]
                        change: [−0.1857% −0.0640% +0.0545%] (p = 0.32 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

vicsn (Collaborator) commented Oct 17, 2025

Thank you for the benchmarks, looks like I made a mistake...

This PR is only consequential when we have a very large amount of different circuits - but in the current system and even with dynamic dispatch we're never planning to have more than 32 circuits to batch - and the most common case is just 3 circuits.

Can you do one more benchmark with circuit_batch_size: 3, instance_size: 10, before and after this PR? If the difference is inconsequential we can close this PR and stop this line of work.

ljedrz (Collaborator, Author) commented Oct 17, 2025

sure:

snark_batch_verify      time:   [10.654 ms 10.669 ms 10.683 ms]
                        change: [−6.3521% −6.1469% −5.9447%] (p = 0.00 < 0.05)
                        Performance has improved.

vicsn (Collaborator) commented Oct 23, 2025

circuit_batch_size:1, instance_size:1000

Could you maybe run this comparison again? Perhaps even try instance_size 4000?

If it is indeed a negligible difference, could you reproduce the single-threaded flamegraph I made to revisit whether init_sponge hashing is really a bottleneck? staging...perf/verify_batch#diff-9ae02456754b316d993bd81a0e311bff5e432156b69d28b867d5c98027d0b79bR137

ljedrz (Collaborator, Author) commented Oct 23, 2025

Perhaps even try instance_size 4000?

snark_batch_verify      time:   [586.57 ms 586.99 ms 587.43 ms]
                        change: [+0.2159% +0.3150% +0.4199%] (p = 0.00 < 0.05)
                        Change within noise threshold.

ljedrz (Collaborator, Author) commented Oct 23, 2025

If it is indeed a negligible difference, could you reproduce the single-threaded flamegraph I made to revisit whether init_sponge hashing is really a bottleneck?

Done; it does show that init_sponge takes ~43%, but this PR doesn't seem to impact it (only ~0.4% decrease in that value).

@vicsn vicsn closed this Oct 24, 2025
@vicsn vicsn reopened this Oct 29, 2025
vicsn (Collaborator) commented Oct 29, 2025

TIL again that dynamic dispatch can indeed create a potentially large number of circuits; we can keep this open as a draft.

vicsn (Collaborator) commented Jan 12, 2026

@ljedrz do you want to build and benchmark another feature in this draft? Currently we hash inputs_and_batch_sizes inside fn init_sponge using the expensive Poseidon hash. Instead, the prover and verifier can hash those to a single field element using hash_sha3_256, and then pass that single value into fn {prove,verify}_batch and into fn init_sponge to be hashed by Poseidon.

Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
ljedrz (Collaborator, Author) commented Jan 14, 2026

The most recent commit provides the following benchmark wins compared with the previous one:

circuit_batch_size:1, instance_size:5

snark_batch_verify      time:   [7.2229 ms 7.2641 ms 7.3056 ms]
                        change: [−5.8052% −5.1281% −4.4218%] (p = 0.00 < 0.05)
                        Performance has improved.

circuit_batch_size:1, instance_size:1000

snark_batch_verify      time:   [162.82 ms 163.37 ms 163.95 ms]
                        change: [−27.720% −27.366% −27.010%] (p = 0.00 < 0.05)
                        Performance has improved.
