Skip to content

[Perf] Compact subdag#2841

Closed
ljedrz wants to merge 7 commits intoProvableHQ:stagingfrom
ljedrz:perf/compact_subdag
Closed

[Perf] Compact subdag#2841
ljedrz wants to merge 7 commits intoProvableHQ:stagingfrom
ljedrz:perf/compact_subdag

Conversation

@ljedrz
Copy link
Copy Markdown
Collaborator

@ljedrz ljedrz commented Aug 1, 2025

A revival of #2304.

Filing as a draft for now, as due to the age of the aforelinked PR it was a challenging rebase, and further adjustments may be needed.

Design

The goal of this PR is to decrease the bandwidth usage during block sync. If we scale to large amounts of validators (e.g. 100), besides signatures, duplicate transmissions may take up a significant amount of space.

To achieve this:

  • Prior transmission IDs and new transmissions are stored in a Block just once.
  • CompactHeaders, CompactCertificates and Compact Subdags reference these with ordered u32 indices
  • New Quorum blocks are created, stored and broadcasted as Compact.
  • Recipients must convert back to full transmissions on the ocassions:
    • For every received block to recompute the batch_id, which in turn ensures the signatures are actually valid.
    • For validators who do at-tip Sync with full certificates or who boot up with full certificates. This is not needed in theory, but require a large refactor to change. Conversion to full certificates can be unchecked.

TODOs

  • Replace bitset by index vec to retain ordering
  • Make sure Block contains Optional {prior, aborted}_{transaction,solution}_transmission_ids, and stores/retrieves them from disk appropriately
  • Out of Scope: Consider re-use the SubdagTransmissions object (or something similar) to unify passing in transmissions for all the conversions to-from compact subdag. This should not stand in the way of passing iterators, we must avoid Cloning these large amounts of data wherever we can...

Related past PRs

#1999, ProvableHQ/snarkOS#3000.

@ljedrz ljedrz requested a review from vicsn August 1, 2025 13:28
Copy link
Copy Markdown
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, I think its close to finished.

In check_next_block we should disallow the Full subdag from ConsensusVersion::V10, and disallow Compact before it, to avoid forking and complexity.

Do you also want to draft the snarkOS PR? Make sure to do it on top of Kai's recent snarkOS PR which uses the new PendingBlock.

I'll give another security review after the current round of fixes.

@ljedrz ljedrz force-pushed the perf/compact_subdag branch from b6abc09 to 202f5ae Compare August 5, 2025 09:40
/// The transactions in this block.
transactions: Transactions<N>,
/// The IDs of both accepted and aborted transactions in the previous block.
prior_transaction_ids: Vec<N::TransactionID>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think prior_* is a bit too ambiguous. Our current usage of aborted_or_existing_transaction_ids is also quite verbose, but I'd still lean more towards verbosity for the sake of clarity.

Other options could be:
existing_*
past_*

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for backwards compatibility, an Option<Vec<...>> would be better than just an empty Vec for 2 reasons.

  1. Services (wallets and explorers) have already serialized blocks without this field and might break if this is a required field.
  2. Previous blocks will have an empty vec here; however, in reality they might actually have transactions or solutions that belong there based on the subdag state.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I wouldn't mind adding Option wrappers here, there is an alternative approach that we could take, i.e. adding the version (which is already stored in the database and sent over the network) to the Block object, which IMO would make reasoning about the contents more explicit and futureproof.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think prior_* is a bit too ambiguous. Our current usage of aborted_or_existing_transaction_ids is also quite verbose, but I'd still lean more towards verbosity for the sake of clarity.
Other options could be:
existing_*
past_*

I'm happy to make things verbose if it makes things more clear, and excuse this non-native speaker commenting on this, but prior is a synonym of previous and past right?

@vicsn vicsn requested a review from kaimast August 11, 2025 15:58
@kaimast
Copy link
Copy Markdown
Collaborator

kaimast commented Aug 12, 2025

This looks good to me, but I will do another pass before approving. Hopefully, merging/rebasing to staging will make CI pass.

However, I am wondering if it makes sense to add three new crates for this? The compact-header crate is only 800LOC and the compact-batch even less. This functionality could be added to the existing snarkvm-ledger-narwhal-batch-certificate and snarkvm-ledger-narwhal-batch-header crates without problems, unless I am missing something.

I know this mostly boils down to preference, but I feel like these many small crates add a lot of bloat. The most extreme case is the traits crate which is <50 LOC but over 200 lines of Cargo.toml, README, and LICENSE files.
Are there any cases where someone would want to use the regular header crate but not the compact-header crate? If they are always used together, it makes a lot of sense to merge them, IMHO.

@vicsn
Copy link
Copy Markdown
Collaborator

vicsn commented Aug 13, 2025

However, I am wondering if it makes sense to add three new crates for this?

If we can find a good reason for it, e.g. faster compile times, we can refactor the crates in the future.

}
// From ConsensusVersion::V10 onwards, use `check_solution` instead of `check_solution_mut`.
let res = if N::CONSENSUS_VERSION(self.latest_height()).unwrap() >= ConsensusVersion::V10 {
self.puzzle().check_solution(solution, latest_epoch_hash, latest_proof_target)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not reintroduce the forking risk?

Where a malicious validator forcibly changes the target in some of the solutions and keeps the correct one in the original. Then at block production, some would accept and some would abort the solution.

@vicsn

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduced check_solutions_mut before we had transmission checksum. And I believe it was technically not needed anymore ever since we did introduce transmission checksums.

Reason for changing this now, is to make it clear why we're not carrying around accepted_solution_checksums in the CompactBlock: because we can recompute the checksum from solutions.

It does not hurt for us to add a unit test that a solution with false target will be aborted.

@vicsn vicsn changed the title [Perf] Store a bitset of transmissions in a compact subdag [Perf] Compact subdag Aug 18, 2025
@ljedrz ljedrz marked this pull request as ready for review August 21, 2025 09:53
@ljedrz ljedrz requested a review from vicsn August 21, 2025 09:53
New types:
- CompactCertificate
- CompactHeader
- Subdag enum can hold either
- NarwhalCertificate trait provides an interface
- We use BitSet, which used to be part of standard library

Co-authored-by ljedrz <ljedrz@users.noreply.github.com>
@ljedrz ljedrz force-pushed the perf/compact_subdag branch from 4374442 to 2410531 Compare August 21, 2025 11:05
subdag: Subdag<N>,
transmissions: IndexMap<TransmissionID<N>, Transmission<N>>,
mut subdag: Subdag<N>,
subdag_transmissions: SubdagTransmissions<N>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is hella ugly now but we can create a new issue for cleaning it up

ljedrz added 3 commits August 21, 2025 21:04
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
@ljedrz ljedrz requested a review from vicsn August 22, 2025 08:30
Copy link
Copy Markdown
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit, otherwise LGTM!

ljedrz added 2 commits August 22, 2025 13:04
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@kaimast kaimast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few comments I left a while ago. Hopefully we can merge this soon

}

/// Performs sanity checks on the subdag.
macro_rules! sanity_check_subdag {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be a macro, or could we also write something like the following?

Suggested change
macro_rules! sanity_check_subdag {
fn sanity_check_subdag<T>(subdag BTreeMap<u64, IndexSet<T>>) -> Result<()> {

I always try to avoid macros as they harder to read, at least for me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not my go-to either, but at least this one is rather simple; I tried to change it into a function again now, but wasn't able to

let mut transmission_ids = IndexSet::new();
transactions.chain(aborted_transactions).chain(prior_transactions).enumerate().try_for_each(
|(index, transmission_id)| {
if self.transaction_indices.contains(&u32::try_from(index)?) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this preserve the order? To me, it reads like the transactions are added in the order they are in the block not the order of transaction_indices.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vicsn can you please advise here? I'm not sure

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I see that's a bit confusing. Though note how we're using the same logic in CompactHeader::new to create the indices. This is quite brittle/unclear, so I think after this PR we could consolidate a bit (see comment in PR message about a better/new SubdagTransmissions abstraction.

default = [
"batch-certificate",
"batch-header",
"compact-certificate",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have all these feature flags? Mabye @vicsn knows if there are instances where we don't use the default features?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a chore to figure out, and we can tackle that separate when we're focusing on improving the dependency/feature tree to improve build times, starting with what gives us the biggest bang for our buck, and maybe we'll find automated tools to do most of the work for us.

aborted_solution_transmission_ids: Option<Vec<TransmissionID<N>>>,
/// The aborted solution IDs in this block.
aborted_solution_ids: Vec<SolutionID<N>>,
aborted_solution_ids: Option<Vec<SolutionID<N>>>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document when this can be None?

Also, is it always the case that if aborted_solution_ids is Some aborted_solution_transmission_ids is also Some? If so, maybe they should be contained in the same Option?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added extra docs; as for improved structure, it would indeed be best to even have something like an

enum AssociatedIDs {
    FullSubdag { aborted_solution_ids, aborted_transaction_ids },
    CompactSubdag { prior_solution_transmission_ids, aborted_solution_transmission_ids, prior_transaction_transmission_ids, aborted_transaction_transmission_ids },
}

but I'd recommend leaving it for a follow-up, as it is functionally the same, and this PR is a bitrot minefield already.

Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
@kaimast
Copy link
Copy Markdown
Collaborator

kaimast commented Oct 27, 2025

Can you run benchmarks on the PR once it is rebased (or staging has been merged into it)? In case you do not have permissions to trigger them from the "Actions" tab, ping me and I will do it.

@kaimast
Copy link
Copy Markdown
Collaborator

kaimast commented Nov 21, 2025

After more discussion, we decided to not merge compact subDAGs right now. I will leave #3000 open as a draft and close this PR, and we can revisit it in the future.

@kaimast kaimast closed this Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants