
feat: CON-1665 Include NiDKG dealings into blocks based on priority#9766

Open
eichhorl wants to merge 5 commits intomasterfrom
eichhorl/sort-dealings

Conversation

@eichhorl
Contributor

@eichhorl eichhorl commented Apr 8, 2026

Background

At the start of each interval, nodes broadcast NiDKG dealings for key generation and resharing. These dealings are included in data blocks. Once enough dealings exist on chain, they may be aggregated into a full NiDKG transcript. #8469 allows the subnet to aggregate dealings not only in the following summary block, but already in data blocks, as soon as enough dealings exist on the parent chain.

Problem

Due to their size and validation cost, the number of dealings included on chain is limited (currently to 1 dealing per block). Additionally, the current strategy for selecting which dealings to include is sub-optimal:

  1. The subnet attempts to include all (up to N) dealings on chain, even if only f+1 (fresh key) or 2f+1 (high threshold re-sharing) dealings are required to create the transcript.
  2. The dealings are selected pseudo-randomly (based on the order of their hashes). As a result, on average, all transcripts complete at roughly the same time, once enough dealings for all of them exist on chain. This has an adverse effect on latency.
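For context, the two thresholds mentioned above follow from the usual n = 3f + 1 fault model. A minimal sketch (the helper names here are illustrative, not the actual replica API):

```rust
// Illustrative sketch: dealing-collection thresholds on an n = 3f + 1 subnet.
// These helper names are hypothetical, not the real codebase API.

/// Maximum number of faulty nodes tolerated on a subnet of `n` nodes.
fn max_faults(n: usize) -> usize {
    (n - 1) / 3
}

/// Dealings needed to create a fresh (low-threshold) key: f + 1.
fn low_threshold(n: usize) -> usize {
    max_faults(n) + 1
}

/// Dealings needed for a high-threshold resharing: 2f + 1.
fn high_threshold(n: usize) -> usize {
    2 * max_faults(n) + 1
}

fn main() {
    // On a 13-node subnet, f = 4, so f + 1 = 5 and 2f + 1 = 9.
    assert_eq!(max_faults(13), 4);
    assert_eq!(low_threshold(13), 5);
    assert_eq!(high_threshold(13), 9);
}
```

So on a 13-node subnet, including more than 9 dealings for any single config is wasted block space, which is what this PR avoids.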

Proposed Changes

This PR proposes to stop including further dealings for a config once the config's collection_threshold has been reached. Additionally, dealings are prioritized for inclusion based on the following conditions (in order of precedence):

  1. While there are fewer than MAX_REMOTE_DKGS_PER_INTERVAL (currently set to 1) completed remote targets in the current interval, prioritize remote dealings. Otherwise, prioritize local dealings. This improves the latency of remote DKG requests such as SetupInitialDKG and ReshareChainKey. The number of prioritized remote DKGs is limited to avoid starving local DKG. Note that currently, the number of remote DKG configs per interval is also limited to 1 (by the same constant); however, this may change in the future.
  2. Among remote targets, prioritize dealings for targets that are closer to their collection threshold. This ensures progress is being made for one specific target (instead of all of them at once), which improves best- and average-case latency of remote DKG requests.
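The two rules above can be sketched as a single sort key (this is an illustrative sketch with hypothetical names, not the actual payload-builder code; for simplicity it also applies the remaining-to-threshold tie-break to local dealings, which the PR only specifies for remote targets):

```rust
// Illustrative sketch of the dealing priority ordering described above.
// Struct and function names are hypothetical.

#[derive(Debug, Clone)]
struct DealingInfo {
    is_remote: bool,
    /// Dealings still missing before this config's collection_threshold is met.
    remaining_to_threshold: usize,
}

/// Sort dealings so higher-priority ones come first:
/// 1. the prioritized class (remote, while fewer than
///    MAX_REMOTE_DKGS_PER_INTERVAL remote targets completed; local otherwise),
/// 2. within a class, targets closer to their collection threshold first.
fn sort_by_priority(dealings: &mut [DealingInfo], prioritize_remote: bool) {
    dealings.sort_by_key(|d| {
        let class = if d.is_remote == prioritize_remote { 0 } else { 1 };
        (class, d.remaining_to_threshold)
    });
}

fn main() {
    let mut dealings = vec![
        DealingInfo { is_remote: false, remaining_to_threshold: 1 },
        DealingInfo { is_remote: true, remaining_to_threshold: 3 },
        DealingInfo { is_remote: true, remaining_to_threshold: 1 },
    ];
    // Remote DKGs still prioritized: remote dealings come first,
    // with the target closest to its threshold leading.
    sort_by_priority(&mut dealings, true);
    assert!(dealings[0].is_remote && dealings[0].remaining_to_threshold == 1);
    assert!(dealings[1].is_remote && dealings[1].remaining_to_threshold == 3);
    assert!(!dealings[2].is_remote);
}
```

Driving one remote target to completion before spreading dealings across all of them is what improves the best- and average-case latency.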

Future Work

Instead of only starting to work on remote DKG requests at the beginning of an interval, we will also do so as part of data blocks, as soon as the remote DKG request appears in the replicated state.

@eichhorl eichhorl added the CI_ALL_BAZEL_TARGETS Runs all bazel targets label Apr 8, 2026
@github-actions github-actions bot added the feat label Apr 8, 2026
@eichhorl eichhorl removed the CI_ALL_BAZEL_TARGETS Runs all bazel targets label Apr 9, 2026
@eichhorl eichhorl changed the title feat: sort dealings by priority feat: CON-1665 Include NiDKG dealings into blocks based on priority Apr 9, 2026
@eichhorl eichhorl marked this pull request as ready for review April 9, 2026 09:09
@eichhorl eichhorl requested a review from a team as a code owner April 9, 2026 09:09
return false;
}
if *cap > 0 {
*cap -= 1;
Contributor


It could be worth documenting that this line is responsible for including at most collection_threshold dealings for a given DKG ID.

In general (but I think this could be done later), I feel like it would be easier to read and more efficient to stop validating further dealings after collecting the threshold, rather than "artificially" limiting the number of included dealings in the payload builder here. Then we should not need to perform this if-check nor this *cap -= 1, while also saving work by validating fewer dealings (i.e., optimizing at the source).

Contributor Author


I feel like it would be easier to read and be more efficient to stop validating more dealings after collecting the threshold

It's a good idea; unfortunately, there isn't really a guarantee that all nodes on the subnet will validate the same (2)f+1 dealings. So we would probably still have to do this accounting, since otherwise one node might include one set of f+1 dealings while another node includes a different set of f+1 dealings.

Asynchronously validating all dealings in the DKG client is also beneficial for the finalization rate. When validating a block, we will first check if we already have the same dealing in our own validated DKG pool. If so, we will not validate it again as part of the block payload.
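The accounting being discussed can be sketched roughly as follows (an illustrative sketch only; the type alias and function name are hypothetical stand-ins for the actual payload-builder code around the `*cap -= 1` above):

```rust
use std::collections::BTreeMap;

// Illustrative sketch of per-config capacity accounting: include at most
// `collection_threshold` dealings per DKG ID. Identifiers are hypothetical.

type NiDkgId = u64; // stand-in for the real NiDkgId type

/// Returns true if the dealing for `id` may still be included, and
/// consumes one unit of that config's remaining capacity if so.
fn include_dealing(caps: &mut BTreeMap<NiDkgId, usize>, id: NiDkgId) -> bool {
    match caps.get_mut(&id) {
        Some(cap) if *cap > 0 => {
            *cap -= 1;
            true
        }
        // Unknown config, or capacity (collection threshold) exhausted.
        _ => false,
    }
}

fn main() {
    // Config 1 has a collection threshold of 2.
    let mut caps = BTreeMap::from([(1, 2)]);
    assert!(include_dealing(&mut caps, 1));
    assert!(include_dealing(&mut caps, 1));
    assert!(!include_dealing(&mut caps, 1)); // capacity exhausted
    assert!(!include_dealing(&mut caps, 2)); // unknown config
}
```

Because each node may hold a different validated subset of dealings, capping at inclusion time (rather than at validation time) is what keeps the number of on-chain dealings per config bounded regardless of which subset each block maker saw.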

Comment on lines +2167 to +2170
let selected = select_dealings_for_payload(&configs, &dealers_from_chain, &pool, 1);

assert_eq!(selected.len(), 1);
assert_eq!(selected[0].content.dkg_id, remote_low_remaining_id);
Contributor


Why not

Suggested change
- let selected = select_dealings_for_payload(&configs, &dealers_from_chain, &pool, 1);
- assert_eq!(selected.len(), 1);
- assert_eq!(selected[0].content.dkg_id, remote_low_remaining_id);
+ let selected = select_dealings_for_payload(&configs, &dealers_from_chain, &pool, 10);
+ assert_eq!(selected.len(), 2);
+ assert_eq!(selected[0].content.dkg_id, remote_low_remaining_id);
+ assert_eq!(selected[1].content.dkg_id, remote_high_remaining_id);

assert_eq!(selected.len(), 1);
assert_eq!(selected[0].content.dkg_id, remote_target_0);
}
}
Contributor


Some ideas for further tests:

  • 1 Local Low + 1 Remote Low + 1 Remote High, both remotes with the same target ID. Remote Low has reached capacity but Remote High has not. Expected result: Remote High should be prioritized even if MAX_REMOTE_DKGS_PER_INTERVAL = 1 and one remote instance was completed.
    Effectively, this tests that the variable remaining_remote_target_capacity is indexed by NiDkgTargetId and not by NiDkgId, so that remote_target_ids_with_no_capacity is deduced correctly.

