
feat(consensus): support subnet-splitting in consensus & orchestrator #9748

Draft
kpop-dfinity wants to merge 12 commits into master from kpop/final_ss


@kpop-dfinity kpop-dfinity commented Apr 7, 2026

This PR adds support for subnet splitting to the consensus protocol and the orchestrator.

High level overview

At a high level, from the consensus point of view, subnet splitting is conceptually similar to a subnet upgrade. Once a replica sees an instruction in the registry that the subnet it belongs to should be split, it schedules the split at the nearest summary block. It then creates the blocks leading up to that summary block normally (apart from capping the registry version in the validation context, see "Registry version freezing"), and afterwards keeps creating empty blocks (without delivering them to DSM) until the summary block is finalized. If the replica still belongs to the old subnet, it continues producing blocks as before; otherwise it halts until the orchestrator restarts the replica process.

Diagram

Suppose a subnet splitting instruction was created at registry version `r` and, at the time of creating the block at height `H-2`, a replica updates its local registry to a version >= `r`.

    ↓
+--------+
|  DATA  |
|  H-2   |    For the data blocks leading to the summary block at which the split will happen,
|        |    nothing changes, except that we ensure `validation_context.registry_version < r`.
+--------+
    ↓
+--------+
|  DATA  |
|  H-1   |    For the data blocks leading to the summary block at which the split will happen,
|        |    nothing changes, except that we ensure `validation_context.registry_version < r`.
+--------+
    ↓
+--------+
|SUMMARY |    The summary block is created as usual, but we populate the `subnet_splitting_status` field 
|   H    |    appropriately and we set the `validation_context.registry_version` to exactly `r`.
|        |    When delivering the block to DSM, we indicate that the state should be split and only some
+--------+    canisters should be retained.
    ↓
+--------+
|  DATA  |
|  H+1   |    Until the summary block at height `H` is finalized, we create empty data blocks and we
|        |    don't deliver them to DSM
+--------+
    ↓
+--------+
|  DATA  |
|  H+2   |    Until the summary block at height `H` is finalized, we create empty data blocks and we
|        |    don't deliver them to DSM
+--------+

Once the summary block at height `H` is finalized, we create a CUP at height `H+500`. The chain is
therefore broken: there are no blocks between heights `H+d` and `H+500`, where `H+d` is the height
of the latest block produced before the summary block at height `H` was finalized.

+--------+
|  CUP   | 
| H+500  |
|        |
+--------+
    ↓
+--------+
|  DATA  |    If the replica still belongs to the old subnet, continue as normal. 
| H+501  |    Otherwise, wait until the orchestrator restarts the `replica` process before    
|        |    producing new blocks.
+--------+
    ↓
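
The timeline above can be read as a per-height decision. The following is a minimal sketch of that decision, not the code in this PR: `BlockAction`, `SplitSchedule` and `block_action` are hypothetical names introduced for illustration, and the CUP height is passed in as a parameter rather than derived.

```rust
/// What a replica does at a given height.
/// Hypothetical names for illustration; these are not the types used in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockAction {
    /// No split scheduled, or the split is done and the replica stays in the old subnet.
    Normal,
    /// Data block before the split summary: `validation_context.registry_version < r`.
    DataWithFrozenRegistry,
    /// Summary block at height `H`: `subnet_splitting_status` set, registry version exactly `r`.
    SplitSummary,
    /// Empty data block after `H` that is not delivered to DSM.
    EmptyUndelivered,
    /// Height skipped because the chain jumps to the CUP.
    ChainGap,
    /// Replica no longer belongs to the old subnet: wait for the orchestrator restart.
    HaltUntilRestart,
}

/// Assumed shape of a scheduled split: the summary height `H` and the CUP height.
#[derive(Clone, Copy, Debug)]
pub struct SplitSchedule {
    pub summary_height: u64,
    pub cup_height: u64,
}

pub fn block_action(
    height: u64,
    schedule: Option<SplitSchedule>,
    summary_finalized: bool,
    in_old_subnet: bool,
) -> BlockAction {
    let s = match schedule {
        Some(s) => s,
        None => return BlockAction::Normal,
    };
    if height < s.summary_height {
        return BlockAction::DataWithFrozenRegistry;
    }
    if height == s.summary_height {
        return BlockAction::SplitSummary;
    }
    if !summary_finalized {
        // Keep the chain growing with empty blocks until `H` is finalized.
        return BlockAction::EmptyUndelivered;
    }
    if height <= s.cup_height {
        // After finalization the next artifact is the CUP; no blocks in between.
        return BlockAction::ChainGap;
    }
    if in_old_subnet {
        BlockAction::Normal
    } else {
        BlockAction::HaltUntilRestart
    }
}
```

With `H = 1000` and a CUP at `1500`, heights `998` and `999` map to `DataWithFrozenRegistry`, `1000` to `SplitSummary`, and `1501` to either `Normal` or `HaltUntilRestart` depending on subnet membership.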

Detailed overview

Certification

The only change to certification is that we stop creating and validating certification shares in two cases:

  1. If the subnet has been split and the replica is waiting for the orchestrator to restart the replica process.
  2. If the height is in the splitting DKG interval. // FIXME
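
As a rough illustration of the two cases, the check could look like the sketch below. `SplitView` and `should_handle_certification_shares` are hypothetical names, and representing the splitting DKG interval as an inclusive height range is an assumption (the second case is still marked FIXME above, so its exact bounds are open).

```rust
use std::ops::RangeInclusive;

/// Hypothetical view of the split state as seen by the certifier.
#[derive(Clone, Debug)]
pub struct SplitView {
    /// The subnet has been split and this replica waits for the orchestrator restart.
    pub awaiting_restart: bool,
    /// Heights of the DKG interval in which the split happens (assumed inclusive bounds).
    pub splitting_interval: Option<RangeInclusive<u64>>,
}

/// Returns `true` if certification shares should be created/validated at `height`.
pub fn should_handle_certification_shares(height: u64, view: &SplitView) -> bool {
    // Case 1: replica is halted until the orchestrator restarts it.
    if view.awaiting_restart {
        return false;
    }
    // Case 2: height falls inside the splitting DKG interval.
    match &view.splitting_interval {
        Some(interval) => !interval.contains(&height),
        None => true,
    }
}
```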

Registry version freezing

If a subnet split is supposed to happen at registry version `r`, we make sure that the registry version in the validation context of every block stays below `r` until the split finally happens at the summary block. This prevents DSM from prematurely concluding that the subnet has been split.
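
The freezing rule can be sketched as a small pure function. This is only an illustration under stated assumptions: `RegistryVersion`, `BlockKind` and `frozen_registry_version` are hypothetical stand-ins rather than the PR's types, and `r >= 1` is assumed so that `r - 1` is a valid version.

```rust
/// Hypothetical stand-in for a registry version.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct RegistryVersion(pub u64);

/// Which block the validation context is being built for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockKind {
    /// A data block before the summary block at which the split happens.
    DataBeforeSplit,
    /// The summary block at which the split happens.
    SplitSummary,
}

/// Picks the registry version for a block's validation context.
/// `split_version` is `Some(r)` if a split instruction exists at version `r` (assumed `r >= 1`).
pub fn frozen_registry_version(
    latest_local: RegistryVersion,
    split_version: Option<RegistryVersion>,
    kind: BlockKind,
) -> RegistryVersion {
    match split_version {
        // No split pending: use the latest local version as usual.
        None => latest_local,
        // The local registry has not reached `r` yet, so nothing needs capping.
        Some(r) if latest_local < r => latest_local,
        Some(r) => match kind {
            // Stay strictly below `r` until the split summary block.
            BlockKind::DataBeforeSplit => RegistryVersion(r.0.saturating_sub(1)),
            // The split summary block uses exactly `r`.
            BlockKind::SplitSummary => r,
        },
    }
}
```

For example, with `r = 10` and a local registry at version 12, a data block before the split would use version 9 and the split summary block would use version 10.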

CUP creation/validation/aggregation

@github-actions github-actions bot added the feat label Apr 7, 2026