@mraszyk mraszyk commented Dec 8, 2025

This PR adds a node rejoin test in which catch-up is slow because DSM rounds are long. The long rounds are provoked by:

  • creating many canisters (100,000) - so that iterating over all canisters is slow;
  • deploying a few "busy" canisters - so that executing those canisters is slow.

The test can be run using the following command:

ict t //rs/tests/message_routing:rejoin_test_long_rounds

Runbook:

  • set up a testnet of 3f + 1 nodes with f = 4 (like on mainnet);
  • pick a random node and install 4 "seed" canisters through it (the state sync test canister is used as "seed");
  • create 100,000 canisters via the "seed" canisters (in parallel);
  • deploy 8 "busy" canisters (universal canister with heartbeats executing 1.8B instructions);
  • pick another random node and kill that node;
  • wait for the subnet to produce a CUP;
  • start the killed node.
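
The runbook parameters imply a few concrete numbers. The following is a minimal sketch of that arithmetic, not code from the PR: the constant and function names are hypothetical, the even split across "seed" canisters is an assumption, and the per-round instruction figure assumes every "busy" canister's heartbeat runs to completion once per round.

```rust
// Values copied from the runbook; all names here are illustrative only.
const F: u64 = 4;
const SEED_CANISTERS: u64 = 4;
const TOTAL_CANISTERS: u64 = 100_000;
const BUSY_CANISTERS: u64 = 8;
const HEARTBEAT_INSTRUCTIONS: u64 = 1_800_000_000;

/// Number of nodes in a subnet of size 3f + 1.
fn subnet_size(f: u64) -> u64 {
    3 * f + 1
}

/// Canisters each seed has to create, assuming an even split (rounded up).
fn canisters_per_seed(total: u64, seeds: u64) -> u64 {
    (total + seeds - 1) / seeds
}

/// Nominal heartbeat instructions per round, assuming each busy canister's
/// heartbeat executes fully once per round (an assumption, not a measurement).
fn busy_instructions_per_round(busy: u64, per_heartbeat: u64) -> u64 {
    busy * per_heartbeat
}
```

Under these assumptions the testnet has 13 nodes, each "seed" canister creates 25,000 canisters, and the "busy" canisters nominally account for 14.4B instructions per round.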

Success: the restarted node catches up w.r.t. its certified height and becomes healthy by the next CUP.
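
One way to read the catch-up part of this criterion is sketched below. This is a hypothetical illustration, not the check implemented in the test: the `Sample` type, the `caught_up_before` function, and the `max_lag` threshold are all assumptions, and the "healthy" part of the criterion is not modeled.

```rust
/// One paired observation of certified heights (hypothetical type).
#[derive(Clone, Copy, Debug)]
struct Sample {
    subnet_certified_height: u64,
    node_certified_height: u64,
}

/// Returns the index of the first sample in which the restarted node is within
/// `max_lag` of the subnet, looking only at samples taken before the subnet
/// reaches `next_cup_height`. Returns `None` if the node never gets that close
/// in time.
fn caught_up_before(samples: &[Sample], max_lag: u64, next_cup_height: u64) -> Option<usize> {
    samples
        .iter()
        // Stop at the first sample at or past the next CUP height.
        .take_while(|s| s.subnet_certified_height < next_cup_height)
        // saturating_sub guards against the node momentarily reporting a
        // height above the subnet sample.
        .position(|s| {
            s.subnet_certified_height
                .saturating_sub(s.node_certified_height)
                <= max_lag
        })
}
```

A node whose lag shrinks only slowly, or not at all, yields `None` here, which corresponds to the failure mode described in this PR.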

The attached screenshot shows how far the restarted node lags behind: at the moment, the restarted node catches up only very slowly

[Screenshot from 2025-12-14 23-34-39]

or (in another test run) does not catch up at all

[Screenshot from 2025-12-19 15-11-59]

@github-actions github-actions bot added the test label Dec 8, 2025
@mraszyk mraszyk force-pushed the mraszyk/rejoin-test-many-canisters branch from 6134f3b to 1547467 December 14, 2025 18:29
@mraszyk mraszyk changed the title from "test: rejoin test with many canisters" to "test: rejoin test with slow catch-up" Dec 16, 2025
@mraszyk mraszyk marked this pull request as ready for review January 5, 2026 07:38
@mraszyk mraszyk requested review from a team as code owners January 5, 2026 07:38
@github-actions github-actions bot left a comment

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, for each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then do NOT add entries to
unreleased_changelog.md in this PR! Instead, add the entries later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

@github-actions github-actions bot left a comment

This pull request modifies the IC-OS configuration types library (rs/ic_os/config_types/src/lib.rs).

Please ensure you have followed the Configuration Update Protocol guidelines, particularly if adding a new enum or enum variants:

Enum Variant Forward Compatibility Guidelines: If adding a new enum or new variants to an enum, ensure older versions can handle unknown variants gracefully by using #[serde(other)] on a fallback variant. See examples: GuestVMType::Unknown and Ipv6Config::Unknown.
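
The fallback behavior described above can be illustrated without serde. The sketch below hand-rolls the same idea using only the standard library so that it stays self-contained; `DeploymentKind` and `parse_kind` are hypothetical names and are not taken from rs/ic_os/config_types/src/lib.rs. With serde, marking a unit variant with #[serde(other)] gives the equivalent result during deserialization.

```rust
/// Hypothetical config enum; `Unknown` plays the role of the fallback variant.
#[derive(Debug, PartialEq, Eq)]
enum DeploymentKind {
    Mainnet,
    Testnet,
    /// Catch-all for variant names this version does not know about.
    Unknown,
}

/// Maps a variant name to the enum, falling back to `Unknown` instead of
/// failing when a newer writer emits a name this reader has never seen.
fn parse_kind(name: &str) -> DeploymentKind {
    match name {
        "Mainnet" => DeploymentKind::Mainnet,
        "Testnet" => DeploymentKind::Testnet,
        _ => DeploymentKind::Unknown,
    }
}
```

An older reader built this way accepts a config written by a newer version and can decide how to treat `Unknown`, rather than rejecting the whole config.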

To acknowledge this reminder and unblock the PR, dismiss this code review by:

  • Going to the bottom of the pull request page
  • Finding where this bot is requesting changes
  • Clicking the three dots on the right
  • Selecting "Dismiss review"

For complete guidelines, see the documentation at the top of rs/ic_os/config_types/src/lib.rs.

@mraszyk mraszyk force-pushed the mraszyk/rejoin-test-many-canisters branch from 394d023 to 562e289 January 5, 2026 09:27
@mraszyk mraszyk dismissed github-actions[bot]’s stale review January 5, 2026 09:28

No canister behavior changes.

@mraszyk mraszyk added this pull request to the merge queue Jan 5, 2026
Merged via the queue into master with commit e8cb0a5 Jan 5, 2026
38 checks passed
@mraszyk mraszyk deleted the mraszyk/rejoin-test-many-canisters branch January 5, 2026 11:36