Skip to content

fix(governance): correct canister upgrade order in golden-state test to prevent node rewards distribution failure#9126

Merged
pietrodimarco-dfinity merged 2 commits intomasterfrom
pmarco/fix-golden-state-test-split-time-advancement
Mar 4, 2026
Merged

fix(governance): correct canister upgrade order in golden-state test to prevent node rewards distribution failure#9126
pietrodimarco-dfinity merged 2 commits intomasterfrom
pmarco/fix-golden-state-test-split-time-advancement

Conversation

@pietrodimarco-dfinity
Copy link
Contributor

@pietrodimarco-dfinity pietrodimarco-dfinity commented Mar 2, 2026

Motivation

The Node Rewards canister updates node metrics immediately upon startup.
However, at that time the Registry canister may not yet be running, since it is often upgraded after the Node Rewards canister.
This creates a failure scenario where metric updates cannot proceed:

2026-03-15 08:09:07.483294864 UTC: [Canister sgymv-uiaaa-aaaaa-aaaia-cai] Failed to sync local registry: "Request to get_changes_since failed with code 5 and message: IC0508: Canister rwlgt-iiaaa-aaaaa-aaaaa-cai is not running"

As a result, node metrics are not updated, which blocks reward distribution.

Fix

Update the Node Rewards canister after upgrading the Registry canister to ensure the Registry remains available when metrics are refreshed.

@pietrodimarco-dfinity pietrodimarco-dfinity requested a review from a team as a code owner March 2, 2026 16:46
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

GOVERNANCE_CHECKLIST_REMINDER_DEDUP

@github-actions github-actions bot added the fix label Mar 2, 2026
Copy link
Contributor

@jasonz-dfinity jasonz-dfinity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but in the PR description, I think you meant upgrading node rewards AFTER registry

@daniel-wong-dfinity-org
Copy link
Contributor

The title is too generic.

@daniel-wong-dfinity-org
Copy link
Contributor

The title is too generic.

It should say what was broken about it, and/or how that brokenness is addressed. E.g.

Change the order of upgrades in golden state upgrade test.

(This explains how the brokenness was addressed, but not what was broken.)

@daniel-wong-dfinity-org
Copy link
Contributor

Shouldn't there be some retry (so that a transient issue like Registry being upgraded) does not become a permanent issue?

@pietrodimarco-dfinity
Copy link
Contributor Author

The title is too generic.

It should say what was broken about it, and/or how that brokenness is addressed. E.g.

Change the order of upgrades in golden state upgrade test.

(This explains how the brokenness was addressed, but not what was broken.)

Noice make sense will change it :)

@pietrodimarco-dfinity
Copy link
Contributor Author

Shouldn't there be some retry (so that a transient issue like Registry being upgraded) does not become a permanent issue?

I believe this should be a permanent fix as the registry will be for sure running when node-rewards canister update its metrics. But happy to add retries if necessary

@pietrodimarco-dfinity pietrodimarco-dfinity changed the title fix(Governance): Fix golden state tests node-rewards canister fix(governance): Prevent node rewards distribution failure when registry canister is not running by adjusting canisters upgrade order in golden-state test Mar 4, 2026
@pietrodimarco-dfinity pietrodimarco-dfinity changed the title fix(governance): Prevent node rewards distribution failure when registry canister is not running by adjusting canisters upgrade order in golden-state test fix(governance): correct canister upgrade order in golden-state test to prevent node rewards distribution failure Mar 4, 2026
@pietrodimarco-dfinity pietrodimarco-dfinity added this pull request to the merge queue Mar 4, 2026
Merged via the queue into master with commit 3341ebe Mar 4, 2026
39 checks passed
@pietrodimarco-dfinity pietrodimarco-dfinity deleted the pmarco/fix-golden-state-test-split-time-advancement branch March 4, 2026 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants