fix(governance): correct canister upgrade order in golden-state test to prevent node rewards distribution failure #9126
Conversation
This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):
1. Update unreleased_changelog.md (if there are behavior changes, even if they are non-breaking).
2. Are there BREAKING changes?
3. Is a data migration needed?
4. Security review?
How to Satisfy This Automatic Review
1. Go to the bottom of the pull request page.
2. Look for where it says this bot is requesting changes.
3. Click the three dots to the right.
4. Select "Dismiss review".
5. In the text entry box, respond to each of the numbered items in the previous section, declaring one of the following:
   - Done.
   - $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No canister behavior changes.", or for item 2, "Existing APIs behave as before.".
Brief Guide to "Externally Visible" Changes
"Externally visible behavior change" is very often due to some NEW canister API.
Changes to EXISTING APIs are more likely to be "breaking".
If these changes are breaking, make sure that clients know how to migrate and how to
maintain their continuity of operations.
If your changes are behind a feature flag, then do NOT add entries to
unreleased_changelog.md in this PR! Rather, add the entries later, in the PR
that enables these changes in production.
Reference(s)
For a more comprehensive checklist, see here.
jasonz-dfinity left a comment
LGTM, but in the PR description, I think you meant upgrading node rewards AFTER registry
The title is too generic.
It should say what was broken about it, and/or how that brokenness is addressed. E.g.
(This explains how the brokenness was addressed, but not what was broken.)
Shouldn't there be some retry, so that a transient issue (like Registry being upgraded) does not become a permanent issue?
Nice, makes sense, will change it :)
I believe this should be a permanent fix, as the Registry will certainly be running when the Node Rewards canister updates its metrics. But happy to add retries if necessary.
Motivation
The Node Rewards canister updates node metrics immediately upon startup.
However, at that time the Registry canister may not yet be running, since it is often upgraded after the Node Rewards canister.
This creates a failure scenario where metric updates cannot proceed.
As a result, node metrics are not updated, which blocks reward distribution.
Fix
Upgrade the Node Rewards canister after the Registry canister, so that the Registry is already running when the Node Rewards canister refreshes its metrics on startup.
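The ordering constraint can be illustrated with a minimal, self-contained Rust sketch. It is not the code changed in this PR: the `Canister` enum and the `order_upgrades` function are hypothetical names introduced here only to show the idea that Node Rewards must come after Registry in the upgrade plan.

```rust
/// Hypothetical canister identifiers, for illustration only
/// (not the types used in the actual golden-state test).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Canister {
    Governance,
    NodeRewards,
    Registry,
}

/// Returns an upgrade plan in which Node Rewards is upgraded after Registry.
/// If Node Rewards already comes after Registry (or either is absent),
/// the plan is returned unchanged.
fn order_upgrades(mut plan: Vec<Canister>) -> Vec<Canister> {
    let registry = plan.iter().position(|c| *c == Canister::Registry);
    let rewards = plan.iter().position(|c| *c == Canister::NodeRewards);
    if let (Some(reg), Some(rew)) = (registry, rewards) {
        if rew < reg {
            let node_rewards = plan.remove(rew);
            // Removing an earlier element shifts Registry left by one,
            // so index `reg` is now the slot directly after Registry.
            plan.insert(reg, node_rewards);
        }
    }
    plan
}
```

For example, a plan of `[NodeRewards, Governance, Registry]` is reordered to `[Governance, Registry, NodeRewards]`, while a plan that already upgrades Registry first is left as it is.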