Summary
cosmovisor doesn't check for an existing upgrade-info.json before starting the daemon. This causes problems when a node restarts after an upgrade height has been reached - the old binary is launched instead of the upgrade binary.
Steps to Reproduce
- Have a chain with a scheduled upgrade at height N
- Pre-stage the upgrade binary in cosmovisor/upgrades/<name>/bin/
- Let the chain reach height N - upgrade-info.json gets written
- Restart the node (crash, manual restart, etc.)
- cosmovisor launches the OLD binary (from the current symlink, still pointing to genesis)
- The old binary can't proceed past the upgrade height
Expected Behavior
On startup, cosmovisor should:
- Check if upgrade-info.json exists
- Compare the upgrade name with the current symlink target
- If different AND the upgrade binary exists, switch the symlink BEFORE starting the daemon
Actual Behavior
cosmovisor immediately launches whatever binary current points to. Upgrade detection only happens via file watcher WHILE the daemon is running (in WaitForUpgradeOrExit). There's no pre-startup check.
Code Analysis
In run.go, the run() function creates the launcher and immediately calls launcher.Run() without checking for pending upgrades:
func run(args []string, ...) error {
	// ... config loading ...
	launcher := cosmovisor.NewLauncher(logger, cfg)
	return launcher.Run(args, ...) // no pre-check here
}
In process.go, Run() starts the daemon immediately:
func (l Launcher) Run(args []string, ...) error {
	cmd := exec.Command(l.cfg.CurrentBin(), args...)
	cmd.Start() // launches without checking upgrade-info.json
	// ...
	l.WaitForUpgradeOrExit(...) // upgrade detection happens here, AFTER daemon starts
}
Proposed Fix
Add a pre-startup check in the Run() function before launching the daemon:
func (l Launcher) Run(args []string, ...) error {
	// NEW: Check for pending upgrade before starting
	if err := l.checkPendingUpgrade(); err != nil {
		return err
	}
	cmd := exec.Command(l.cfg.CurrentBin(), args...)
	// ...
}

func (l Launcher) checkPendingUpgrade() error {
	upgradeInfo, err := l.cfg.UpgradeInfo()
	if err != nil || upgradeInfo.Name == "" {
		return nil // no pending upgrade
	}
	currentUpgrade, _ := l.cfg.CurrentUpgrade()
	if upgradeInfo.Name == currentUpgrade.Name {
		return nil // already on correct binary
	}
	// Switch to upgrade binary if it exists
	return l.cfg.SetCurrentUpgrade(upgradeInfo)
}
Impact
This issue caused significant downtime during the Sei v6.3.0 upgrade. Nodes that restarted during the upgrade window got stuck because cosmovisor kept launching the old binary.
Environment
- cosmovisor version: latest from main
- Chain: Sei pacific-1
- Upgrade mechanism: scheduled upgrade via governance