
cosmovisor: doesn't check for pending upgrades before starting daemon on restart #25861

@joeabbey

Summary

cosmovisor doesn't check for an existing upgrade-info.json before starting the daemon. As a result, when a node restarts after the upgrade height has been reached, cosmovisor launches the old binary instead of the upgrade binary.

Steps to Reproduce

  1. Have a chain with a scheduled upgrade at height N
  2. Pre-stage the upgrade binary in cosmovisor/upgrades/<name>/bin/
  3. Let the chain reach height N, at which point upgrade-info.json is written
  4. Restart the node (crash, manual restart, etc.)
  5. cosmovisor launches the OLD binary (the current symlink still points to genesis)
  6. The old binary can't proceed past the upgrade height

Expected Behavior

On startup, cosmovisor should:

  1. Check if upgrade-info.json exists
  2. Compare the upgrade name with the current symlink target
  3. If different AND the upgrade binary exists, switch the symlink BEFORE starting the daemon

Actual Behavior

cosmovisor immediately launches whatever binary current points to. Upgrade detection only happens via file watcher WHILE the daemon is running (in WaitForUpgradeOrExit). There's no pre-startup check.

Code Analysis

In run.go, the run() function creates the launcher and immediately calls launcher.Run() without checking for pending upgrades:

func run(args []string, ...) error {
    // ... config loading ...
    launcher := cosmovisor.NewLauncher(logger, cfg)
    return launcher.Run(args, ...) // no pre-check here
}

In process.go, Launcher.Run() starts the daemon immediately:

func (l Launcher) Run(args []string, ...) error {
    cmd := exec.Command(l.cfg.CurrentBin(), args...)
    cmd.Start() // launches without checking upgrade-info.json
    // ...
    l.WaitForUpgradeOrExit(...) // upgrade detection happens here, AFTER daemon starts
}

Proposed Fix

Add a pre-startup check in the Run() function before launching the daemon:

func (l Launcher) Run(args []string, ...) error {
    // NEW: Check for pending upgrade before starting
    if err := l.checkPendingUpgrade(); err != nil {
        return err
    }
    
    cmd := exec.Command(l.cfg.CurrentBin(), args...)
    // ...
}

func (l Launcher) checkPendingUpgrade() error {
    upgradeInfo, err := l.cfg.UpgradeInfo()
    if err != nil || upgradeInfo.Name == "" {
        return nil // no pending upgrade
    }
    
    currentUpgrade, _ := l.cfg.CurrentUpgrade()
    if upgradeInfo.Name == currentUpgrade.Name {
        return nil // already on correct binary
    }
    
    // Switch to upgrade binary if it exists
    return l.cfg.SetCurrentUpgrade(upgradeInfo)
}

Impact

This issue caused significant downtime during the Sei v6.3.0 upgrade. Nodes that restarted during the upgrade window got stuck because cosmovisor kept launching the old binary.

Environment

  • cosmovisor version: latest from main
  • Chain: Sei pacific-1
  • Upgrade mechanism: scheduled upgrade via governance
