fix: dashboard terminal endpoints die without supervision causing recurring /ao outages #417

@AgentWrapper

Problem

The AO dashboard and terminal endpoints are frequently unavailable in real usage because the web and websocket processes are started as ad-hoc dev commands, with nothing supervising or restarting them. Users repeatedly hit:

  • AO dashboard upstream unavailable ... 127.0.0.1:3000 connection refused
  • terminal stuck at Connecting… XDA when the websocket servers on 14800/14801 are down

Why this matters

This breaks AO day-to-day operation and undermines confidence in the orchestration UX.

Scope

Implement a permanent, production-grade runtime model so the dashboard and terminal sockets survive crashes and restarts.

Requirements

  1. Provide a supervised startup path (systemd user service or equivalent) for:
    • dashboard web server
    • terminal websocket server
    • direct terminal websocket server
  2. Auto-restart on crash, and auto-start at boot/login where appropriate.
  3. Health checks + clear status command output for 3000/14800/14801 readiness.
  4. Remove dependency on fragile ad-hoc dev shell processes for normal operation.
  5. Document operator setup/recovery in repo docs.
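
As a starting point for requirement 3, here is a minimal sketch of a readiness probe for the three ports named above. It only checks that something accepts TCP connections on 127.0.0.1; it does not verify the HTTP or websocket handshake. The service labels, the label-to-port mapping for 14800/14801, and all function names are assumptions for illustration, not existing AO code.

```python
import socket

# Ports come from the issue text. The labels, and which of 14800/14801 is the
# "direct" terminal server, are assumptions made for this sketch.
SERVICES = {
    "dashboard": 3000,
    "terminal-ws": 14800,
    "direct-terminal-ws": 14801,
}


def port_ready(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def status_report(services=SERVICES, probe=port_ready):
    """Map each service label to True (ready) or False (down)."""
    return {name: probe(port) for name, port in services.items()}


def format_status(report, services=SERVICES):
    """Render one line per service: label, port, and readiness."""
    lines = []
    for name, ok in report.items():
        state = "ready" if ok else "DOWN"
        lines.append(f"{name:<20} {services[name]:<6} {state}")
    return "\n".join(lines)
```

Calling `print(format_status(status_report()))` would give the kind of one-line-per-service output a status command could show. The `probe` parameter is injectable so the health/status behavior can be unit-tested without real servers.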

Acceptance Criteria

  • Killing one process auto-recovers without manual restarts.
  • /ao/ and /sessions/<id> remain reachable under routine restarts.
  • Terminal connects reliably (no perpetual Connecting… XDA from missing ws backends).
  • Tests added for health/status behavior where feasible.

Nice to have

  • CLI command for ao services install|start|stop|status
  • watchdog that self-heals missing terminal ws processes.
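
The watchdog idea could be sketched roughly as a single "heal pass": probe each terminal websocket port and restart whichever backend is not listening. This is an illustration under assumptions, not a proposed final design. The service labels, the unit naming scheme `ao-<label>.service`, and the function names are all hypothetical; only `systemctl --user restart` is a real command, and it applies only if the services end up being systemd user units.

```python
import socket
import subprocess


def port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_user_unit(name):
    """Restart a systemd user unit. The 'ao-<name>.service' scheme is hypothetical."""
    subprocess.run(
        ["systemctl", "--user", "restart", f"ao-{name}.service"],
        check=False,  # a failed restart should not crash the watchdog itself
    )


def heal_once(services, restart=restart_user_unit, probe=port_open):
    """Restart every service whose port is not accepting connections.

    services maps a label to its port. Returns the labels that were restarted,
    in iteration order, so callers can log or assert on them.
    """
    restarted = []
    for name, port in services.items():
        if not probe(port):
            restart(name)
            restarted.append(name)
    return restarted
```

A timer or a simple loop could call `heal_once({"terminal-ws": 14800, "direct-terminal-ws": 14801})` every few seconds. If the supervisor already restarts on crash, this pass only adds value for processes that are alive but no longer listening.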
