@agent-relay agent-relay bot commented Jan 8, 2026

Progress Tracker Sidecar Agent

Detailed specification for the Progress Tracker, an external sidecar agent that monitors relay workspace servers and provides deep visibility into agent work.

What's Included

1. Comprehensive Specification (docs/PROGRESS_TRACKER_SIDECAR_SPEC.md)

  • Architecture: External sidecar agent using Claude/Codex SDK
  • Components: 6 core components with detailed design
    • Relay API client (query daemon state)
    • Log tailer (stream agent stdout/stderr)
    • Pattern analyzer (LLM-powered stuck detection)
    • Reminder system (context-aware reminders)
    • Lifecycle manager (start/stop/restart agents)
    • Escalation engine (alert lead with context)

2. Key Features

  • Real-time monitoring of multiple relay servers simultaneously
  • Deep work visibility via log tailing (not just relay messages)
  • Intelligent stuck detection using LLM analysis
  • Context-aware reminders using agent's trail/continuity
  • Agent lifecycle control (start/stop/restart)
  • Smart escalation with full context to lead
  • Automatic recovery for error loops and timeouts
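
As an illustration of what "error loop" detection could look like before any LLM is involved, the sketch below flags an agent whose recent log lines keep repeating the same error. The function name, the window and threshold defaults, and the digit-masking normalisation are all assumptions made for this example; the spec may define the check differently.

```python
import re
from collections import Counter
from typing import Optional

# Assumption: masking digits lets "attempt 3" and "attempt 4" count as the same error.
_DIGITS = re.compile(r"\d+")


def find_error_loop(lines: list[str], window: int = 20, threshold: int = 5) -> Optional[str]:
    """Return the repeated error signature if one dominates the recent window."""
    recent = lines[-window:]
    counts = Counter(
        _DIGITS.sub("#", line) for line in recent if "error" in line.lower()
    )
    if not counts:
        return None
    signature, hits = counts.most_common(1)[0]
    return signature if hits >= threshold else None


# Usage with made-up log lines.
looping = [f"Error: connection refused (attempt {i})" for i in range(6)]
healthy = ["compiling", "Error: flaky test, retrying", "tests passed"]
loop_signature = find_error_loop(looping)
healthy_signature = find_error_loop(healthy)
```

A returned signature would be the trigger for the lifecycle manager to restart the agent or for the escalation engine to alert the lead.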

3. Implementation Plan (docs/PROGRESS_TRACKER_BEADS_TASKS.md)

  • 6 feature tasks organized by implementation phase
  • 54 hours total estimated effort
  • 2-3 week timeline with 2 developers
  • Clear requirements and acceptance criteria per task

Problem Solved

Current relay monitoring gaps:

  • ❌ Time-based prompts create noise without insight
  • ❌ No visibility beyond relay messages
  • ❌ Stuck agents undetected until manual intervention
  • ❌ No intelligent recovery mechanisms
  • ❌ Continuity context underutilized

Solution Benefits

✅ Event-based monitoring (no time-based prompts)
✅ Deep work visibility (logs + relay messages)
✅ Automatic stuck detection (<2min latency)
✅ Intelligent reminders using context
✅ Auto-recovery for common failure modes
✅ Lead gets real, actionable alerts
✅ Scales to monitor n relay servers

Implementation Phases

  1. Core Framework (8h) - Polling loop, config, logging
  2. Relay API (6h) - Query daemon, get agent state
  3. Log Tailing (12h) - Stream logs, detect patterns
  4. Reminders (10h) - Context-aware messages, response tracking
  5. Lifecycle (8h) - Start/stop/restart, auto-recovery
  6. Alerting (10h) - Slack/webhook, history, escalation
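
As a quick consistency check, the per-phase estimates above add up to the 54-hour total quoted in the Implementation Plan. The per-developer figures assume the work splits evenly between the 2 developers, which the plan does not state.

```python
# Phase estimates copied from the list above (hours).
PHASE_HOURS = {
    "Core Framework": 8,
    "Relay API": 6,
    "Log Tailing": 12,
    "Reminders": 10,
    "Lifecycle": 8,
    "Alerting": 10,
}

total_hours = sum(PHASE_HOURS.values())

# Assumption: an even split across 2 developers.
hours_per_developer = total_hours / 2

# Weekly load per developer at the two ends of the 2-3 week timeline.
weekly_load_3_weeks = hours_per_developer / 3
weekly_load_2_weeks = hours_per_developer / 2
```

Under that assumption each developer spends between 9 and 13.5 hours per week on the sidecar, so the timeline treats it as part-time work.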

Next Steps

  1. Review specification and design
  2. Discuss implementation approach
  3. Create individual PRs for each phase
  4. Deploy Phase 1 (core framework) first
  5. Iterate based on feedback

Discussion Points

  • Should the sidecar run on Claude or Codex as its language model?
  • Relay daemon API endpoints - any gaps vs spec?
  • Alert delivery (Slack, webhook, both)?
  • Configuration format and defaults?
  • Multi-server monitoring priorities?

RFC (Request for Comments): looking for feedback on architecture and approach before implementation begins.

🤖 Generated with Claude Code

Co-Authored-By: khaliqgant [email protected]

Agent Relay and others added 3 commits January 8, 2026 06:53
…ads tasks

Add detailed specification for Progress Tracker external sidecar agent:
- Overview and architecture (external agent monitoring relay servers)
- 6 components: Relay API client, log tailer, pattern analyzer, reminders, lifecycle, escalation
- Full feature set: log tailing, pattern detection, intelligent reminders, lifecycle control
- Data flows for polling, reminders, and escalation
- 5 implementation phases with clear deliverables
- Technology stack and deployment guidance
- API design for relay daemon extensions
- Testing strategy and success criteria

Add beads task breakdown:
- 6 feature tasks organized by implementation phase
- Task descriptions with requirements and acceptance criteria
- Estimated effort: 54 hours total (2-3 week timeline)
- Risk assessment and mitigation strategies

🤖 Generated with Claude Code

Co-Authored-By: khaliqgant <[email protected]>

Record trajectory decisions and justifications:
- External sidecar architecture (vs in-relay daemon or hooks)
- 5-phase implementation approach (foundation → intelligence → action)
- LLM-powered pattern analysis for intelligent detection
- 54-hour effort estimate (2-3 week timeline)

Architectural highlights and technical decisions documented.
Confidence: 90% (architecture sound, requirements clear, risks identified)

Edge cases and next steps identified for team reference.

Adds section 3.5 "Activity State Detector" to complement the Pattern Analyzer
in the Progress Tracker Sidecar spec (PR #102).

Key features (inspired by NTM):
- 8 activity states: waiting, thinking, generating, tool_executing, etc.
- Three-signal detection: velocity + patterns + temporal analysis
- CLI-specific patterns for Claude, Codex, Gemini
- Hysteresis to prevent state flicker (2s stability requirement)
- Health score computation from activity signals
- Dashboard display with real-time state indicators
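
The hysteresis rule can be sketched as a small debouncer: a newly detected state is only reported once it has been observed continuously for the stability period. The class and method names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to test. The state names and the 2s requirement come from the list above.

```python
from typing import Optional


class StateDebouncer:
    """Report a new activity state only after it has held for `stability_s` seconds."""

    def __init__(self, initial: str = "waiting", stability_s: float = 2.0) -> None:
        self.state = initial                     # state currently reported
        self.stability_s = stability_s
        self._candidate: Optional[str] = None    # state waiting to be confirmed
        self._since = 0.0                        # when the candidate was first seen

    def observe(self, raw_state: str, now: float) -> str:
        if raw_state == self.state:
            # Back to the reported state: drop any pending candidate.
            self._candidate = None
        elif raw_state != self._candidate:
            # A different state appeared: start its stability timer.
            self._candidate, self._since = raw_state, now
        elif now - self._since >= self.stability_s:
            # The candidate held long enough: commit the transition.
            self.state, self._candidate = raw_state, None
        return self.state


# Usage: a brief flicker back to "waiting" at t=1.5 restarts the timer.
debouncer = StateDebouncer()
reported = [
    debouncer.observe("thinking", 0.0),
    debouncer.observe("thinking", 1.0),
    debouncer.observe("waiting", 1.5),
    debouncer.observe("thinking", 2.0),
    debouncer.observe("thinking", 4.0),
]
```

Only the last observation switches the reported state, because "thinking" had then been seen without interruption from t=2.0 to t=4.0.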

Benefits:
- Fast detection (< 100ms) vs LLM-based analysis (1-5s)
- Deterministic for known patterns, LLM only when needed
- Real-time dashboard updates
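
One way to read "deterministic for known patterns, LLM only when needed" is a two-tier classifier: try cheap regular expressions first and call the model only when nothing matches. The patterns below are invented for illustration and do not reflect real Claude, Codex, or Gemini CLI output. `llm_fallback` stands in for whatever model call the sidecar would make.

```python
import re
from typing import Callable

# Hypothetical patterns; the real CLI-specific patterns would come from the spec.
KNOWN_PATTERNS = [
    (re.compile(r"^Running tool\b"), "tool_executing"),
    (re.compile(r"^Thinking\b"), "thinking"),
]


def classify(line: str, llm_fallback: Callable[[str], str]) -> tuple[str, str]:
    """Return (state, source), where source is 'pattern' or 'llm'."""
    for pattern, state in KNOWN_PATTERNS:
        if pattern.search(line):
            return state, "pattern"
    # No deterministic match: pay for the slower LLM-based analysis.
    return llm_fallback(line), "llm"


# Usage with a stub in place of a real model call.
llm_calls = []


def stub_llm(line: str) -> str:
    llm_calls.append(line)
    return "generating"


fast_result = classify("Running tool: grep", stub_llm)
slow_result = classify("some unrecognised output", stub_llm)
```

Returning the source alongside the state lets the dashboard or logs show how often the slow path is taken.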

Includes beads task breakdown (~8h effort).

@khaliqgant

Along with this, we should remove the scheduled continuity injections, as they cause confusion.
