GitHub - Antonio-Tresol/daemon: watches the watchers: a monitoring system for long-running Claude Code agents

daemon

watches the watchers

A monitoring system for long-running Claude Code agents.
built by agents, watched by daemon, improved by humans.


When codebases become optimised harness-engineered systems, agents run for hours across multiple context compaction horizons. They make thousands of tool calls, hit failures they silently recover from, and leave behind sessions that no human has time to read end-to-end.

Someone needs to watch the watchers.

[Screenshot: the timeline, with plans within plans, tasks within tasks, events within events]

The problem

Agents work autonomously for hours. Sometimes while their operators sleep. A single Claude Code session can span hundreds of tool calls, multiple context compactions, and several hours of wall-clock time.

When something goes wrong, you get a failed PR.

When something goes subtly wrong, you get nothing at all — just a session that looks fine on the surface but made poor architectural decisions, repeated work across compaction boundaries, or silently swallowed errors that will compound later.

Either way, the operator is left with questions the session does not answer on its own:

  What did the agent actually do during that six-hour session?
  Where did it fail, and what evidence exists?
  What patterns keep recurring?
  How can the harness be improved to prevent these failures?

What daemon does

Daemon ingests events from Claude Code sessions via HTTP hooks and OpenTelemetry, then lets you explore what happened at whatever depth you need.

◎ Multi-resolution exploration

Three levels of depth. Pick the one that matches your question.

  ┌─────────────────────────────────────────────────┐
  │  Narrative    what happened, in thirty seconds  │
  ├─────────────────────────────────────────────────┤
  │  Phases       research → implementation →       │
  │               testing → debugging               │
  ├─────────────────────────────────────────────────┤
  │  Events       every tool call, every decision   │
  └─────────────────────────────────────────────────┘

[Screenshot: drill down, expanding any plan to see the raw events underneath]
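The three levels can be pictured as one nested structure. The sketch below is illustrative only: `SessionEvent`, `Phase`, `SessionView`, and every field name are assumptions rather than daemon's actual schema, and the narrative string is a mechanical stand-in for the summary that analysis would produce.

```typescript
// Hypothetical shapes: names and fields are assumptions, not daemon's schema.
type PhaseKind = "research" | "implementation" | "testing" | "debugging";

interface SessionEvent {
  id: number;
  tool: string;     // tool name, e.g. "Read" or "Bash"
  phase: PhaseKind; // assumed to be assigned during analysis
}

interface Phase {
  kind: PhaseKind;
  events: SessionEvent[]; // lowest level: the raw events
}

interface SessionView {
  narrative: string; // top level: the short summary
  phases: Phase[];   // middle level
}

// Fold an ordered event stream into runs of consecutive same-phase events.
function toPhases(events: SessionEvent[]): Phase[] {
  const phases: Phase[] = [];
  for (const event of events) {
    const last = phases[phases.length - 1];
    if (last && last.kind === event.phase) {
      last.events.push(event);
    } else {
      phases.push({ kind: event.phase, events: [event] });
    }
  }
  return phases;
}

// Build all three levels; the narrative here is just a phase tally.
function summarise(events: SessionEvent[]): SessionView {
  const phases = toPhases(events);
  const narrative = phases
    .map((p) => `${p.kind}: ${p.events.length}`)
    .join(" → ");
  return { narrative, phases };
}

const view = summarise([
  { id: 1, tool: "Read", phase: "research" },
  { id: 2, tool: "Grep", phase: "research" },
  { id: 3, tool: "Edit", phase: "implementation" },
  { id: 4, tool: "Bash", phase: "testing" },
]);
console.log(view.narrative);
```

Drilling down is then a matter of reading `view.narrative`, then `view.phases`, then `view.phases[i].events`.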

✗ Failure analysis with evidence

Not just what failed — why, with the receipts. Every failure links back to its source events: the tool calls that preceded it, the error messages, the recovery attempts. The full causal chain.

Classified by impact (critical · warning · info) and type (tool_failure · api_error · logic_error · timeout · permission_denied).
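A minimal sketch of how that classification could be typed. The impact and type values are the ones listed above; the `Failure` interface, its fields, and the `bySeverity` helper are hypothetical and only illustrate how each failure might carry references back to its source events.

```typescript
type Impact = "critical" | "warning" | "info";
type FailureType =
  | "tool_failure"
  | "api_error"
  | "logic_error"
  | "timeout"
  | "permission_denied";

// Hypothetical record: field names are assumptions, not daemon's schema.
interface Failure {
  impact: Impact;
  type: FailureType;
  message: string;
  sourceEventIds: number[]; // the evidence: events this failure links back to
}

const IMPACT_RANK: Record<Impact, number> = { critical: 0, warning: 1, info: 2 };

// Return a copy ordered most severe first; the input array is not mutated.
function bySeverity(failures: Failure[]): Failure[] {
  return [...failures].sort(
    (a, b) => IMPACT_RANK[a.impact] - IMPACT_RANK[b.impact],
  );
}

// Made-up sample data for illustration.
const sorted = bySeverity([
  { impact: "info", type: "timeout", message: "slow test run", sourceEventIds: [12] },
  { impact: "critical", type: "tool_failure", message: "build failed", sourceEventIds: [40, 41] },
  { impact: "warning", type: "permission_denied", message: "write blocked", sourceEventIds: [7] },
]);
console.log(sorted.map((f) => `${f.impact}/${f.type}`));
```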

◆ Actionable improvement recommendations

Every failure pattern comes paired with a recommendation targeting the harness itself:

  hooks           auto-enforcement, pre/post-tool hooks
  skills          reusable /commands for common workflows
  subagents       agent teams for parallel work
  tools           MCP servers, integrations
  context         CLAUDE.md, architecture docs
  architecture    layer boundaries, structural lints
  legibility      agent-friendly code organisation
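As a sketch, the seven targets above map naturally onto a closed set of categories. The category names come from the list; the `Recommendation` shape, the sample data, and `countByCategory` are hypothetical and shown only to illustrate tallying where a harness needs the most attention.

```typescript
const CATEGORIES = [
  "hooks",
  "skills",
  "subagents",
  "tools",
  "context",
  "architecture",
  "legibility",
] as const;

type Category = (typeof CATEGORIES)[number];

// Hypothetical record: field names are assumptions, not daemon's schema.
interface Recommendation {
  category: Category;
  summary: string;
}

// Count recommendations per category, including categories with none.
function countByCategory(recs: Recommendation[]): Record<Category, number> {
  const counts = {} as Record<Category, number>;
  for (const category of CATEGORIES) counts[category] = 0;
  for (const rec of recs) counts[rec.category] += 1;
  return counts;
}

// Made-up sample data for illustration.
const counts = countByCategory([
  { category: "hooks", summary: "run the linter after every edit" },
  { category: "context", summary: "document the test command in CLAUDE.md" },
  { category: "hooks", summary: "block writes outside the workspace" },
]);
console.log(counts);
```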

The agent works → daemon watches → the human improves the harness → the agent works better.

● Live session monitoring

[Screenshot: a live session with 1597 events, tool calls streaming in, failures highlighted in ember]

Harness engineering

Daemon is built on the principles of harness engineering — the discipline of designing environments where agents can do reliable, autonomous work.

The term comes from the infrastructure surrounding a long-running agent: the CLAUDE.md that gives it direction, the hooks that enforce invariants, the tools that extend its capabilities, the feedback loops that catch failures before they compound. Engineering this well is what separates a productive agent from one that drifts.

The foundational ideas draw from:

"Harness engineering: leveraging Codex in an agent-first world" — the engineer's job is no longer writing code but designing environments, specifying intent, and building feedback loops.

"Effective Harnesses for Long-Running Agents" — agents working across context compaction boundaries need structured artifacts to bridge the gap between sessions.

"Long-running Claude for scientific computing" — progress files as portable long-term memory, reference implementations as test oracles, git as the coordination mechanism.

Where harness engineering tells you how to build the environment, daemon tells you how well the environment is working.

See docs/harness-engineering.md for the deep dive.

How it works

Daemon monitors Claude Code sessions through two channels:

  Claude Code ──── HTTP hooks ───→ POST /api/events ───→  ┐
                                                          ├──→ SQLite ──→ Analysis ──→ UI
  Claude Code ──── OpenTelemetry ─→ POST /api/otel ────→  ┘
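To make the hooks channel concrete, here is a rough sketch of building the request a hook might send. Only the `POST /api/events` path comes from the diagram above. The `HookEvent` payload, its field names, and the `http://localhost:3000` base URL are assumptions; see docs/setup.md for the real hook configuration.

```typescript
// Hypothetical payload: field names are assumptions, not daemon's wire format.
interface HookEvent {
  sessionId: string;
  hook: string;      // which hook fired
  tool?: string;     // tool name, when the hook concerns a tool call
  timestamp: string; // ISO 8601
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) the request for the events endpoint.
function buildEventRequest(baseUrl: string, event: HookEvent): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/events`, // tolerate trailing slashes
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(event),
  };
}

const req = buildEventRequest("http://localhost:3000/", {
  sessionId: "example-session",
  hook: "PostToolUse",
  tool: "Bash",
  timestamp: "2026-01-01T00:00:00.000Z",
});
console.log(req.method, req.url);

// Sending is left to the caller, for example:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Separating request construction from sending keeps the sketch testable without a running server.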

When you trigger analysis, daemon uses the Claude Agent SDK with structured outputs to process the session's event stream and produces typed timeline, failure, and improvement results.

Get started →

See docs/setup.md for installation and configuration.

Architecture

Domain-Driven Design backend. Feature-Sliced Design frontend. Three colours.

  src/
    server/
      domain/           pure entities, repository interfaces, ports
      application/      use cases orchestrating domain + infrastructure
      infrastructure/   SQLite, Claude Agent SDK runner, GraphQL
    shared/             UI primitives, utilities, hooks
    entities/           entity models and display components
    features/           timeline · failures · improvements · session · harness
    app/                Next.js pages and API routes
    prompts/            LLM prompt templates for analysis
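The three backend layers can be illustrated with a small sketch. Everything here is hypothetical: `Session`, `SessionRepository`, `RecordEvent`, and the in-memory adapter are illustrative names, not daemon's code, and the adapter stands in for the SQLite implementation so the example runs on its own.

```typescript
// domain/: a pure entity and a repository port (interface only, no I/O).
interface Session {
  id: string;
  eventCount: number;
}

interface SessionRepository {
  save(session: Session): void;
  findById(id: string): Session | undefined;
}

// application/: a use case that depends only on the port.
class RecordEvent {
  private readonly sessions: SessionRepository;

  constructor(sessions: SessionRepository) {
    this.sessions = sessions;
  }

  // Increment the session's event count, creating the session if needed.
  execute(sessionId: string): Session {
    const current = this.sessions.findById(sessionId) ?? { id: sessionId, eventCount: 0 };
    const updated: Session = { ...current, eventCount: current.eventCount + 1 };
    this.sessions.save(updated);
    return updated;
  }
}

// infrastructure/: an adapter implementing the port (in-memory stand-in).
class InMemorySessionRepository implements SessionRepository {
  private readonly rows = new Map<string, Session>();

  save(session: Session): void {
    this.rows.set(session.id, session);
  }

  findById(id: string): Session | undefined {
    return this.rows.get(id);
  }
}

const repo = new InMemorySessionRepository();
const recordEvent = new RecordEvent(repo);
recordEvent.execute("s1");
const latest = recordEvent.execute("s1");
console.log(latest.eventCount);
```

Because the use case sees only the `SessionRepository` interface, swapping the in-memory adapter for a database-backed one would not change the application layer.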

See docs/architecture.md for the full technical breakdown.

Design

  VOID     #0a0a0a    neon black, the deepest background
  BONE     #f0ece5    neon white, warm Anthropic parchment
  EMBER    #d4a574    Claude's warm amber, the only accent

Three colours. Symbols for status. Typography for hierarchy. Nothing else.

See DESIGN.md for the full design system.

Status

Proof of concept · v0.1

Daemon currently monitors Claude Code only. The core timeline, failure analysis, and improvement recommendation features are functional. The system was used to monitor its own development — Claude Code agents built daemon while daemon watched them work.

Documentation

  docs/setup.md                 installation, hook connection, agent API
  docs/architecture.md          DDD backend, FSD frontend, data flows
  docs/harness-engineering.md   founding principles, daemon's position
  DESIGN.md                     three-colour visual language

References

Harness engineering: leveraging Codex in an agent-first world — OpenAI, February 2026
Effective Harnesses for Long-Running Agents — Anthropic, November 2025
Long-running Claude for scientific computing — Anthropic, March 2026

daemon watches the watchers.
