
📊 Oversight & Analytics

Wallfacer provides a suite of monitoring and analytics tools that help you understand what agents are doing during task execution, how long each phase takes, and how much each task costs. This guide covers oversight summaries, live log streaming, execution timelines, usage tracking, and the stats dashboard.

📋 Key Concepts

  • Oversight -- An AI-generated high-level summary of an agent's activity during a task, organized into logical phases (e.g., "Reading codebase", "Implementing feature", "Running tests"). Each phase lists the tools used, commands run, and key actions taken.
  • Spans -- Timed intervals recorded during task execution. Each span captures a discrete phase of work such as a single agent turn, container startup, worktree setup, or commit pipeline. Spans power both the flamegraph visualization and the span statistics table.
  • Turns -- Individual execution cycles of the agent. Each turn corresponds to one prompt-response round trip. Per-turn usage records track token consumption and cost for every turn.
  • Usage -- Token and cost accounting broken down by sub-agent activity (implementation, testing, refinement, title generation, oversight, commit message, ideation).

✅ Essentials

👁️ Oversight Summaries

Oversight summaries are structured reports that describe what an agent did during a task in human-readable form. Rather than reading through raw logs, you can review a concise phase-by-phase breakdown of the agent's work.

Each phase in an oversight summary includes:

  • A descriptive title (e.g., "Analyzing repository structure")
  • A short summary of what happened
  • Timestamp information for timeline placement

Oversight summaries are generated automatically when a task reaches the waiting, done, or failed state. The server launches a lightweight agent container that reads the task's activity log and produces a structured summary.

Test verification runs also produce their own separate oversight summary, covering only the test agent's activity.

Viewing Oversight

Open a task by clicking its card, then look at the Implementation tab on the left side of the detail modal. The log viewer has three modes, selectable via tabs at the top:

  • Oversight -- The structured phase-by-phase summary. This is the default view when a ready summary exists.
  • Pretty -- Parsed and syntax-highlighted agent output.
  • Raw -- Unprocessed agent output with ANSI codes stripped.

If the task has a test run, the Testing tab provides the same three modes for the test agent's output and oversight.

Oversight Statuses

The oversight summary has one of four statuses:

| Status | Meaning |
| --- | --- |
| pending | Not yet generated |
| generating | Agent container is running to produce the summary |
| ready | Summary is available and displayed |
| failed | Generation failed (hover for error details) |

🖥️ Live Log Monitoring

While a task is running, the detail modal streams live container output in real time. The log viewer connects via a streaming HTTP response and updates incrementally as new output arrives.

For completed tasks, saved turn output files are served from disk instead of a live stream.

Switch between log display modes using the tabs above the log panel:

  • Oversight -- Shows the oversight summary (if available).
  • Pretty -- Parses NDJSON output and renders it with syntax highlighting, tool calls, and formatted content. This is the default for tasks without a ready oversight summary.
  • Raw -- Shows the raw text output with ANSI escape codes stripped.

💰 Checking Task Costs

Every task tracks token consumption and cost across all its turns:

  • Input tokens -- Tokens sent to the model
  • Output tokens -- Tokens generated by the model
  • Cache read tokens -- Tokens served from the prompt cache
  • Cache creation tokens -- Tokens written to the prompt cache
  • Total cost (USD) -- Accumulated dollar cost

This information is visible in the task detail modal and on task cards.

🔍 Basic Search

Header Search Bar

The search bar in the header filters visible cards on the board. Type a query to match against task titles, prompts, and tags. The filter is applied client-side and updates the board in real time.

  • Prefix words with # to filter by tag (e.g., #refactor shows only tasks with that tag).
  • Press / to focus the search bar from anywhere on the page.
  • Click the clear button or press Escape to reset the filter.
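The filtering behavior above can be sketched as a simple client-side predicate. This is an illustration only; the field names (`title`, `prompt`, `tags`) are assumptions about the task shape, not Wallfacer's actual data model:

```python
def matches(task: dict, query: str) -> bool:
    """Return True if a task card should stay visible for the query.

    A '#'-prefixed query matches tags only; anything else is a
    case-insensitive substring match on title, prompt, and tags.
    """
    q = query.strip().lower()
    if not q:
        return True  # empty query shows everything
    if q.startswith("#"):
        tag = q[1:]
        return any(t.lower() == tag for t in task.get("tags", []))
    haystack = " ".join(
        [task.get("title", ""), task.get("prompt", ""), *task.get("tags", [])]
    ).lower()
    return q in haystack


tasks = [
    {"title": "Refactor parser", "prompt": "clean up", "tags": ["refactor"]},
    {"title": "Add tests", "prompt": "unit tests", "tags": []},
]
visible = [t["title"] for t in tasks if matches(t, "#refactor")]
```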

Command Palette

Press Cmd+K (macOS) or Ctrl+K (Windows/Linux) to open the command palette. This provides quick fuzzy-search access to all tasks and contextual actions:

  • Type to search tasks by title, prompt, or ID prefix.
  • Use arrow keys to navigate results, Enter to execute, Escape to close.
  • Prefix with @ for server-side search (same as the header search bar).
  • When a task is selected, contextual actions appear below (Start, Run Test, Mark Done, Resume, Retry, Archive, Sync, Open Flamegraph, Open Timeline, etc.) depending on the task's current state.

🔧 Advanced Topics

Periodic Oversight Generation

If WALLFACER_OVERSIGHT_INTERVAL is set to a positive number, the server generates intermediate oversight summaries at that interval (in minutes) while a task runs. Set to 0 (the default) to generate summaries only when the task finishes.

Bulk Oversight Generation

If you have older tasks that were completed before oversight was enabled, or tasks where oversight generation failed, you can trigger bulk generation from the API:

POST /api/tasks/generate-oversight?limit=10

This queues up to the requested limit of eligible tasks (those in a terminal state with at least one turn but no ready oversight) for background generation. The response reports how many were queued and how many tasks still lack oversight.
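As a sketch, the call can be issued from Python's standard library. The host/port and the response field names (`queued`, `remaining`) are assumptions; check your server's actual payload before relying on them. The request is only constructed here, not sent:

```python
import json
import urllib.request

# Hypothetical base URL -- adjust to wherever your Wallfacer server runs.
req = urllib.request.Request(
    "http://localhost:8080/api/tasks/generate-oversight?limit=10",
    method="POST",
)
# To actually send it: urllib.request.urlopen(req)


def summarize(body: bytes) -> str:
    """Render the (assumed) response payload as a one-line report."""
    data = json.loads(body)
    return f"queued {data['queued']}, {data['remaining']} still missing oversight"
```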

Customizing the Oversight Prompt

The system prompt used to generate oversight summaries is based on the built-in oversight.tmpl template. You can customize it via Settings > System Prompts in the UI or the API (PUT /api/system-prompts/oversight). Delete the override to restore the default.

Log Search and Filtering

In Pretty and Raw modes, a search bar appears above the log panel. Type a query to filter log lines -- only lines containing the search term are shown. A counter displays how many lines matched out of the total. Matches are highlighted in the output.

The search bar is hidden in Oversight mode since the structured view is not line-based.

Truncation

Large outputs are capped at 10,000 lines in the browser to prevent memory issues. When this limit is reached, a notice appears with a link to download the full log. The server also enforces an 8 MB per-turn output limit; a banner warns when server-side truncation has occurred.

🐳 Container Monitor

Click the sandbox monitor button in the header to open the Container Monitor modal. This shows all running Wallfacer containers with:

  • Container ID (short form)
  • Associated task (with title and status badge)
  • Container name
  • State (running, exited, paused, created, dead) with a color indicator
  • Detailed status
  • Creation time (relative)

The list auto-refreshes every 5 seconds while the modal is open. Click the refresh button for an immediate update.

🔥 Flamegraph (Spans Tab)

The task detail modal has two visualization tabs on the right side: Spans (flamegraph) and Timeline.

The Spans tab renders an interactive flamegraph-style visualization of execution timing. It displays:

  • Time axis -- Horizontal axis showing elapsed time from task start, with tick marks at 0%, 25%, 50%, 75%, and 100% of the execution duration.
  • Oversight phase band -- When an oversight summary is available, a row of colored blocks shows the high-level phases across the timeline. Hover over a phase block to see its title and description.
  • Span blocks -- Each span (agent turn, container run, worktree setup, commit pipeline) is rendered as a colored block positioned on the timeline. Blocks are packed into lanes to avoid overlap. Hover over a span to see its label, raw identifier, start offset, duration, and associated oversight phase.
  • Cumulative cost chart -- Below the flamegraph, an SVG line chart shows how cost accumulated over time, with colored dots indicating which sub-agent activity incurred each cost increment.
  • Detail table -- Below the chart, a table lists all spans sorted by duration (longest first) with columns for span name, activity type, oversight phase, start offset, duration, and percentage of total time.

Idle gaps between periods of activity are compressed in the visualization so that long waits (e.g., for user feedback) do not distort the timeline. Compressed gaps are indicated by hatched regions labeled with the idle duration.
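One way to implement that gap compression can be sketched as follows. This is a minimal illustration; the real renderer's threshold and data model may differ:

```python
def compress_gaps(spans, max_gap=5.0):
    """Map real span times to compressed display offsets.

    spans: list of (start, end) tuples in seconds, sorted by start.
    Any idle gap longer than max_gap is collapsed to max_gap wide; each
    collapse is recorded as (display offset where the hatch begins,
    true idle seconds) so the UI can draw a labeled hatched region.
    """
    display, hatched, shift = [], [], 0.0
    prev_end = None
    for start, end in spans:
        if prev_end is not None and start - prev_end > max_gap:
            idle = start - prev_end
            hatched.append((prev_end - shift, idle))
            shift += idle - max_gap
        display.append((start - shift, end - shift))
        prev_end = end
    return display, hatched


# Three spans with a 57-second idle gap before the last one.
offsets, gaps = compress_gaps([(0, 2), (2, 3), (60, 61)])
```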

⏱️ Timeline Tab

The Timeline tab shows a chronological execution chart that updates in real time for running tasks. It provides a different perspective from the flamegraph, focusing on the sequential flow of execution phases.

📈 Span Statistics (Global)

Click the span statistics button in the header to open the Span Stats modal. This aggregates timing data across all tasks and shows:

  • Throughput summary tiles -- Completed tasks, failed tasks, success rate, median execution time, and P95 execution time.
  • Daily completions chart -- A mini bar chart showing task completions per day over the last 30 days.
  • Phase statistics table -- For each execution phase (worktree setup, agent turn, container run, commit pipeline), the table shows:
    • Number of occurrences (runs)
    • Minimum duration
    • Median (P50) with a proportional bar
    • Mean duration
    • P95 duration
    • P99 duration
    • Maximum duration

Duration values are color-coded: green for under 5 seconds, amber for 5-30 seconds, red for over 30 seconds.
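The color-coding rule stated above maps directly to a small helper, shown here as a sketch:

```python
def duration_color(seconds: float) -> str:
    """Color-code a phase duration using the thresholds from the stats table:
    green under 5 s, amber for 5-30 s, red over 30 s."""
    if seconds < 5:
        return "green"
    if seconds <= 30:
        return "amber"
    return "red"
```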

Per-Sub-Agent Usage Breakdown

Usage is further broken down by the type of agent activity that incurred it:

| Sub-Agent | Description |
| --- | --- |
| implementation | Main task execution agent |
| test | Test verification agent |
| refinement | Prompt refinement agent |
| title | Automatic title generation |
| oversight | Oversight summary generation |
| oversight-test | Test oversight summary generation |
| commit_message | Commit message generation |
| idea_agent | Brainstorm/ideation agent |

Per-Turn Usage

For granular analysis, the per-turn usage endpoint provides a record for each individual turn:

GET /api/tasks/{id}/turn-usage

Each record includes the sub-agent type, timestamp, token counts, and cost for that specific turn.
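Rolling those per-turn records up into per-sub-agent totals is a straightforward aggregation. The record field names below (`sub_agent`, `input_tokens`, `output_tokens`, `cost_usd`) are assumptions about the response shape, used only for illustration:

```python
from collections import defaultdict


def aggregate_by_sub_agent(turns):
    """Sum per-turn usage records into per-sub-agent totals."""
    totals = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    )
    for t in turns:
        bucket = totals[t["sub_agent"]]
        bucket["input_tokens"] += t["input_tokens"]
        bucket["output_tokens"] += t["output_tokens"]
        bucket["cost_usd"] += t["cost_usd"]
    return dict(totals)


turns = [
    {"sub_agent": "implementation", "input_tokens": 1000, "output_tokens": 200, "cost_usd": 0.05},
    {"sub_agent": "implementation", "input_tokens": 800, "output_tokens": 150, "cost_usd": 0.04},
    {"sub_agent": "test", "input_tokens": 500, "output_tokens": 90, "cost_usd": 0.02},
]
rollup = aggregate_by_sub_agent(turns)
```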

🚫 Budget Enforcement

Set per-task cost and token limits to prevent runaway execution:

  • Max Cost (USD) -- The task is stopped when accumulated cost exceeds this threshold.
  • Max Input Tokens -- The task is stopped when cumulative input and cache tokens exceed this limit.

When a budget is exceeded, the task fails with a budget_exceeded failure category. A banner in the task detail modal shows the exceeded limit with a button to raise it and retry.

Set either limit to 0 (the default) for unlimited execution.
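The enforcement logic can be sketched as a single check run after each turn. The server's actual failure category is simply budget_exceeded; the suffixes below are illustrative additions to show which limit tripped:

```python
def budget_exceeded(cost_usd, input_plus_cache_tokens,
                    max_cost=0.0, max_input_tokens=0):
    """Return a failure reason if a budget is exceeded, else None.

    A limit of 0 means unlimited, matching the defaults described above.
    """
    if max_cost > 0 and cost_usd > max_cost:
        return "budget_exceeded:max_cost"
    if max_input_tokens > 0 and input_plus_cache_tokens > max_input_tokens:
        return "budget_exceeded:max_input_tokens"
    return None
```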

Usage Statistics Modal

Click the usage button in the header to open the Usage Statistics modal. It provides:

  • Period selector -- Filter by last 7 days, 30 days, 90 days, or all time.
  • Summary bar -- Task count, selected period, and total cost.
  • By Status table -- Token and cost totals grouped by task status (done, failed, waiting, etc.), each with a colored status badge.
  • By Sub-Agent table -- Token and cost totals grouped by agent activity type.

📊 Stats Dashboard

Click the stats button in the header to open the Stats modal. This provides a comprehensive analytics dashboard with:

  • Summary tiles -- Total cost, input tokens, output tokens, and cache tokens across all tasks.
  • Daily spend chart -- A bar chart showing cost per day over the last 30 calendar days. Today's bar is highlighted in blue. Days with no activity show as empty.
  • By Status table -- Input tokens, output tokens, and cost grouped by task status.
  • By Activity table -- Input tokens, output tokens, and cost grouped by sub-agent type (implementation, test, refinement, title, oversight, oversight-test), sorted in a logical order.
  • By Workspace table -- When multiple workspaces are active, shows task count, token totals, and cost per workspace. Workspaces are sorted by cost (highest first). Hover over a workspace name to see the full path.
  • Top 10 Tasks by Cost -- The most expensive tasks with their title (clickable to open the task), status, and cost. Useful for identifying outlier tasks.

The stats endpoint also supports workspace-scoped queries:

GET /api/stats?workspace=/path/to/repo
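Note that the workspace path must be URL-encoded when it appears as a query parameter. A small sketch (the base URL is a placeholder assumption):

```python
from urllib.parse import urlencode


def stats_url(base, workspace=None):
    """Build the /api/stats URL, optionally scoped to one workspace path."""
    url = f"{base}/api/stats"
    if workspace:
        url += "?" + urlencode({"workspace": workspace})
    return url
```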

Task Summaries

For completed (done) tasks, the server caches immutable usage summaries that are cheaper to aggregate than re-reading the full task data. The summaries endpoint is available for building external cost dashboards:

GET /api/tasks/summaries

Server-Side Search

Prefix your query with @ to trigger server-side full-text search. This searches across all tasks (including archived) and matches against task ID, title, prompt, tags, and oversight summaries. Results appear in a dropdown panel below the search bar, each showing the matched field and a context snippet. Click a result to open that task's detail modal.

Server-side search requires at least 2 characters after the @ prefix and debounces requests to avoid excessive API calls.
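The minimum-length gate described above amounts to a small predicate, sketched here (the debounce timer itself is a UI concern and is omitted):

```python
def should_query_server(raw: str) -> bool:
    """Gate server-side search: '@' prefix plus at least 2 characters after it."""
    return raw.startswith("@") and len(raw[1:].strip()) >= 2
```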

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/tasks/{id}/oversight | GET | Get implementation oversight summary |
| /api/tasks/{id}/oversight/test | GET | Get test oversight summary |
| /api/tasks/generate-oversight | POST | Bulk-generate missing oversight summaries |
| /api/tasks/{id}/logs | GET | Stream live logs (SSE) or serve saved output |
| /api/tasks/{id}/turn-usage | GET | Per-turn token usage breakdown |
| /api/tasks/{id}/spans | GET | Span timing data for flamegraph |
| /api/debug/spans | GET | Aggregate span statistics across all tasks |
| /api/usage | GET | Aggregated usage stats (period-filtered) |
| /api/stats | GET | Full analytics dashboard data |
| /api/tasks/summaries | GET | Immutable summaries for done tasks |
| /api/containers | GET | List running containers |
| /api/tasks/search | GET | Server-side full-text search |

Configuration Variables

| Variable | Default | Description |
| --- | --- | --- |
| WALLFACER_OVERSIGHT_INTERVAL | 0 | Minutes between periodic oversight generation while a task runs. 0 = generate only at completion. |
| WALLFACER_MAX_PARALLEL | 5 | Maximum concurrent tasks (affects throughput metrics) |

Keyboard Shortcuts

| Shortcut | Context | Action |
| --- | --- | --- |
| n | Board | Open new task form |
| / | Board | Focus the search bar |
| Cmd+K / Ctrl+K | Board | Open command palette |
| ` | Board | Toggle terminal panel |
| ? | Board | Show keyboard shortcuts help |
| Escape | Any | Close topmost modal or blur search bar |
| Ctrl+Enter / Cmd+Enter | New task form | Save task |
| Escape | New task form | Cancel |
| Enter / Space | Focused card | Open task detail |
| Arrow keys | Focused card | Navigate between cards |
| s | Focused backlog card | Start task |
| d | Focused waiting card | Mark as done |

Board shortcuts are suppressed when focus is in a text input or when a modal is open.


See also: Usage Guide for task lifecycle and board operations, Getting Started for setup, Circuit Breakers for automation safety.