Skip to content

Zijian-Ni/agent-replay

Repository files navigation

    _                    _     ____            _
   / \   __ _  ___ _ __ | |_  |  _ \ ___ _ __ | | __ _ _   _
  / _ \ / _` |/ _ \ '_ \| __| | |_) / _ \ '_ \| |/ _` | | | |
 / ___ \ (_| |  __/ | | | |_  |  _ <  __/ |_) | | (_| | |_| |
/_/   \_\__, |\___|_| |_|\__| |_| \_\___| .__/|_|\__,_|\__, |
        |___/                            |_|            |___/

Production-grade AI agent execution trace debugger

npm version License: PolyForm Shield Node.js TypeScript Tests Build Bundle Size

Record, replay, and debug every LLM call, tool invocation, and decision your AI agents make. Zero-config. Local-first. No cloud account required.

Getting Started Β· Viewer Demo Β· CLI Β· Docs Β· Changelog


Why Agent Replay?

When your AI agent makes 47 LLM calls, invokes 12 tools, and costs $0.83 per run β€” you need visibility.

The problems:

  • Your agent silently burns tokens on redundant LLM calls, but you can't see which ones
  • A tool call fails deep in a chain, and you have no idea what happened 3 steps before it
  • You optimized a prompt, but did it actually reduce cost? You can't compare runs
  • Your agent works in dev but fails in prod β€” with no trace to debug
  • You need to audit what your agent did, but there's no record

Agent Replay gives you:

  • Full execution trace of every LLM call, tool invocation, and decision
  • Step-through replay with time travel, breakpoints, and speed control
  • Cost analysis and anomaly detection to find optimization opportunities
  • A visual debugger with flame graphs, heatmaps, and timeline views
  • Automatic PII redaction and encryption for production safety

Features

Feature Description
Record Zero-config capture Proxy-based interception for OpenAI & Anthropic SDKs. Streaming support. One line to start.
Replay Time-travel debugging Step-through with breakpoints, speed control, model swapping, checkpoint save/restore.
Analyze Cost & anomaly detection Per-model cost breakdown, token spike detection, error patterns, trace diffing.
View Visual debugger Flame graph, token heatmap, timeline view, minimap, dark/light themes. Export as HTML.
Plugin Extensible Plugin system for LangChain, CrewAI, and custom frameworks.
Compress Space efficient gzip compression reduces trace files by ~80%. Auto-detect on load.
Secure Privacy by default Auto-redaction of API keys, emails, SSNs, credit cards. AES-256-GCM encryption at rest.
Store Flexible backends JSONL (plain or compressed), SQLite, or in-memory. Bring your own storage.

Quick Start

pnpm add @agent-replay/core

Record an OpenAI agent

import { AgentRecorder, interceptOpenAI } from '@agent-replay/core';
import OpenAI from 'openai';

const recorder = new AgentRecorder({ name: 'my-agent', storage: 'file' });
const openai = interceptOpenAI(new OpenAI(), recorder);

// Use openai as normal β€” every call is recorded automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain quantum computing in one sentence.' }],
});

const trace = await recorder.stop();
console.log(`Recorded ${trace.summary!.totalSteps} steps, cost: $${trace.summary!.totalCost.toFixed(4)}`);

Record streaming responses

import { AgentRecorder, interceptOpenAIStream } from '@agent-replay/core';
import OpenAI from 'openai';

const recorder = new AgentRecorder({ name: 'streaming-agent', storage: 'file' });
const openai = interceptOpenAIStream(new OpenAI(), recorder);

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a haiku' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

await recorder.stop(); // Stream chunks are recorded with timing

Record Anthropic agents

import { AgentRecorder, interceptAnthropic } from '@agent-replay/core';
import Anthropic from '@anthropic-ai/sdk';

const recorder = new AgentRecorder({ name: 'claude-agent', storage: 'file' });
const anthropic = interceptAnthropic(new Anthropic(), recorder);

const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});

await recorder.stop();

Use plugins (LangChain, CrewAI)

import { AgentRecorder, PluginRegistry, langchainPlugin } from '@agent-replay/core';

const registry = new PluginRegistry();
registry.register(langchainPlugin);

const recorder = new AgentRecorder({ name: 'langchain-agent' });
registry.attach(recorder);

// Wrap your LangChain chain
const chain = registry.createInterceptor('langchain', myChain, recorder);
const result = await chain.invoke('my input');

Replay with breakpoints and speed control

import { AgentReplayer, JsonlStorage } from '@agent-replay/core';

const storage = new JsonlStorage('./traces');
const data = await storage.load('trace-id');
const replayer = new AgentReplayer(data!, {
  speed: 2, // 2x speed
  breakpoints: [
    { type: 'cost', costThreshold: 0.50 },  // Stop when cost exceeds $0.50
    { type: 'error' },                        // Stop on any error
    { type: 'tool', toolName: 'web_search' }, // Stop when web_search is called
  ],
  modelSubstitutions: { 'gpt-4': 'gpt-3.5-turbo' }, // Swap models for cost comparison
  onBreakpoint: (bp, step) => console.log(`Breakpoint hit: ${bp.type} at ${step.name}`),
});

replayer.saveCheckpoint('start');
const steps = await replayer.replayAll();

// If paused at breakpoint, resume
if (replayer.isPaused) {
  await replayer.resume();
}

Analyze costs and detect anomalies

import { analyzeCost, detectAnomalies } from '@agent-replay/core';

const costs = analyzeCost(trace);
console.log(`Total: $${costs.total.toFixed(4)}`);
for (const [model, data] of Object.entries(costs.byModel)) {
  console.log(`  ${model}: ${data.calls} calls, $${data.cost.toFixed(4)}`);
}

const anomalies = detectAnomalies(trace);
anomalies.forEach(a => console.log(`[${a.severity}] ${a.message}`));

Compress traces (~80% savings)

import { CompressedJsonlStorage, compressDirectory } from '@agent-replay/core';

// Use compressed storage backend
const storage = new CompressedJsonlStorage('./traces', true);
await storage.save(serializedTrace); // Saves as .jsonl.gz

// Or batch compress existing files
const result = await compressDirectory('./traces');
console.log(`Compressed ${result.compressed.length} files, saved ${result.savedBytes} bytes`);

Architecture

graph TB
    subgraph "Your Application"
        A[Agent Code] --> B[OpenAI / Anthropic SDK]
    end

    subgraph "@agent-replay/core"
        C[Interceptors] --> D[AgentRecorder]
        C2[Streaming] --> D
        P[Plugins] --> C
        D --> E[Redactor]
        D --> F[Encryption]
        D --> G[Storage]
        G --> G1[JSONL]
        G --> G2[Compressed JSONL]
        G --> G3[SQLite]
        G --> G4[Memory]
        H[AgentReplayer]
        H --> H1[Breakpoints]
        H --> H2[Checkpoints]
        H --> H3[Model Swap]
        I[Analysis]
        I --> I1[Cost]
        I --> I2[Diff]
        I --> I3[Timeline]
        I --> I4[Anomaly]
    end

    subgraph "Tools"
        J["@agent-replay/viewer<br/>Flame Graph Β· Heatmap Β· Timeline"]
        K["@agent-replay/cli<br/>record Β· serve Β· compress Β· validate"]
    end

    B -.->|proxied| C
    G1 --> J
    G1 --> K
    G1 --> H
Loading

Packages

Package Description Version
@agent-replay/core Recording, replay, analysis, plugins, compression, security 0.2.0
@agent-replay/viewer Web viewer with flame graph, heatmap, timeline, dark/light themes 0.2.0
@agent-replay/cli CLI: record, serve, view, diff, stats, compress, validate, merge, export 0.2.0

Viewer

The web viewer provides three visualization modes:

  • Timeline β€” Step-by-step cards with duration bars, color-coded by type
  • Flame Graph β€” Chrome DevTools-style timing visualization
  • Heatmap β€” Token usage intensity map (cool blue β†’ hot red)

Plus: minimap navigation, keyboard shortcuts (press ?), dark/light themes, export as shareable HTML.

CLI

# Record an agent session
agent-replay record -- node my-agent.js

# Serve the viewer with live API
agent-replay serve --port 3000 --dir ./traces

# View traces in terminal
agent-replay view ./traces/

# Compare two trace runs
agent-replay diff trace-a.jsonl trace-b.jsonl

# Show cost and token statistics
agent-replay stats ./traces/

# Compress trace files (~80% disk savings)
agent-replay compress ./traces/

# Validate trace file schema
agent-replay validate trace.jsonl

# Merge multiple traces into one
agent-replay merge trace1.jsonl trace2.jsonl -o merged.jsonl

# Export trace to standalone HTML
agent-replay export trace.jsonl --format html

Comparison

Feature Agent Replay Langfuse LangSmith Phoenix
Open source Yes Partial No Yes
Self-hosted / local-first Yes Yes No Yes
No cloud account required Yes No No No
Works offline Yes No No No
SDK-level recording Yes Yes Yes Yes
Streaming support Yes Yes Yes Yes
Plugin system Yes No No No
Automatic PII redaction Yes No No No
Encryption at rest (AES-256-GCM) Yes No No No
Trace replay / time-travel Yes No No No
Conditional breakpoints Yes No No No
Model substitution in replay Yes No No No
Trace diffing Yes No No No
Flame graph visualization Yes No No No
Token heatmap Yes No No No
Trace compression (gzip) Yes No No No
Export as shareable HTML Yes No No No
Built-in web viewer Yes Yes Yes Yes
CLI tool Yes No Yes No
Cost analysis Yes Yes Yes Yes
Anomaly detection Yes No No No
Zero config Yes No No No
Free forever Yes Freemium Freemium Yes

Performance

Operation Metric
Record overhead per step < 0.5ms
Replay 1000 steps (instant) < 50ms
Compress 10MB trace ~200ms (~80% reduction)
Load compressed trace ~100ms
Memory per 1000 steps ~2MB

Examples

Documentation

Roadmap

  • Real-time recording mode via WebSocket (watch agent execute live)
  • Trace comparison overlay (superimpose two traces on same timeline)
  • Worker thread for embedding computation
  • VS Code extension for in-editor trace viewing
  • OpenTelemetry export support
  • Shared trace URLs (signed, expiring links)
  • Cost optimization suggestions (auto-detect where cheaper models would suffice)

Development

git clone https://github.com/Zijian-Ni/agent-replay.git
cd agent-replay
pnpm install
pnpm build
pnpm test      # 150+ tests
pnpm lint

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

PolyForm Shield License 1.0.0 β€” Copyright (c) 2026 Zijian Ni (ε€ͺ子ε₯)

Built With

TypeScript Β· Vite Β· Vitest Β· tsup Β· Turborepo

About

πŸ”„ Record, replay, and debug AI agent execution traces β€” the DevTools for AI agents

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors