|
| 1 | +# Evaluation: agent-trace (Siddhant-K-code/agent-trace) |
| 2 | + |
| 3 | +**Date**: 2026-03-16 |
| 4 | +**Source**: https://github.com/Siddhant-K-code/agent-trace |
| 5 | +**Type**: GitHub repository (Python tool) |
| 6 | +**Evaluator**: Claude (eval-resource skill) |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Summary |
| 11 | + |
| 12 | +`agent-trace` (pip package: `agent-strace`) is a Python tool — zero dependencies, stdlib only — that captures every tool call, user prompt, and assistant response in Claude Code via hooks, then lets you replay sessions in the terminal or export as OpenTelemetry spans. Created 2026-03-15. 7 stars at time of evaluation. |
| 13 | + |
| 14 | +The "strace for AI agents" framing is apt: it solves the "my agent modified 47 files and I have no idea why" problem by giving you a time-stamped, replayable record of every decision point. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Key Points |
| 19 | + |
| 20 | +- **Claude Code hooks**: Setup via `agent-strace setup`. Registers PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, Stop, SessionStart, SessionEnd in `.claude/settings.json` |
| 21 | +- **Session replay**: `agent-strace replay` shows full session with timestamps, durations, tool inputs, errors — the missing layer between JSONL and understanding |
| 22 | +- **MCP proxy**: Wraps any MCP server (stdio or HTTP/SSE). Works with Cursor, Windsurf, any MCP client |
| 23 | +- **OpenTelemetry export**: OTLP output → Datadog, Honeycomb, New Relic, Splunk |
| 24 | +- **Python decorator API**: `@trace_tool`, `@trace_llm_call`, `log_decision()` for custom agents |
| 25 | +- **Secret redaction**: `--redact` flag strips OpenAI, GitHub, AWS, Anthropic, Slack, JWTs, Bearer tokens, connection strings |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## Relevance Score: 2/5 |
| 30 | + |
| 31 | +**Pertinent but too immature for immediate integration.** |
| 32 | + |
| 33 | +The session replay angle is real and not covered by existing tools in the guide. But `claude-code-otel` already handles the OTel export use case, and the manual jq queries at `guide/ops/observability.md:519-550` cover most of the audit use case. The unique differentiator — interactive replay — needs production validation before being recommended to readers. |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Comparison vs Current Guide Coverage |
| 38 | + |
| 39 | +| Aspect | agent-trace | Guide coverage | |
| 40 | +|--------|-------------|----------------| |
| 41 | +| Manual JSONL audit (jq) | ✅ Abstracted as CLI | ✅ observability.md:520 | |
| 42 | +| Session replay (visual) | ✅ Unique differentiator | ❌ Not covered | |
| 43 | +| OpenTelemetry export | ✅ OTLP | ✅ claude-code-otel already in table | |
| 44 | +| Hook setup automation | ✅ `agent-strace setup` | ✅ Documented manually | |
| 45 | +| MCP proxy (Cursor/Windsurf) | ✅ stdio + HTTP/SSE | ❌ Not covered | |
| 46 | +| Python decorator API | ✅ Custom agents | ❌ Not covered | |
| 47 | +| Maturity | ❌ 1 day old, 7 stars | ✅ Table tools have 100-10K stars | |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Challenge Notes (technical-writer review) |
| 52 | + |
| 53 | +**Score should be 2/5, not 3/5.** Reasons: |
| 54 | + |
| 55 | +1. `claude-code-otel` already exports to Datadog/Honeycomb. The OTel angle is not additive. |
| 56 | +2. The jq queries at observability.md:519-550 cover most of the audit use case already. The "replay niche" is thinner than it appears. |
| 57 | +3. ICM (1 star) was put on watch list. Agent-trace at 7 stars deserves the same treatment. |
| 58 | + |
| 59 | +**Missing aspects not in initial analysis**: |
| 60 | + |
| 61 | +- **MCP proxy = MITM risk**: Routing all MCP traffic through an unaudited HTTP/SSE proxy is a security surface. The guide has a full hardening section — adding this to the monitoring table without flagging would be inconsistent. |
| 62 | +- **Secret redaction unverified**: Base64-encoded tokens, multi-line .env values, AWS temporary credentials — edge cases not tested. Could create false confidence. |
| 63 | +- **Python decorator API vs MLflow SDK**: MLflow has versioning + experiment tracking + LLM-as-judge. Agent-trace has lower friction. Real trade-off not mentioned. |
| 64 | + |
| 65 | +**On placement**: If integrated, not in the External Monitoring Tools table (that's monitoring, not debugging). Better as a footnote in the JSONL section (~observability.md:565) as "a higher-level wrapper for session replay." |
| 66 | + |
| 67 | +**Risk of NOT integrating**: Near zero. The jq queries + claude-code-otel cover the primary use cases. Real risk runs the other direction: adding a 1-day-old tool that goes unmaintained = dead link in a table readers use for tooling decisions. |
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +## Fact-Check |
| 72 | + |
| 73 | +| Claim | Verified | Source | |
| 74 | +|-------|----------|--------| |
| 75 | +| Zero dependencies, Python stdlib only | ✅ | pyproject.toml + README | |
| 76 | +| Created 2026-03-15 | ✅ | GitHub API: `created_at: 2026-03-15T08:09:45Z` | |
| 77 | +| MIT licensed | ✅ | GitHub API: `license: MIT License` | |
| 78 | +| Captures all CC hook events | ✅ | README hooks JSON: all 7 event types | |
| 79 | +| Export to Datadog, Honeycomb, Splunk | ✅ | README: `export --to otlp` (OTLP compatible) | |
| 80 | +| 7 stars at evaluation | ✅ | GitHub API 2026-03-16 | |
| 81 | + |
| 82 | +No hallucinations detected. All stats confirmed against source. |
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## Decision |
| 87 | + |
| 88 | +**Action: Watch list** |
| 89 | +**Integration trigger**: 100+ stars AND at least one practitioner write-up showing real production use. |
| 90 | + |
| 91 | +**If triggered**: Add as footnote in observability.md ~line 565 (JSONL section), not in the External Monitoring Tools table. Frame as "higher-level wrapper for session replay/debug" distinct from the monitoring tools. |
| 92 | + |
| 93 | +**Why watch list and not reject**: Session replay is a real gap. Zero-deps Python is a genuine adoption differentiator. The engineering quality looks solid (automated setup, secret redaction, HTTP/SSE proxy). Just needs time to prove reliability on real sessions. |
0 commit comments