FlorianBruniaux
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 27 additions & 0 deletions
diff --git a/docs/resource-evaluations/2026-03-16-agent-trace-siddhant-github.md b/docs/resource-evaluations/2026-03-16-agent-trace-siddhant-github.md
Lines changed: 93 additions & 0 deletions
diff --git a/docs/resource-evaluations/2026-03-16-andrewyng-context-hub.md b/docs/resource-evaluations/2026-03-16-andrewyng-context-hub.md
Lines changed: 76 additions & 0 deletions
@@ -8,12 +8,39 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
+- **Failure-triggered context drift pattern** (`guide/core/architecture.md` §Session Degradation Limits): New subsection documenting a distinct degradation mode from compaction drift — repeated tool failures accumulate error noise that dilutes the original intent without filling the context window. Pattern: re-inject core task instructions on every command failure via `PostToolUse` hook, not just after `/compact`. Source: Nick Tune (2026-03-01). Resource evaluation: `docs/resource-evaluations/2026-03-16-nick-tune-workflow-dsl-ddd.md` (score 3/5 — 1 of 3 patterns integrated).
+
+- **Identity re-injection after compaction** (`guide/ultimate-guide.md` §7.5 + `examples/hooks/bash/identity-reinjection.sh`): New hook pattern from Nick Tune (Feb 2026). Solves agent identity drift after context compaction in long sessions — `UserPromptSubmit` hook reads transcript, detects missing identity marker in last assistant message, re-injects `.claude/agent-identity.txt` as `additionalContext`. Configurable via `CLAUDE_IDENTITY_FILE` and `CLAUDE_IDENTITY_MARKER` env vars. `reference.yaml` updated with `identity_reinjection_hook` + `identity_reinjection_example` keys.
+
+- **Security audit hardening — 3 patterns** (`examples/commands/security-audit.md`, `examples/agents/security-auditor.md`): (1) Pre-step added to `/security-audit`: asks dev/staging/prod before running — avoids false positives on debug flags and CORS `*` in local dev. (2) Anti-false-positive rule in Phase 2 (secrets scan): mandates running `git log --all -p` and checking `.gitignore` before raising any secret finding — no more findings based on pattern matching alone. (3) Paywall/billing checklist added to `security-auditor.md` under A04 Insecure Design: server-side limit enforcement, subscription status from DB, webhook signature verification, billing bypass endpoints, race conditions on resource creation.
+
+- **Resource evaluation: VicKayro — claude-security-audit** (`docs/resource-evaluations/2026-03-16-vickairo-claude-security-audit.md`): Score 2/5. Single-file `/security-audit` command, OWASP Top 10 (2021) + 16 sections, MIT, 60 stars (18 days old). Substantial overlap with existing `security-audit.md`, `security-auditor.md`, and `security-hardening.md`. Genuine gaps: paywall/billing audit section (not covered anywhere), environment context pre-step (dev/staging/prod before auditing), and stricter anti-false-positive pattern for secrets (mandate `git log --all -p` proof before raising finding). Decision: extract 3 patterns into existing commands silently, no guide mention, revisit at 200+ stars.
+
+- **Resource evaluation: Nick Tune — Hook-Driven Dev Workflows** (`docs/resource-evaluations/2026-03-16-nick-tune-hook-driven-workflows.md`): Score 3/5. Covers hooks-as-workflow-engine pattern: typed state machine (Zod), per-state SubagentStart context injection, agent respawn for fresh context windows, identity re-injection after compaction, JSON workflow persistence. Key gap confirmed: guide lacks identity re-injection after compaction + per-state SubagentStart injection. Tiered integration: identity re-injection → §7.5 now; SubagentStart injection → agent-teams.md (3-4 weeks); full state machine guide deferred 60-90 days (1 week of author testing, needs community validation). Prerequisites: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, Opus 4.6, Node.js + TypeScript.
+
+- **1M context window status update** (`guide/ultimate-guide.md` lines ~2021-2070): Updated from "beta" to GA for Max/Team/Enterprise Claude Code plans (v2.1.75, March 13 2026). Preserved distinction: direct API use still requires tier 4 / custom rate limits. Pricing table updated to reflect standard rates for plan users.
+
+- **Code Review feature** (`guide/workflows/code-review.md` + cross-reference in `guide/ultimate-guide.md`): New workflow guide for Anthropic's Code Review research preview (Teams/Enterprise). Covers: multi-agent architecture and severity levels (🔴/🟡/🟣), full setup flow (admin URL `claude.ai/admin-settings/claude-code`, GitHub App permissions, 3 trigger modes — once/every push/manual), `@claude review` manual trigger, `REVIEW.md` schema with example, pricing model ($15-25 avg, billed via extra usage outside plan, spend cap at `claude.ai/admin-settings/usage`), analytics dashboard, and cross-links to manual CLI workflows + GitLab CI/CD. Verified against official docs at `code.claude.com/docs/en/code-review`.
+
+- **Context engineering guide — 3 additions** (`guide/core/context-engineering.md`):
+  - **"Most failures are context failures"** framing added to §1 Why It Matters — reframes troubleshooting from "the AI is bad" to "what's missing from context"
+  - **Static vs. Dynamic context** — new subsection distinguishing CLAUDE.md (static) from runtime tool outputs and agent context (dynamic); includes reference to Anthropic's September 2025 engineering post on agent context engineering
+  - **Maturity assessment §9** — Level 0-5 self-assessment grid grounded in Claude Code patterns (no CLAUDE.md → flat config → structured → modular → measured → full system); includes "what to do at each level" action table
+
 - **Spring Break promotion note** (guide line ~2395): Documented Anthropic's March 13-27, 2026 promotion — 2x usage limits outside 5-11am PT (peak hours) and all weekends, bonus usage doesn't count against weekly limits, applies to Free/Pro/Max/Team. Includes CET timezone conversion for European users (2x from midnight-13h and 19h-24h France time). Source: Anthropic support article.
+
+- **Smart-Suggest ROI script** (`examples/scripts/smart-suggest-roi.py`): Python stdlib-only analyzer for the `smart-suggest` UserPromptSubmit hook. Correlates suggestion log (`~/.claude/logs/smart-suggest.jsonl`) with session JSONL files to estimate command acceptance rate. Detects 4 acceptance signals: slash command tags, Skill tool use, Agent tool use, and text mention in next 5 user messages. Reports: summary, tier breakdown (Enforcement/Discovery/Contextual/Custom), top suggested/followed commands, never-followed list, and daily trend chart. CLI: `--since Nd`, `--no-sessions` (fast mode), `--json`, `--log PATH`.
 - **ICM (Infinite Context Memory)**: New MCP memory server section after Kairn (~line 11365) — Rust single binary, zero deps, Homebrew install, dual architecture (episodic decay Memories + permanent knowledge graph Memoirs), 9 typed relation types, auto-extraction 3 layers, 14 editor clients. Score 3/5 — recommended as Rust-native alternative when Python dependency management is a friction point. Includes explicit license callout (Source-Available, free ≤20 people) and vendor-reported benchmark flags.
 - **Comparison matrix update**: Added ICM column to MCP memory stack matrix (Runtime + License rows added for all tools)
 
 ### Documentation
 
+- **Resource evaluation** (rejected, no file): LinkedIn post "Five Levels of Context Engineering" by Matthew Alverson (via Addy Osmani) — score 1/5, rejected. Content is a pedagogical reformulation of concepts already covered with more rigor in `guide/core/context-engineering.md`. Alverson's 5-level taxonomy is not empirically grounded and not widely cited in the literature. Evaluation surfaced 3 real gaps now addressed (see Added section above). Better primary sources identified: Anthropic Engineering Blog (Sept 2025), MCP Maturity Model (Mitra, Nov 2025).
+
+- **Resource evaluation** (no file — text digest): Anthropic weekly recap March 9-15, 2026 (5 Claude Code releases, Code Review launch, 1M GA, Spring Break promo, corporate news) — score 4/5. Two gaps actioned: (1) Code Review product feature added as `guide/workflows/code-review.md`; (2) 1M context status updated from beta to GA in `guide/ultimate-guide.md` lines 2021-2070. Source reliability note: digest incorrectly attributes Claude Code changelog to `anthropics/anthropic-sdk-python` (correct repo: `anthropics/claude-code`); Code Review pricing ($15-25/PR) verified against official docs.
+
+- **Resource evaluation** (`docs/resource-evaluations/eval-claude-1m-context-window-jp-caparas.md`): JP Caparas article on 1M token context window: score 2/5, do not integrate. The central claim (flat pricing, no surcharge above 200K tokens) is factually wrong, which invalidates the article's competitive pricing analysis. The evaluation includes a fact-check table, a comparative analysis against the guide, and independent action items (verify 1M GA status; potentially update guide lines 2028-2070 on beta/GA status).
+
 - **Claude Code Releases**: Updated tracking to v2.1.76
   - MCP elicitation support — servers request structured input mid-task via interactive dialog
   - New hooks: `Elicitation`, `ElicitationResult`, `PostCompact`
 
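Reviewer sketch for the identity re-injection entry in the changelog above: a minimal Python rendering of the check that entry describes (find the last assistant message in the transcript, look for the identity marker, and return the identity text as `additionalContext` when it is missing). This is not the contents of `identity-reinjection.sh`. The function names, the `[identity-ok]` marker, and the transcript line shape (`type: "assistant"` with `message.content` text blocks) are assumptions made for illustration.

```python
import json


def last_assistant_text(transcript_lines):
    """Return the text of the most recent assistant entry, or "" if none exists.

    Assumes (hypothetically) one JSON object per line, where assistant entries
    look like {"type": "assistant", "message": {"content": [...]}}.
    """
    for raw in reversed(list(transcript_lines)):
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        content = (entry.get("message") or {}).get("content", "")
        if isinstance(content, str):
            return content
        # Content may be a list of blocks; keep only the text blocks.
        return "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return ""


def build_hook_output(transcript_lines, identity_text, marker):
    """Return a UserPromptSubmit payload, or None when the marker is still present."""
    if marker in last_assistant_text(transcript_lines):
        return None  # identity intact: stay silent
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": identity_text,
        }
    }
```

In an actual `UserPromptSubmit` hook, a wrapper would presumably read the hook input from stdin, open the transcript it points to, load the file named by `CLAUDE_IDENTITY_FILE`, and print the returned dict as JSON. Returning `None` when the marker is present keeps the hook silent on the common path.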
@@ -0,0 +1,93 @@
+# Evaluation: agent-trace (Siddhant-K-code/agent-trace)
+
+**Date**: 2026-03-16
+**Source**: https://github.com/Siddhant-K-code/agent-trace
+**Type**: GitHub repository (Python tool)
+**Evaluator**: Claude (eval-resource skill)
+
+---
+
+## Summary
+
+`agent-trace` (pip package: `agent-strace`) is a Python tool — zero dependencies, stdlib only — that captures every tool call, user prompt, and assistant response in Claude Code via hooks, then lets you replay sessions in the terminal or export as OpenTelemetry spans. Created 2026-03-15. 7 stars at time of evaluation.
+
+The "strace for AI agents" framing is apt: it solves the "my agent modified 47 files and I have no idea why" problem by giving you a time-stamped, replayable record of every decision point.
+
+---
+
+## Key Points
+
+- **Claude Code hooks**: Setup via `agent-strace setup`. Registers PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, Stop, SessionStart, SessionEnd in `.claude/settings.json`
+- **Session replay**: `agent-strace replay` shows full session with timestamps, durations, tool inputs, errors — the missing layer between JSONL and understanding
+- **MCP proxy**: Wraps any MCP server (stdio or HTTP/SSE). Works with Cursor, Windsurf, any MCP client
+- **OpenTelemetry export**: OTLP output → Datadog, Honeycomb, New Relic, Splunk
+- **Python decorator API**: `@trace_tool`, `@trace_llm_call`, `log_decision()` for custom agents
+- **Secret redaction**: `--redact` flag strips OpenAI, GitHub, AWS, Anthropic, Slack, JWTs, Bearer tokens, connection strings
+
+---
+
+## Relevance Score: 2/5
+
+**Relevant but too immature for immediate integration.**
+
+The session replay angle is real and not covered by existing tools in the guide. But `claude-code-otel` already handles the OTel export use case, and the manual jq queries at `guide/ops/observability.md:519-550` cover most of the audit use case. The unique differentiator — interactive replay — needs production validation before being recommended to readers.
+
+---
+
+## Comparison vs Current Guide Coverage
+
+| Aspect | agent-trace | Guide coverage |
+|--------|-------------|----------------|
+| Manual JSONL audit (jq) | ✅ Abstracted as CLI | ✅ observability.md:520 |
+| Session replay (visual) | ✅ Unique differentiator | ❌ Not covered |
+| OpenTelemetry export | ✅ OTLP | ✅ claude-code-otel already in table |
+| Hook setup automation | ✅ `agent-strace setup` | ✅ Documented manually |
+| MCP proxy (Cursor/Windsurf) | ✅ stdio + HTTP/SSE | ❌ Not covered |
+| Python decorator API | ✅ Custom agents | ❌ Not covered |
+| Maturity | ❌ 1 day old, 7 stars | ✅ Table tools have 100-10K stars |
+
+---
+
+## Challenge Notes (technical-writer review)
+
+**Score should be 2/5, not 3/5.** Reasons:
+
+1. `claude-code-otel` already exports to Datadog/Honeycomb. The OTel angle is not additive.
+2. The jq queries at observability.md:519-550 cover most of the audit use case already. The "replay niche" is thinner than it appears.
+3. ICM (1 star) was put on watch list. Agent-trace at 7 stars deserves the same treatment.
+
+**Missing aspects not in initial analysis**:
+
+- **MCP proxy = MITM risk**: Routing all MCP traffic through an unaudited HTTP/SSE proxy is a security surface. The guide has a full hardening section — adding this to the monitoring table without flagging would be inconsistent.
+- **Secret redaction unverified**: Base64-encoded tokens, multi-line .env values, AWS temporary credentials — edge cases not tested. Could create false confidence.
+- **Python decorator API vs MLflow SDK**: MLflow has versioning + experiment tracking + LLM-as-judge. Agent-trace has lower friction. Real trade-off not mentioned.
+
+**On placement**: If integrated, not in the External Monitoring Tools table (that's monitoring, not debugging). Better as a footnote in the JSONL section (~observability.md:565) as "a higher-level wrapper for session replay."
+
+**Risk of NOT integrating**: Near zero. The jq queries + claude-code-otel cover the primary use cases. The real risk runs the other way: if a 1-day-old tool is added and then goes unmaintained, it becomes a dead link in a table readers use for tooling decisions.
+
+---
+
+## Fact-Check
+
+| Claim | Verified | Source |
+|-------|----------|--------|
+| Zero dependencies, Python stdlib only | ✅ | pyproject.toml + README |
+| Created 2026-03-15 | ✅ | GitHub API: `created_at: 2026-03-15T08:09:45Z` |
+| MIT licensed | ✅ | GitHub API: `license: MIT License` |
+| Captures the 7 Claude Code hook events listed under Key Points | ✅ | README hooks JSON: all 7 event types |
+| Export to Datadog, Honeycomb, Splunk | ✅ | README: `export --to otlp` (OTLP compatible) |
+| 7 stars at evaluation | ✅ | GitHub API 2026-03-16 |
+
+No hallucinations detected. All stats confirmed against source.
+
+---
+
+## Decision
+
+**Action: Watch list**
+**Integration trigger**: 100+ stars AND at least one practitioner write-up showing real production use.
+
+**If triggered**: Add as footnote in observability.md ~line 565 (JSONL section), not in the External Monitoring Tools table. Frame as "higher-level wrapper for session replay/debug" distinct from the monitoring tools.
+
+**Why watch list and not reject**: Session replay is a real gap. Zero-deps Python is a genuine adoption differentiator. The engineering quality looks solid (automated setup, secret redaction, HTTP/SSE proxy). Just needs time to prove reliability on real sessions.
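Reviewer sketch for the "secret redaction unverified" concern in the challenge notes above: a small Python illustration of why pattern-based redaction can give false confidence. The patterns below are illustrative stand-ins, not agent-trace's actual rules, and `redact` is a hypothetical helper. The example shows a plain token being caught while the same token in base64, and an AWS temporary access key ID (`ASIA` prefix), pass through untouched.

```python
import base64
import re

# Illustrative patterns only; NOT the rules agent-trace ships with.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),        # GitHub personal access token shape
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS long-term access key ID
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),  # bearer token in a header
]


def redact(text):
    """Replace every match of the known secret shapes with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


# A fake token in plain form is caught.
plain = "export GITHUB_TOKEN=ghp_" + "a" * 36
print(redact(plain))

# The same line, base64-encoded, no longer matches any pattern.
encoded = base64.b64encode(plain.encode()).decode()
print(redact(encoded) == encoded)

# AWS temporary credentials use the ASIA prefix, which the AKIA rule misses.
temporary_key_id = "ASIA" + "A" * 16
print(redact(temporary_key_id) == temporary_key_id)
```

A check along these lines, run against the tool's real `--redact` output, would be one way to test the edge cases listed in the challenge notes before recommending the flag to readers.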
@@ -0,0 +1,76 @@
+# Resource Evaluation: Context Hub (andrewyng/context-hub)
+
+**Date**: 2026-03-16
+**Source**: LinkedIn post (text) + https://github.com/andrewyng/context-hub
+**Type**: Open-source CLI tool
+**Author**: Andrew Ng (andrewyng)
+**Score**: 2/5
+
+---
+
+## Summary of Content
+
+- **What it is**: A CLI tool (`chub`) providing coding agents with curated, versioned API documentation as markdown files
+- **Core commands**: `chub get openai/chat --lang py` to fetch API docs; `chub annotate <id> "note"` for persistent cross-session annotations
+- **Corpus**: 602+ documentation entries (as of 2026-03-16), covering OpenAI, Anthropic, Stripe, AWS, and others
+- **Community loop**: Users vote on doc quality (`chub feedback`), surfacing improvements to maintainers
+- **Claude Code integration**: SKILL.md support for dropping into `~/.claude/skills/`
+- **License**: MIT, 6,342 stars
+
+---
+
+## Score: 2/5
+
+**Justification**: One genuinely novel feature (cross-session persistent annotations on external API docs) that Context7 cannot replicate. Everything else overlaps with existing guide coverage: Context7 already handles versioned library docs, `@url` natively pulls live documentation into Claude Code context, and anti-hallucination patterns are already documented. The annotation use case is real but solves a narrow problem. No production benchmarks, no independent validation.
+
+---
+
+## Comparative Analysis
+
+| Aspect | Context Hub | Our Guide |
+|--------|------------|-----------|
+| Curated API docs for agents | New CLI approach | Not covered as dedicated tool |
+| Cross-session doc annotations | Unique feature | Not covered |
+| Official library docs lookup | Overlaps with Context7 | Covered (Section 8, Context7) |
+| Live URL context | Overlaps with native `@url` | Covered (native Claude Code) |
+| Agent hallucination prevention | Indirect angle | Covered but scattered |
+| Maintenance/freshness guarantees | Community-maintained, lag risk | N/A |
+
+---
+
+## Challenge Notes (technical-writer agent)
+
+**Key pushbacks:**
+
+1. **Stars ≠ adoption**: the 6,342 stars likely reflect Andrew Ng's social amplification rather than production validation
+2. **Context7 overlap not demonstrated**: `chub get openai/chat --lang py` vs Context7's `query-docs` — the evaluation doesn't prove the concrete gap
+3. **Annotation is the only novel angle**: and it got buried — it's the one feature Context7 cannot replicate
+4. **Hallucination framing is a stretch**: community-maintained docs introduce a trust problem Context7 avoids (official sources)
+5. **Missing: `@url` native alternative**: Claude Code already pulls live docs natively, weakening the "gap" case
+6. **Missing: maintenance risk**: update lag when APIs change vs. Context7's live resolution
+7. **Risk of not integrating**: Low — existing guide coverage (Context7, `@url`, grepai) handles most use cases
+
+---
+
+## Fact-Check
+
+| Claim (from LinkedIn post) | Verdict | Notes |
+|---------------------------|---------|-------|
+| "Andrew Ng just dropped" | Verified | Repo owner is `andrewyng`, not a fork |
+| "68+ APIs" | False | Actual corpus: 602+ entries as of 2026-03-16 |
+| "One of the fastest accelerating new repos" | Unverifiable | 6,342 stars in ~5 months; no public velocity data |
+| "100% free & open source (MIT)" | Verified | MIT confirmed in license file |
+
+**Corrections**: The "68+ APIs" figure is either from an early snapshot or fabricated. The corpus holds 602+ entries, roughly 9x the quoted figure (602 / 68 ≈ 8.9). The LinkedIn post is marketing-inflated.
+
+---
+
+## Recommendation
+
+**Action**: Do not integrate as dedicated coverage; at most a one-line mention.
+
+If mentioned at all, one sentence under the Context7 entry in Section 8 (MCP servers): "For teams requiring persistent annotations on external API docs across sessions, see [context-hub](https://github.com/andrewyng/context-hub)."
+
+No section, no dedicated coverage, no hallucination-prevention framing. Revisit if production use cases emerge in the community.
+
+**Confidence**: High (fact-check complete, challenge addressed)