
Commit da8bc09

feat: smart-suggest ROI script + hook tuning + guide updates (Mar 16)
- Add examples/scripts/smart-suggest-roi.py: stdlib-only analyzer correlating the suggestion log with session JSONL files to measure command acceptance rate. 4 acceptance signals, tier breakdown, daily trend, --json/--since/--no-sessions CLI.
- Tune Aristote smart-suggest hook: tighten 5 over-firing triggers (/tech:commit, /tech:sonarqube, /tech:dupes, /check-conventions a11y, /tech:worktree)
- Guide: identity re-injection hook, context engineering maturity grid, code review workflow, 1M context window GA update, Spring Break promo, security audit patterns
- Resource evaluations: Nick Tune hooks (3/5), VicKayro security audit (2/5), Karl Mazier CLAUDE.md templates, Paul Rayner ContextFlow, Siddhant agent trace, Andrew Ng context hub, JP Caparas 1M context window

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent d9cff74 commit da8bc09

19 files changed (+1963, -6 lines)

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -8,12 +8,39 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
### Added

- **Failure-triggered context drift pattern** (`guide/core/architecture.md` §Session Degradation Limits): New subsection documenting a distinct degradation mode from compaction drift — repeated tool failures accumulate error noise that dilutes the original intent without filling the context window. Pattern: re-inject core task instructions on every command failure via `PostToolUse` hook, not just after `/compact`. Source: Nick Tune (2026-03-01). Resource evaluation: `docs/resource-evaluations/2026-03-16-nick-tune-workflow-dsl-ddd.md` (score 3/5 — 1 of 3 patterns integrated).
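The failure-triggered pattern above can be sketched as a minimal `PostToolUse` hook body. This is a hedged illustration, not the shipped hook: the `is_error`/`error` fields on `tool_response` and the `.claude/core-task.txt` path are assumptions, and `additionalContext` support for `PostToolUse` should be verified against your Claude Code version.

```python
def looks_like_failure(tool_response) -> bool:
    """Heuristic failure check: explicit error flags on a dict payload,
    or the word 'error' anywhere in raw string output (assumed shape)."""
    if isinstance(tool_response, dict):
        return bool(tool_response.get("is_error") or tool_response.get("error"))
    return "error" in str(tool_response).lower()

def build_output(tool_response, task_text: str) -> dict:
    """On failure, emit hook output re-injecting the core task; else nothing.
    Re-injecting on every failure keeps intent visible as error noise piles up."""
    if not looks_like_failure(tool_response):
        return {}
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": "Reminder of the core task:\n" + task_text,
        }
    }

# Wiring (not shown): read the hook payload from stdin with json.load,
# read the task file, and print json.dumps(build_output(...)) if non-empty.
```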
- **Identity re-injection after compaction** (`guide/ultimate-guide.md` §7.5 + `examples/hooks/bash/identity-reinjection.sh`): New hook pattern from Nick Tune (Feb 2026). Solves agent identity drift after context compaction in long sessions — `UserPromptSubmit` hook reads transcript, detects missing identity marker in last assistant message, re-injects `.claude/agent-identity.txt` as `additionalContext`. Configurable via `CLAUDE_IDENTITY_FILE` and `CLAUDE_IDENTITY_MARKER` env vars. `reference.yaml` updated with `identity_reinjection_hook` + `identity_reinjection_example` keys.
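The detection logic of that hook can be sketched in Python. This is a hedged approximation of `identity-reinjection.sh`, not the shipped script: the `{"type": "assistant", "message": {"content": [...]}}` transcript shape and the `UserPromptSubmit` output schema are assumptions to verify against your Claude Code version.

```python
import json
import os

# Env-configurable, matching the changelog entry; defaults are assumptions.
IDENTITY_FILE = os.environ.get("CLAUDE_IDENTITY_FILE", ".claude/agent-identity.txt")
MARKER = os.environ.get("CLAUDE_IDENTITY_MARKER", "[AGENT-IDENTITY]")

def last_assistant_text(transcript_lines):
    """Scan JSONL transcript lines and return the last assistant turn's text."""
    text = ""
    for line in transcript_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the hook
        if entry.get("type") != "assistant":
            continue
        blocks = entry.get("message", {}).get("content", [])
        text = " ".join(b.get("text", "") for b in blocks if isinstance(b, dict))
    return text

def reinjection_output(transcript_lines, identity_text, marker=MARKER):
    """If the marker is absent from the last assistant turn (identity drift
    after compaction), return hook output re-injecting the identity file."""
    if marker in last_assistant_text(transcript_lines):
        return {}
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": identity_text,
        }
    }
```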
- **Security audit hardening — 3 patterns** (`examples/commands/security-audit.md`, `examples/agents/security-auditor.md`): (1) Pre-step added to `/security-audit`: asks dev/staging/prod before running — avoids false positives on debug flags and CORS `*` in local dev. (2) Anti-false-positive rule in Phase 2 (secrets scan): mandates running `git log --all -p` and checking `.gitignore` before raising any secret finding — no more findings based on pattern matching alone. (3) Paywall/billing checklist added to `security-auditor.md` under A04 Insecure Design: server-side limit enforcement, subscription status from DB, webhook signature verification, billing bypass endpoints, race conditions on resource creation.
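The anti-false-positive rule in (2) can be sketched as a pre-flight check. `git log --all -p` and `git check-ignore` are real git commands; the helper names and the acceptance policy are this sketch's own, not the command's actual implementation.

```python
import subprocess

def appears_in_git_history(candidate: str, repo: str = ".") -> bool:
    """True only if the candidate string shows up in a committed diff,
    i.e. the 'proof via git log --all -p' the rule mandates."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--all", "-p"],
        capture_output=True, text=True, check=False,
    ).stdout
    return candidate in log

def path_is_gitignored(path: str, repo: str = ".") -> bool:
    """git check-ignore exits 0 when the path matches an ignore rule."""
    return subprocess.run(
        ["git", "-C", repo, "check-ignore", "-q", path],
        check=False,
    ).returncode == 0

def confirm_secret_finding(candidate: str, path: str, repo: str = ".") -> bool:
    """Raise the finding only with git-history proof on a non-ignored path,
    instead of pattern matching alone."""
    return appears_in_git_history(candidate, repo) and not path_is_gitignored(path, repo)
```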
- **Resource evaluation: VicKayro — claude-security-audit** (`docs/resource-evaluations/2026-03-16-vickairo-claude-security-audit.md`): Score 2/5. Single-file `/security-audit` command, OWASP Top 10 (2021) + 16 sections, MIT, 60 stars (18 days old). Substantial overlap with existing `security-audit.md`, `security-auditor.md`, and `security-hardening.md`. Genuine gaps: paywall/billing audit section (not covered anywhere), environment context pre-step (dev/staging/prod before auditing), and stricter anti-false-positive pattern for secrets (mandate `git log --all -p` proof before raising finding). Decision: extract 3 patterns into existing commands silently, no guide mention, revisit at 200+ stars.
- **Resource evaluation: Nick Tune — Hook-Driven Dev Workflows** (`docs/resource-evaluations/2026-03-16-nick-tune-hook-driven-workflows.md`): Score 3/5. Covers hooks-as-workflow-engine pattern: typed state machine (Zod), per-state SubagentStart context injection, agent respawn for fresh context windows, identity re-injection after compaction, JSON workflow persistence. Key gap confirmed: guide lacks identity re-injection after compaction + per-state SubagentStart injection. Tiered integration: identity re-injection → §7.5 now; SubagentStart injection → agent-teams.md (3-4 weeks); full state machine guide deferred 60-90 days (1 week of author testing, needs community validation). Prerequisites: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, Opus 4.6, Node.js + TypeScript.
- **1M context window status update** (`guide/ultimate-guide.md` lines ~2021-2070): Updated from "beta" to GA for Max/Team/Enterprise Claude Code plans (v2.1.75, March 13 2026). Preserved distinction: direct API use still requires tier 4 / custom rate limits. Pricing table updated to reflect standard rates for plan users.
- **Code Review feature** (`guide/workflows/code-review.md` + cross-reference in `guide/ultimate-guide.md`): New workflow guide for Anthropic's Code Review research preview (Teams/Enterprise). Covers: multi-agent architecture and severity levels (🔴/🟡/🟣), full setup flow (admin URL `claude.ai/admin-settings/claude-code`, GitHub App permissions, 3 trigger modes — once/every push/manual), `@claude review` manual trigger, `REVIEW.md` schema with example, pricing model ($15-25 avg, billed via extra usage outside plan, spend cap at `claude.ai/admin-settings/usage`), analytics dashboard, and cross-links to manual CLI workflows + GitLab CI/CD. Verified against official docs at `code.claude.com/docs/en/code-review`.
- **Context engineering guide — 3 additions** (`guide/core/context-engineering.md`):
  - **"Most failures are context failures"** framing added to §1 Why It Matters — reframes troubleshooting from "the AI is bad" to "what's missing from context"
  - **Static vs. Dynamic context** — new subsection distinguishing CLAUDE.md (static) from runtime tool outputs and agent context (dynamic); includes reference to Anthropic's September 2025 engineering post on agent context engineering
  - **Maturity assessment §9** — Level 0-5 self-assessment grid grounded in Claude Code patterns (no CLAUDE.md → flat config → structured → modular → measured → full system); includes "what to do at each level" action table
- **Spring Break promotion note** (guide line ~2395): Documented Anthropic's March 13-27, 2026 promotion — 2x usage limits outside 5-11am PT (peak hours) and all weekends, bonus usage doesn't count against weekly limits, applies to Free/Pro/Max/Team. Includes CET timezone conversion for European users (2x from midnight-13h and 19h-24h France time). Source: Anthropic support article.
- **Smart-Suggest ROI script** (`examples/scripts/smart-suggest-roi.py`): Python stdlib-only analyzer for the `smart-suggest` UserPromptSubmit hook. Correlates suggestion log (`~/.claude/logs/smart-suggest.jsonl`) with session JSONL files to estimate command acceptance rate. Detects 4 acceptance signals: slash command tags, Skill tool use, Agent tool use, and text mention in next 5 user messages. Reports: summary, tier breakdown (Enforcement/Discovery/Contextual/Custom), top suggested/followed commands, never-followed list, and daily trend chart. CLI: `--since Nd`, `--no-sessions` (fast mode), `--json`, `--log PATH`.
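The text-mention signal (the last of the four) reduces to a small pure function. A hedged sketch: the `command`/`next_user_messages` entry shape is illustrative, not the real data model the script builds from the two JSONL sources.

```python
def acceptance_rate(entries, window=5):
    """Fraction of suggestions judged accepted. Each entry pairs a suggested
    command with the user messages that followed it in the session JSONL;
    this sketch implements only the text-mention signal (the real script
    also checks slash-command tags and Skill/Agent tool use)."""
    if not entries:
        return 0.0
    accepted = sum(
        1
        for e in entries
        if any(e["command"] in msg for msg in e.get("next_user_messages", [])[:window])
    )
    return accepted / len(entries)
```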
- **ICM (Infinite Context Memory)**: New MCP memory server section after Kairn (~line 11365) — Rust single binary, zero deps, Homebrew install, dual architecture (episodic decay Memories + permanent knowledge graph Memoirs), 9 typed relation types, auto-extraction 3 layers, 14 editor clients. Score 3/5 — recommended as Rust-native alternative when Python dependency management is a friction point. Includes explicit license callout (Source-Available, free ≤20 people) and vendor-reported benchmark flags.
- **Comparison matrix update**: Added ICM column to MCP memory stack matrix (Runtime + License rows added for all tools)

### Documentation

- **Resource evaluation** (rejected, no file): LinkedIn post "Five Levels of Context Engineering" by Matthew Alverson (via Addy Osmani) — score 1/5, rejected. Content is a pedagogical reformulation of concepts already covered with more rigor in `guide/core/context-engineering.md`. Alverson's 5-level taxonomy is not empirically grounded and not widely cited in the literature. Evaluation surfaced 3 real gaps now addressed (see Added section above). Better primary sources identified: Anthropic Engineering Blog (Sept 2025), MCP Maturity Model (Mitra, Nov 2025).
- **Resource evaluation** (no file — text digest): Anthropic weekly recap March 9-15, 2026 (5 Claude Code releases, Code Review launch, 1M GA, Spring Break promo, corporate news) — score 4/5. Two gaps actioned: (1) Code Review product feature added as `guide/workflows/code-review.md`; (2) 1M context status updated from beta to GA in `guide/ultimate-guide.md` lines 2021-2070. Source reliability note: digest incorrectly attributes Claude Code changelog to `anthropics/anthropic-sdk-python` (correct repo: `anthropics/claude-code`); Code Review pricing ($15-25/PR) verified against official docs.
- **Resource evaluation** (`docs/resource-evaluations/eval-claude-1m-context-window-jp-caparas.md`): JP Caparas article on 1M token context window — score 2/5, do not integrate. Central claim (flat pricing, no surcharge above 200K tokens) is factually wrong; invalidates the competitive pricing analysis. Fact-check table, comparative analysis vs guide, and independent action items (verify 1M GA status, potential update to guide lines 2028-2070 on beta/GA status).
- **Claude Code Releases**: Updated tracking to v2.1.76
- MCP elicitation support — servers request structured input mid-task via interactive dialog
- New hooks: `Elicitation`, `ElicitationResult`, `PostCompact`
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Evaluation: agent-trace (Siddhant-K-code/agent-trace)

**Date**: 2026-03-16
**Source**: https://github.com/Siddhant-K-code/agent-trace
**Type**: GitHub repository (Python tool)
**Evaluator**: Claude (eval-resource skill)

---

## Summary

`agent-trace` (pip package: `agent-strace`) is a Python tool — zero dependencies, stdlib only — that captures every tool call, user prompt, and assistant response in Claude Code via hooks, then lets you replay sessions in the terminal or export them as OpenTelemetry spans. Created 2026-03-15. 7 stars at time of evaluation.
The "strace for AI agents" framing is apt: it solves the "my agent modified 47 files and I have no idea why" problem by giving you a time-stamped, replayable record of every decision point.
---
## Key Points
- **Claude Code hooks**: Setup via `agent-strace setup`. Registers PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, Stop, SessionStart, SessionEnd in `.claude/settings.json`
- **Session replay**: `agent-strace replay` shows the full session with timestamps, durations, tool inputs, errors — the missing layer between JSONL and understanding
- **MCP proxy**: Wraps any MCP server (stdio or HTTP/SSE). Works with Cursor, Windsurf, any MCP client
- **OpenTelemetry export**: OTLP output → Datadog, Honeycomb, New Relic, Splunk
- **Python decorator API**: `@trace_tool`, `@trace_llm_call`, `log_decision()` for custom agents
- **Secret redaction**: `--redact` flag strips OpenAI, GitHub, AWS, Anthropic, Slack, JWTs, Bearer tokens, connection strings

---
## Relevance Score: 2/5
**Pertinent but too immature for immediate integration.**

The session replay angle is real and not covered by existing tools in the guide. But `claude-code-otel` already handles the OTel export use case, and the manual jq queries at `guide/ops/observability.md:519-550` cover most of the audit use case. The unique differentiator — interactive replay — needs production validation before being recommended to readers.

---
## Comparison vs Current Guide Coverage
| Aspect | agent-trace | Guide coverage |
|--------|-------------|----------------|
| Manual JSONL audit (jq) | ✅ Abstracted as CLI | ✅ observability.md:520 |
| Session replay (visual) | ✅ Unique differentiator | ❌ Not covered |
| OpenTelemetry export | ✅ OTLP | ✅ claude-code-otel already in table |
| Hook setup automation | ✅ `agent-strace setup` | ✅ Documented manually |
| MCP proxy (Cursor/Windsurf) | ✅ stdio + HTTP/SSE | ❌ Not covered |
| Python decorator API | ✅ Custom agents | ❌ Not covered |
| Maturity | ❌ 1 day old, 7 stars | ✅ Table tools have 100-10K stars |

---
## Challenge Notes (technical-writer review)

**Score should be 2/5, not 3/5.** Reasons:

1. `claude-code-otel` already exports to Datadog/Honeycomb. The OTel angle is not additive.
2. The jq queries at observability.md:519-550 already cover most of the audit use case. The "replay niche" is thinner than it appears.
3. ICM (1 star) was put on the watch list. Agent-trace at 7 stars deserves the same treatment.

**Missing aspects not in the initial analysis**:

- **MCP proxy = MITM risk**: Routing all MCP traffic through an unaudited HTTP/SSE proxy is a security surface. The guide has a full hardening section — adding this to the monitoring table without flagging it would be inconsistent.
- **Secret redaction unverified**: Base64-encoded tokens, multi-line .env values, AWS temporary credentials — edge cases not tested. Could create false confidence.
- **Python decorator API vs MLflow SDK**: MLflow has versioning + experiment tracking + LLM-as-judge. Agent-trace has lower friction. A real trade-off, not mentioned.

**On placement**: If integrated, not in the External Monitoring Tools table (that's monitoring, not debugging). Better as a footnote in the JSONL section (~observability.md:565) as "a higher-level wrapper for session replay."

**Risk of NOT integrating**: Near zero. The jq queries + claude-code-otel cover the primary use cases. The real risk runs the other direction: adding a 1-day-old tool that goes unmaintained means a dead link in a table readers use for tooling decisions.

---
## Fact-Check

| Claim | Verified | Source |
|-------|----------|--------|
| Zero dependencies, Python stdlib only | ✅ | pyproject.toml + README |
| Created 2026-03-15 | ✅ | GitHub API: `created_at: 2026-03-15T08:09:45Z` |
| MIT licensed | ✅ | GitHub API: `license: MIT License` |
| Captures all CC hook events | ✅ | README hooks JSON: all 7 event types |
| Export to Datadog, Honeycomb, Splunk | ✅ | README: `export --to otlp` (OTLP compatible) |
| 7 stars at evaluation | ✅ | GitHub API 2026-03-16 |

No hallucinations detected. All stats confirmed against source.

---
## Decision
**Action: Watch list**
**Integration trigger**: 100+ stars AND at least one practitioner write-up showing real production use.

**If triggered**: Add as a footnote in observability.md ~line 565 (JSONL section), not in the External Monitoring Tools table. Frame as a "higher-level wrapper for session replay/debug", distinct from the monitoring tools.

**Why watch list and not reject**: Session replay is a real gap. Zero-deps Python is a genuine adoption differentiator. The engineering quality looks solid (automated setup, secret redaction, HTTP/SSE proxy). It just needs time to prove reliability on real sessions.
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# Resource Evaluation: Context Hub (andrewyng/context-hub)

**Date**: 2026-03-16
**Source**: LinkedIn post (text) + https://github.com/andrewyng/context-hub
**Type**: Open-source CLI tool
**Author**: Andrew Ng (andrewyng)
**Score**: 2/5

---
## Summary of Content

- **What it is**: A CLI tool (`chub`) providing coding agents with curated, versioned API documentation as markdown files
- **Core commands**: `chub get openai/chat --lang py` to fetch API docs; `chub annotate <id> "note"` for persistent cross-session annotations
- **Corpus**: 602+ documentation entries (as of 2026-03-16), covering OpenAI, Anthropic, Stripe, AWS, and others
- **Community loop**: Users vote on doc quality (`chub feedback`), surfacing improvements to maintainers
- **Claude Code integration**: SKILL.md support for dropping into `~/.claude/skills/`
- **License**: MIT, 6,342 stars

---
## Score: 2/5

**Justification**: One genuinely novel feature (cross-session persistent annotations on external API docs) that Context7 cannot replicate. Everything else overlaps with existing guide coverage: Context7 already handles versioned library docs, `@url` natively pulls live documentation into Claude Code context, and anti-hallucination patterns are already documented. The annotation use case is real but solves a narrow problem. No production benchmarks, no independent validation.

---
## Comparative Analysis

| Aspect | Context Hub | Our Guide |
|--------|------------|-----------|
| Curated API docs for agents | New CLI approach | Not covered as dedicated tool |
| Cross-session doc annotations | Unique feature | Not covered |
| Official library docs lookup | Overlaps with Context7 | Covered (Section 8, Context7) |
| Live URL context | Overlaps with native `@url` | Covered (native Claude Code) |
| Agent hallucination prevention | Indirect angle | Covered but scattered |
| Maintenance/freshness guarantees | Community-maintained, lag risk | N/A |

---
## Challenge Notes (technical-writer agent)

**Key pushbacks:**

1. **Stars ≠ adoption**: 6,342 stars driven by Andrew Ng's social amplification, not production validation
2. **Context7 overlap not demonstrated**: `chub get openai/chat --lang py` vs Context7's `query-docs` — the evaluation doesn't prove the concrete gap
3. **Annotation is the only novel angle**: and it got buried — it's the one feature Context7 cannot replicate
4. **Hallucination framing is a stretch**: community-maintained docs introduce a trust problem Context7 avoids (official sources)
5. **Missing: `@url` native alternative**: Claude Code already pulls live docs natively, weakening the "gap" case
6. **Missing: maintenance risk**: update lag when APIs change vs. Context7's live resolution
7. **Risk of not integrating**: Low — existing guide coverage (Context7, `@url`, grepai) handles most use cases

---
---
## Fact-Check

| Claim (from LinkedIn post) | Verdict | Notes |
|---------------------------|---------|-------|
| "Andrew Ng just dropped" | Verified | Repo owner is `andrewyng`, not a fork |
| "68+ APIs" | False | Actual corpus: 602+ entries as of 2026-03-16 |
| "One of the fastest accelerating new repos" | Unverifiable | 6,342 stars in ~5 months; no public velocity data |
| "100% free & open source (MIT)" | Verified | MIT confirmed in license file |

**Corrections**: The "68+ APIs" figure is either from an early snapshot or fabricated. Real coverage is ~9x larger. The LinkedIn post is marketing-inflated.

---
## Recommendation
**Action**: Do not integrate — one-line mention only.

If mentioned at all, one sentence under the Context7 entry in Section 8 (MCP servers): "For teams requiring persistent annotations on external API docs across sessions, see [context-hub](https://github.com/andrewyng/context-hub)."

No dedicated section, no dedicated coverage, no hallucination-prevention framing. Revisit if production use cases emerge in the community.

**Confidence**: High (fact-check complete, challenge addressed)
