diff --git a/AGENTS.md b/AGENTS.md
index e5cb302..a97cb40 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -179,10 +179,13 @@ Specialized agents for development tasks.
MCP server with built-in adapters for AI tools.
-**Adapters:**
+**Adapters (9 tools):**
- **SearchAdapter:** Semantic code search (`dev_search`)
+- **RefsAdapter:** Relationship queries - callers/callees (`dev_refs`)
+- **MapAdapter:** Codebase structure with change frequency (`dev_map`)
+- **HistoryAdapter:** Semantic git commit search (`dev_history`)
- **StatusAdapter:** Repository status (`dev_status`)
-- **PlanAdapter:** Planning from issues (`dev_plan`)
+- **PlanAdapter:** Context assembly for issues (`dev_plan`)
- **ExploreAdapter:** Code exploration (`dev_explore`)
- **GitHubAdapter:** Issue/PR search (`dev_gh`)
- **HealthAdapter:** Server health checks (`dev_health`)
diff --git a/CLAUDE.md b/CLAUDE.md
index 88361a2..b1697df 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -154,16 +154,19 @@ dev mcp install
That's it! Claude Code now has access to all dev-agent capabilities.
-### Available Tools in Claude Code & Cursor
-
-Once installed, AI tools gain access to these powerful capabilities:
-
-- **`dev_search`** - Semantic code search across indexed repositories
-- **`dev_status`** - Repository indexing status and health information
-- **`dev_plan`** - Generate implementation plans from GitHub issues
-- **`dev_explore`** - Explore code patterns, find similar code, analyze relationships
-- **`dev_gh`** - Search GitHub issues and pull requests with semantic context (auto-reloads on index changes)
-- **`dev_health`** - Check MCP server health and component status (vector storage, repository, GitHub index)
+### Available Tools in Claude Code & Cursor (9 tools)
+
+Once installed, AI tools gain access to:
+
+- **`dev_search`** - Semantic code search (USE THIS FIRST for conceptual queries)
+- **`dev_refs`** - Find callers/callees of functions (for specific symbols)
+- **`dev_map`** - Codebase structure with component counts and change frequency
+- **`dev_history`** - Semantic search over git commits (who changed what and why)
+- **`dev_plan`** - Assemble context for GitHub issues (code + history + patterns)
+- **`dev_explore`** - Find similar code, trace relationships
+- **`dev_gh`** - Search GitHub issues/PRs semantically
+- **`dev_status`** - Repository indexing status
+- **`dev_health`** - Server health checks
### MCP Command Reference
diff --git a/PLAN.md b/PLAN.md
index 4785d52..8970494 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -213,7 +213,7 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
> Building on git history with deeper insights.
-### Tasks
+### Git Tasks
| Task | Priority | Status |
|------|----------|--------|
@@ -222,6 +222,28 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
| Contributor expertise mapping | 🟢 Low | 🔲 Todo |
| Cross-repo history | 🟢 Low | 🔲 Todo |
+### Tool Improvements
+
+| Task | Rationale | Priority | Status |
+|------|-----------|----------|--------|
+| Generalize `dev_plan` → `dev_context` | Currently requires GitHub issue; should work with any task description | 🔴 High | 🔲 Todo |
+| Freeform context assembly | `dev_context "Add rate limiting"` without needing an issue number | 🔴 High | 🔲 Todo |
+| Multiple input modes | `--issue 42`, `--file src/auth.ts`, or freeform query | 🟡 Medium | 🔲 Todo |
+
+**Why:** `dev_plan` is really a context assembler, but it is tightly coupled to GitHub issues. A generalized `dev_context` would:
+- Work without GitHub
+- Be easier to benchmark (no real issues needed)
+- Have a name that matches its function (it assembles context; it doesn't "plan")
+- Be more useful for ad-hoc implementation tasks
+
+### Benchmark Improvements
+
+| Task | Rationale | Priority | Status |
+|------|-----------|----------|--------|
+| Add implementation task types | Current benchmark only tests exploration; missing `dev_plan`/`dev_gh` coverage | 🟡 Medium | 🔲 Todo |
+| Generic implementation patterns | "Add a new adapter similar to X" — tests pattern discovery | 🟡 Medium | 🔲 Todo |
+| Snapshotted issue tests | Capture real issues for reproducible `dev_plan` testing | 🟢 Low | 🔲 Todo |
+
---
## Future: Extended Intelligence (v0.6+)
@@ -314,23 +336,40 @@ How we know dev-agent is working:
4. **Daily use:** We actually use it ourselves (dogfooding)
5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
-### Benchmark Results (v0.4.2)
+### Benchmark Results (v0.4.3)
+
+#### By Task Type
+
+| Task Type | Cost Savings | Time Savings | Why |
+|-----------|--------------|--------------|-----|
+| **Debugging** | **42%** | 37% | Semantic search beats grep chains |
+| **Exploration** | **44%** | 19% | Find code by meaning |
+| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` |
+| **Simple lookup** | ~0% | ~0% | Both approaches are fast |
+
+**Key insight:** Savings scale with task complexity.
+
+#### Why It Saves Money
+
+| What dev-agent does | Manual equivalent | Impact |
+|---------------------|-------------------|--------|
+| Returns code snippets in search | Read entire files | 99% fewer input tokens |
+| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
+| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction |
-Measured against baseline Claude Code across 5 task types:
+#### Token Analysis (Debugging Task)
-| Metric | Baseline | With dev-agent | Improvement |
-|--------|----------|----------------|-------------|
-| Cost per session | $1.82 | $1.02 | **-44%** |
-| Time per session | 14.1 min | 11.5 min | **-19%** |
-| Tool calls | 69 | 40 | **-42%** |
-| Files examined | 23 | 15 | **-35%** |
+| Metric | Without dev-agent | With dev-agent | Difference |
+|--------|-------------------|----------------|------------|
+| Input tokens | 18,800 | 65 | **99.7% less** |
+| Output tokens | 12,200 | 6,200 | **49% less** |
+| Files read | 10 | 5 | **50% less** |
**Trade-offs identified:**
-- Less thorough for debugging (missing diagnostic commands)
-- Fewer code examples in responses
-- Skips test files (baseline reads them)
+- Baseline provides more diagnostic shell commands
+- Baseline reads more files (sometimes helpful for thoroughness)
-**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
+**Target users:** Engineers working on complex exploration, debugging, or implementation tasks in large/unfamiliar codebases.
---
@@ -347,4 +386,4 @@ pnpm test
---
-*Last updated: November 2025*
+*Last updated: November 29, 2025 at 02:30 PST*
diff --git a/website/content/docs/index.mdx b/website/content/docs/index.mdx
index de00484..fc03ea4 100644
--- a/website/content/docs/index.mdx
+++ b/website/content/docs/index.mdx
@@ -1,35 +1,46 @@
# Introduction
-**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
+**dev-agent** provides semantic code search and context bundling to AI assistants like Cursor and Claude Code via MCP.
-We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
+We built this for ourselves. When exploring large codebases, we found AI tools spending too much time on grep chains and whole-file reads just to locate relevant code. dev-agent gives them a faster path: search by meaning, get code snippets, bundle context.
## What it does
1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
-2. **Exposes 9 MCP tools** for semantic search, code relationships, git history
-3. **Integrates with GitHub** to search issues and PRs semantically
+2. **Returns code snippets** — not just file paths, reducing input tokens by 99%
+3. **Bundles context** — `dev_plan` assembles issue + code + commits in one call
+4. **Integrates with GitHub** to search issues and PRs semantically
## Measured impact
-We benchmarked dev-agent against baseline Claude Code:
+We benchmarked dev-agent against baseline Claude Code across different task types:
-| Metric | Baseline | With dev-agent | Change |
-|--------|----------|----------------|--------|
-| Cost | $1.82 | $1.02 | **-44%** |
-| Time | 14.1 min | 11.5 min | **-19%** |
-| Tool calls | 69 | 40 | **-42%** |
+| Task Type | Cost Savings | Time Savings | Why |
+|-----------|--------------|--------------|-----|
+| **Debugging** | **42%** | 37% | Semantic search beats grep chains |
+| **Exploration** | **44%** | 19% | Find code by meaning |
+| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` |
-**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files.
+**Key insight:** Savings scale with task complexity. Simple lookups show no improvement; complex debugging shows 42% cost reduction.
+
+**Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands.
+
+## Why it saves money
+
+| What dev-agent does | Manual equivalent | Impact |
+|---------------------|-------------------|--------|
+| Returns code snippets in search | Read entire files | 99% fewer input tokens |
+| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
+| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction |
## Key Features
| Feature | Description |
|---------|-------------|
+| **Context Bundling** | `dev_plan` replaces 5-10 tool calls with one |
+| **Code Snippets** | Search returns code, not just file paths |
| **Semantic Search** | Find code by meaning, not keywords |
-| **Relationship Queries** | What calls this function? What does it call? |
| **Git History** | Semantic search over commits |
-| **GitHub Integration** | Search issues and PRs semantically |
| **100% Local** | Your code never leaves your machine |
## Architecture
@@ -45,4 +56,3 @@ dev-agent is a monorepo:
- [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes
- [Quickstart →](/docs/quickstart) — Index and search in 5 minutes
-
diff --git a/website/content/index.mdx b/website/content/index.mdx
index 356930a..58fdee3 100644
--- a/website/content/index.mdx
+++ b/website/content/index.mdx
@@ -15,65 +15,112 @@ Local semantic code search for Cursor and Claude Code via MCP.
- **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search. We built it to speed up our own workflow — and measured 44% cost savings.
 + **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search and context bundling. Savings scale with task complexity: we measured 42% lower cost on debugging tasks and 44% on exploration.
+
+
+## Why it saves money
+
+dev-agent doesn't just search — it **bundles context** so Claude reads less:
+
+| What dev-agent does | Manual equivalent | Savings |
+|---------------------|-------------------|---------|
+| Returns code snippets in search | Read entire files | 99% fewer input tokens |
+| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
+| Semantic search finds relevant code | grep chains + manual filtering | 42% cost reduction |
+
+**The harder the task, the bigger the savings.**
+
+## Measured results by task type
+
+| Task Type | Cost Savings | Time Savings | Why |
+|-----------|--------------|--------------|-----|
+| **Debugging** | **42%** | 37% | Semantic search beats grep for "where is the bug?" |
+| **Exploration** | **44%** | 19% | Find code by meaning, not keywords |
+| **Implementation** | **29%** | 22% | `dev_plan` bundles context in one call |
+| **Simple lookup** | ~0% | ~0% | Both approaches are fast |
+
+
 + **Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands. dev-agent excels when you need to explore or understand code.
## Same question, different approach
-We asked Claude Code: *"Where is rate limiting implemented and how does it work?"*
+We asked Claude Code: *"Debug why search returns duplicates"*
- **Claude's approach (8 tool calls):**
+ **Claude's approach:**
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
- **Result:** 8 tool calls, 3 files read → **$0.36, 2.1 minutes**
+ **Result:** 18+ tool calls, 10 files read → **$1.37, 12 minutes**
+
+ *18,800 input tokens consumed*
- **Claude's approach (2 tool calls):**
+ **Claude's approach:**
-
-
-
+
+
+
-
+
+
+
+
+
- **Result:** 2 tool calls, 1 file read → **$0.20, 1.3 minutes**
+ **Result:** 6 tool calls, 5 files read → **$0.79, 7.5 minutes**
+
+ *65 input tokens consumed (99.7% less)*
- **Same answer. 44% cheaper. 38% faster.**
+ **Same root cause identified. 42% cheaper. 37% faster.**
-## Measured results
+## Context bundling: `dev_plan`
-We ran 5 task types comparing baseline Claude Code vs. with dev-agent:
+For implementation tasks, `dev_plan` bundles everything in one call:
-| Metric | Baseline | With dev-agent | Change |
-|--------|----------|----------------|--------|
-| Cost | $1.82 | $1.02 | **-44%** |
-| Time | 14.1 min | 11.5 min | **-19%** |
-| Tool calls | 69 | 40 | **-42%** |
-| Files read | 23 | 15 | **-35%** |
-
-
- **Trade-off:** Faster but sometimes less thorough. Baseline Claude read more files for debugging tasks. dev-agent excels at implementation and exploration.
-
+
+
+ **Claude's approach for "Implement issue #61":**
+ ```bash
+ gh issue view 61 --json title,body # Fetch issue
 + grep -r -e "--json" packages/cli # Find existing flags
+ Read search.ts # Check implementation
+ Read mcp.ts # Check implementation
+ Read config.ts # Check file writes
+ # ... 5+ more tool calls
+ ```
+
+ **Result:** $0.55, 5.7 minutes
+
+
+ **Claude's approach:**
+ ```bash
+ dev_plan --issue 61
+ # Returns in ONE call:
+ # - Issue details + comments
+ # - Relevant code snippets
+ # - Related commits (5 found)
+ # - Codebase patterns
+ ```
+
+ **Result:** $0.39, 4.5 minutes (**29% cheaper**)
+
+
## How it works
@@ -99,11 +146,7 @@ flowchart LR
D <--> E
```
-**The flow:**
-1. Your AI tool asks a question like *"where is auth handled?"*
-2. dev-agent searches the vector database semantically
-3. Returns relevant code with snippets, relationships, and context
-4. All processing happens locally — your code never leaves your machine
+**Key insight:** dev-agent returns **code snippets with context**, so Claude doesn't read entire files. This is why input tokens dropped by 99.7% (18,800 to 65) in the debugging benchmark.
## Quick Start
@@ -129,46 +172,15 @@ dev mcp install # For Claude Code
```
-## Example: What dev_search returns
-
-When Claude asks *"where is rate limiting implemented?"*, dev-agent returns:
-
-```typescript
-// dev_search: "rate limiting implementation"
-// Found 2 results
-
-// 1. packages/mcp-server/src/server/utils/rate-limiter.ts
-// Score: 0.89 | Type: Class
-// Callers: AdapterRegistry.executeTool
-
-export class RateLimiter {
- private buckets = new Map();
-
- check(key: string): { allowed: boolean; retryAfter?: number } {
- // Token bucket algorithm implementation
- }
-}
-
-// 2. packages/mcp-server/src/adapters/adapter-registry.ts
-// Score: 0.72 | Type: Function
-
-if (this.rateLimiter) {
- const result = this.rateLimiter.check(toolName);
- if (!result.allowed) return { error: 'Rate limited' };
-}
-```
-
-Claude gets **code snippets + relationships** in one call. No grep chains needed.
-
## 9 MCP Tools
| Tool | What it does |
|------|--------------|
-| [`dev_search`](/docs/tools/dev-search) | Semantic code search — find by meaning, not keywords |
+| [`dev_search`](/docs/tools/dev-search) | Semantic code search — returns snippets, not just paths |
+| [`dev_plan`](/docs/tools/dev-plan) | **Context bundling** — issue + code + commits in one call |
| [`dev_refs`](/docs/tools/dev-refs) | Find callers/callees of any function |
| [`dev_map`](/docs/tools/dev-map) | Codebase structure with change frequency |
| [`dev_history`](/docs/tools/dev-history) | Semantic search over git commits |
-| [`dev_plan`](/docs/tools/dev-plan) | Assemble context for GitHub issues |
| [`dev_explore`](/docs/tools/dev-explore) | Find similar code, trace relationships |
| [`dev_gh`](/docs/tools/dev-gh) | Search GitHub issues/PRs semantically |
| [`dev_status`](/docs/tools/dev-status) | Repository indexing status |
@@ -176,21 +188,21 @@ Claude gets **code snippets + relationships** in one call. No grep chains needed
## When to use it
-| Scenario | dev-agent? | Why |
-|----------|------------|-----|
-| Large/unfamiliar codebase | ✅ Yes | Semantic search beats grep for conceptual queries |
-| Implementation tasks | ✅ Yes | Finds existing code to reuse |
-| Reducing API costs | ✅ Yes | 44% cost reduction measured |
-| Small codebase you know | ❌ Skip | Your mental model is faster |
-| Deep debugging | ⚠️ Maybe | May need more file reads than dev-agent provides |
-| Thoroughness over speed | ⚠️ Maybe | Baseline Claude reads more files |
+| Scenario | dev-agent? | Expected Savings |
+|----------|------------|------------------|
+| Debugging unfamiliar code | ✅ Yes | **42% cost** |
+| Exploring large codebase | ✅ Yes | **44% cost** |
+| Implementing GitHub issues | ✅ Yes | **29% cost** |
+| Small codebase you know | ❌ Skip | ~0% |
+| Need exhaustive file reads | ⚠️ Maybe | Trade speed for thoroughness |
## Features
-- **100% Local** — Code never leaves your machine. No API keys needed.
-- **TypeScript/JS/Markdown** — Full support today. More languages planned.
-- **Sub-second Search** — Fast even on large repos with LanceDB.
-- **1300+ Tests** — Production-grade reliability.
+- **Context Bundling** — `dev_plan` replaces 5-10 tool calls with one
+- **Code Snippets** — Search returns code, not just file paths
+- **100% Local** — Your code never leaves your machine
+- **Sub-second Search** — Fast even on large repos with LanceDB
+- **1379+ Tests** — Production-grade reliability
---