
Commit 903e539

docs: update website with implementation benchmark findings and v0.5.0 roadmap
Website changes:
- Highlight context bundling as core value prop
- Add 'scales with complexity' messaging (42% for debugging, 29% for implementation)
- Show 99% input token reduction from code snippets
- Add dev_plan context bundling comparison
- Update benchmark results with task-type breakdown

PLAN.md:
- Add v0.5.0 roadmap with dev_context generalization
- Add benchmark improvements for implementation task coverage
- Update benchmark results with token analysis

AGENTS.md & CLAUDE.md:
- Update to show all 9 MCP tools (was missing dev_refs, dev_map, dev_history)
- Improve tool descriptions to match v0.4.2 updates

Benchmark data from studies/:
- Debugging: 42% cost savings, 37% time savings
- Implementation: 29% cost savings, 22% time savings
- Exploration: 44% cost savings, 19% time savings
1 parent ddb2eb3 commit 903e539

File tree

3 files changed

+151
-112
lines changed


PLAN.md

Lines changed: 30 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -336,23 +336,40 @@ How we know dev-agent is working:
336336
4. **Daily use:** We actually use it ourselves (dogfooding)
337337
5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
338338

339-
### Benchmark Results (v0.4.2)
339+
### Benchmark Results (v0.4.3)
340340

341-
Measured against baseline Claude Code across 5 task types:
341+
#### By Task Type
342342

343-
| Metric | Baseline | With dev-agent | Improvement |
344-
|--------|----------|----------------|-------------|
345-
| Cost per session | $1.82 | $1.02 | **-44%** |
346-
| Time per session | 14.1 min | 11.5 min | **-19%** |
347-
| Tool calls | 69 | 40 | **-42%** |
348-
| Files examined | 23 | 15 | **-35%** |
343+
| Task Type | Cost Savings | Time Savings | Why |
344+
|-----------|--------------|--------------|-----|
345+
| **Debugging** | **42%** | 37% | Semantic search beats grep chains |
346+
| **Exploration** | **44%** | 19% | Find code by meaning |
347+
| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` |
348+
| **Simple lookup** | ~0% | ~0% | Both approaches are fast |
349+
350+
**Key insight:** Savings scale with task complexity.
351+
352+
#### Why It Saves Money
353+
354+
| What dev-agent does | Manual equivalent | Impact |
355+
|---------------------|-------------------|--------|
356+
| Returns code snippets in search | Read entire files | 99% fewer input tokens |
357+
| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
358+
| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction |
359+
360+
#### Token Analysis (Debugging Task)
361+
362+
| Metric | Without dev-agent | With dev-agent | Difference |
363+
|--------|-------------------|----------------|------------|
364+
| Input tokens | 18,800 | 65 | **99.7% less** |
365+
| Output tokens | 12,200 | 6,200 | **49% less** |
366+
| Files read | 10 | 5 | **50% less** |
349367

350368
**Trade-offs identified:**
351-
- Less thorough for debugging (missing diagnostic commands)
352-
- Fewer code examples in responses
353-
- Skips test files (baseline reads them)
369+
- Baseline provides more diagnostic shell commands
370+
- Baseline reads more files (sometimes helpful for thoroughness)
354371

355-
**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
372+
**Target users:** Engineers working on complex exploration, debugging, or implementation tasks in large/unfamiliar codebases.
356373

357374
---
358375

@@ -369,4 +386,4 @@ pnpm test
369386

370387
---
371388

372-
*Last updated: November 29, 2025 at 01:42 PST*
389+
*Last updated: November 29, 2025 at 02:30 PST*
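The percentages in the Token Analysis (Debugging Task) table can be recomputed from its raw counts. The sketch below is only an arithmetic check; `percentReduction` and `tokenAnalysis` are illustrative names, not part of dev-agent.

```typescript
// Percent reduction from a baseline value to a measured value.
function percentReduction(baseline: number, measured: number): number {
  if (baseline <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return ((baseline - measured) / baseline) * 100;
}

// Figures copied from the Token Analysis (Debugging Task) table.
const tokenAnalysis = [
  { metric: "Input tokens", without: 18800, withAgent: 65 },
  { metric: "Output tokens", without: 12200, withAgent: 6200 },
  { metric: "Files read", without: 10, withAgent: 5 },
];

for (const row of tokenAnalysis) {
  const pct = percentReduction(row.without, row.withAgent);
  console.log(`${row.metric}: ${pct.toFixed(1)}% less`);
}
// Input tokens: 99.7% less
// Output tokens: 49.2% less
// Files read: 50.0% less
```

The recomputed values agree with the table once rounded (99.7%, 49%, 50%).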

website/content/docs/index.mdx

Lines changed: 24 additions & 14 deletions
@@ -1,35 +1,46 @@
11
# Introduction
22

3-
**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
3+
**dev-agent** provides semantic code search and context bundling to AI assistants like Cursor and Claude Code via MCP.
44

5-
We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
5+
We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files and reading entire files to find relevant code. dev-agent gives them a faster path: search by meaning, get code snippets, bundle context.
66

77
## What it does
88

99
1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
10-
2. **Exposes 9 MCP tools** for semantic search, code relationships, git history
11-
3. **Integrates with GitHub** to search issues and PRs semantically
10+
2. **Returns code snippets** — not just file paths, reducing input tokens by 99%
11+
3. **Bundles context**: `dev_plan` assembles issue + code + commits in one call
12+
4. **Integrates with GitHub** to search issues and PRs semantically
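As a rough illustration of what "search by meaning" over an embedding index involves, the sketch below ranks stored chunks by cosine similarity to a query vector. It is a minimal sketch under stated assumptions: the `IndexedChunk` shape, the `rankChunks` helper, the file paths, and the 3-dimensional toy vectors are all hypothetical (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and dev-agent's actual index and scoring may differ.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector lengths differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape of one indexed code chunk.
interface IndexedChunk {
  path: string;
  snippet: string;
  embedding: number[];
}

// Score every chunk against the query and keep the topK best matches.
function rankChunks(query: number[], chunks: IndexedChunk[], topK: number) {
  return chunks
    .map((chunk) => ({
      path: chunk.path,
      snippet: chunk.snippet,
      score: cosineSimilarity(query, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy data: placeholder paths and hand-written vectors, not real embeddings.
const toyIndex: IndexedChunk[] = [
  { path: "docs/readme.md", snippet: "# Getting started", embedding: [0, 0.2, 0.9] },
  { path: "src/store.ts", snippet: "function upsert() {}", embedding: [0.9, 0.1, 0] },
  { path: "src/indexer.ts", snippet: "function makeId() {}", embedding: [0.7, 0.6, 0.1] },
];

const results = rankChunks([1, 0, 0], toyIndex, 2);
for (const r of results) {
  console.log(`${r.path} (score: ${r.score.toFixed(2)})`);
}
```

Each result carries its snippet alongside the path, so the caller does not need a second read to see the matching code.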
1213

1314
## Measured impact
1415

15-
We benchmarked dev-agent against baseline Claude Code:
16+
We benchmarked dev-agent against baseline Claude Code across different task types:
1617

17-
| Metric | Baseline | With dev-agent | Change |
18-
|--------|----------|----------------|--------|
19-
| Cost | $1.82 | $1.02 | **-44%** |
20-
| Time | 14.1 min | 11.5 min | **-19%** |
21-
| Tool calls | 69 | 40 | **-42%** |
18+
| Task Type | Cost Savings | Time Savings | Why |
19+
|-----------|--------------|--------------|-----|
20+
| **Debugging** | **42%** | 37% | Semantic search beats grep chains |
21+
| **Exploration** | **44%** | 19% | Find code by meaning |
22+
| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` |
2223

23-
**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files.
24+
**Key insight:** Savings scale with task complexity. Simple lookups show no improvement; complex debugging shows 42% cost reduction.
25+
26+
**Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands.
27+
28+
## Why it saves money
29+
30+
| What dev-agent does | Manual equivalent | Impact |
31+
|---------------------|-------------------|--------|
32+
| Returns code snippets in search | Read entire files | 99% fewer input tokens |
33+
| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
34+
| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction |
2435

2536
## Key Features
2637

2738
| Feature | Description |
2839
|---------|-------------|
40+
| **Context Bundling** | `dev_plan` replaces 5-10 tool calls with one |
41+
| **Code Snippets** | Search returns code, not just file paths |
2942
| **Semantic Search** | Find code by meaning, not keywords |
30-
| **Relationship Queries** | What calls this function? What does it call? |
3143
| **Git History** | Semantic search over commits |
32-
| **GitHub Integration** | Search issues and PRs semantically |
3344
| **100% Local** | Your code never leaves your machine |
3445

3546
## Architecture
@@ -45,4 +56,3 @@ dev-agent is a monorepo:
4556

4657
- [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes
4758
- [Quickstart →](/docs/quickstart) — Index and search in 5 minutes
48-

website/content/index.mdx

Lines changed: 97 additions & 85 deletions
@@ -15,65 +15,112 @@ Local semantic code search for Cursor and Claude Code via MCP.
1515
</Callout>
1616

1717
<Callout type="default">
18-
**Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search. We built it to speed up our own workflow — and measured 44% cost savings.
18+
**Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search and context bundling. Savings scale with task complexity — up to 42% on debugging tasks.
19+
</Callout>
20+
21+
## Why it saves money
22+
23+
dev-agent doesn't just search — it **bundles context** so Claude reads less:
24+
25+
| What dev-agent does | Manual equivalent | Savings |
26+
|---------------------|-------------------|---------|
27+
| Returns code snippets in search | Read entire files | 99% fewer input tokens |
28+
| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction |
29+
| Semantic search finds relevant code | grep chains + manual filtering | 42% cost reduction |
30+
31+
**The harder the task, the bigger the savings.**
32+
33+
## Measured results by task type
34+
35+
| Task Type | Cost Savings | Time Savings | Why |
36+
|-----------|--------------|--------------|-----|
37+
| **Debugging** | **42%** | 37% | Semantic search beats grep for "where is the bug?" |
38+
| **Exploration** | **44%** | 19% | Find code by meaning, not keywords |
39+
| **Implementation** | **29%** | 22% | `dev_plan` bundles context in one call |
40+
| **Simple lookup** | ~0% | ~0% | Both approaches are fast |
41+
42+
<Callout type="warning">
43+
**Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands. dev-agent excels when you need to explore or understand code.
1944
</Callout>
2045

2146
## Same question, different approach
2247

23-
We asked Claude Code: *"Where is rate limiting implemented and how does it work?"*
48+
We asked Claude Code: *"Debug why search returns duplicates"*
2449

2550
<Tabs items={['Without dev-agent', 'With dev-agent']}>
2651
<Tabs.Tab>
27-
**Claude's approach (8 tool calls):**
52+
**Claude's approach:**
2853
<FileTree>
29-
<FileTree.Folder name="grep 'rate' → 47 matches" />
30-
<FileTree.Folder name="grep 'limit' → 23 matches" />
31-
<FileTree.Folder name="grep 'RateLimiter' → found 3 files" defaultOpen>
32-
<FileTree.File name="rate-limiter.ts" />
33-
<FileTree.File name="adapter-registry.ts" />
34-
<FileTree.File name="rate-limiter.test.ts" />
35-
</FileTree.Folder>
36-
<FileTree.File name="Read rate-limiter.ts" />
37-
<FileTree.Folder name="grep 'rateLimiter' → find usage" />
38-
<FileTree.File name="Read adapter-registry.ts" />
39-
<FileTree.Folder name="grep for test files" />
40-
<FileTree.File name="Read rate-limiter.test.ts" />
54+
<FileTree.Folder name="grep 'duplicate' → 30 matches" />
55+
<FileTree.Folder name="grep 'search' → 100+ matches" />
56+
<FileTree.Folder name="grep 'id' → too many, narrow down" />
57+
<FileTree.File name="Read indexer/index.ts (441 lines)" />
58+
<FileTree.File name="Read vector/store.ts (258 lines)" />
59+
<FileTree.File name="Read scanner/typescript.ts (full file)" />
60+
<FileTree.File name="Read scanner/markdown.ts (full file)" />
61+
<FileTree.Folder name="... more greps and reads" />
4162
</FileTree>
4263

43-
**Result:** 8 tool calls, 3 files read → **$0.36, 2.1 minutes**
64+
**Result:** 18+ tool calls, 10 files read → **$1.37, 12 minutes**
65+
66+
*18,800 input tokens consumed*
4467
</Tabs.Tab>
4568
<Tabs.Tab>
46-
**Claude's approach (2 tool calls):**
69+
**Claude's approach:**
4770
<FileTree>
48-
<FileTree.Folder name="dev_search 'rate limiting implementation'" defaultOpen>
49-
<FileTree.File name="rate-limiter.ts (score: 0.89, with code snippet)" />
50-
<FileTree.File name="adapter-registry.ts (score: 0.72, shows caller)" />
71+
<FileTree.Folder name="dev_search 'search duplicate results'" defaultOpen>
72+
<FileTree.File name="store.ts (with upsert code snippet)" />
73+
<FileTree.File name="indexer.ts (with ID generation)" />
5174
</FileTree.Folder>
52-
<FileTree.File name="Read rate-limiter.ts (for full implementation)" />
75+
<FileTree.Folder name="dev_search 'document ID generation'" defaultOpen>
76+
<FileTree.File name="→ typescript.ts (ID pattern)" />
77+
<FileTree.File name="→ markdown.ts (slug generation)" />
78+
</FileTree.Folder>
79+
<FileTree.File name="Read store.ts (for detail)" />
5380
</FileTree>
5481

55-
**Result:** 2 tool calls, 1 file read → **$0.20, 1.3 minutes**
82+
**Result:** 6 tool calls, 5 files read → **$0.79, 7.5 minutes**
83+
84+
*65 input tokens consumed (99.7% less)*
5685
</Tabs.Tab>
5786
</Tabs>
5887

5988
<Callout type="info">
60-
**Same answer. 44% cheaper. 38% faster.**
89+
**Same root cause identified. 42% cheaper. 37% faster.**
6190
</Callout>
6291

63-
## Measured results
92+
## Context bundling: `dev_plan`
6493

65-
We ran 5 task types comparing baseline Claude Code vs. with dev-agent:
94+
For implementation tasks, `dev_plan` bundles everything in one call:
6695

67-
| Metric | Baseline | With dev-agent | Change |
68-
|--------|----------|----------------|--------|
69-
| Cost | $1.82 | $1.02 | **-44%** |
70-
| Time | 14.1 min | 11.5 min | **-19%** |
71-
| Tool calls | 69 | 40 | **-42%** |
72-
| Files read | 23 | 15 | **-35%** |
73-
74-
<Callout type="warning">
75-
**Trade-off:** Faster but sometimes less thorough. Baseline Claude read more files for debugging tasks. dev-agent excels at implementation and exploration.
76-
</Callout>
96+
<Tabs items={['Without dev-agent', 'With dev-agent']}>
97+
<Tabs.Tab>
98+
**Claude's approach for "Implement issue #61":**
99+
```bash
100+
gh issue view 61 --json title,body # Fetch issue
101+
grep -r -e "--json" packages/cli     # Find existing flags (-e keeps grep from parsing --json as an option)
102+
Read search.ts # Check implementation
103+
Read mcp.ts # Check implementation
104+
Read config.ts # Check file writes
105+
# ... 5+ more tool calls
106+
```
107+
108+
**Result:** $0.55, 5.7 minutes
109+
</Tabs.Tab>
110+
<Tabs.Tab>
111+
**Claude's approach:**
112+
```bash
113+
dev_plan --issue 61
114+
# Returns in ONE call:
115+
# - Issue details + comments
116+
# - Relevant code snippets
117+
# - Related commits (5 found)
118+
# - Codebase patterns
119+
```
120+
121+
**Result:** $0.39, 4.5 minutes (**29% cheaper**)
122+
</Tabs.Tab>
123+
</Tabs>
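The bundling idea can be sketched as a single entry point that gathers the issue, matching code snippets, and related commits, then returns them as one payload. Everything here is an assumption made for illustration: the `ContextSources` interface, `bundleContext`, `renderBundle`, and the in-memory demo data are hypothetical and do not describe how `dev_plan` is implemented.

```typescript
// All types and data below are hypothetical placeholders, not dev-agent's API.
interface Issue { number: number; title: string; body: string }
interface CodeSnippet { path: string; startLine: number; code: string }
interface Commit { hash: string; subject: string }

interface ContextSources {
  fetchIssue(issueNumber: number): Issue;
  searchCode(query: string): CodeSnippet[];
  searchCommits(query: string): Commit[];
}

interface ContextBundle {
  issue: Issue;
  snippets: CodeSnippet[];
  commits: Commit[];
}

// One entry point gathers what would otherwise be several separate lookups.
function bundleContext(issueNumber: number, sources: ContextSources): ContextBundle {
  const issue = sources.fetchIssue(issueNumber);
  return {
    issue,
    snippets: sources.searchCode(issue.title),
    commits: sources.searchCommits(issue.title),
  };
}

// Flatten the bundle into a single text payload for the assistant.
function renderBundle(bundle: ContextBundle): string {
  const lines: string[] = [
    `Issue #${bundle.issue.number}: ${bundle.issue.title}`,
    bundle.issue.body,
    "",
    "Relevant code:",
  ];
  for (const s of bundle.snippets) {
    lines.push(`${s.path}:${s.startLine}`, s.code);
  }
  lines.push("", "Related commits:");
  for (const c of bundle.commits) {
    lines.push(`${c.hash} ${c.subject}`);
  }
  return lines.join("\n");
}

// In-memory stand-ins so the sketch runs without GitHub or an index.
const demoSources: ContextSources = {
  fetchIssue: (n) => ({ number: n, title: "Example issue title", body: "Example issue body." }),
  searchCode: () => [{ path: "src/example.ts", startLine: 10, code: "export function example() {}" }],
  searchCommits: () => [{ hash: "abc1234", subject: "example: earlier related change" }],
};

const demoBundle = bundleContext(1, demoSources);
console.log(renderBundle(demoBundle));
```

The assistant makes one request and receives one response. The separate lookups still happen, but inside the tool rather than as individual tool calls in the conversation.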
77124

78125
## How it works
79126

@@ -99,11 +146,7 @@ flowchart LR
99146
D <--> E
100147
```
101148

102-
**The flow:**
103-
1. Your AI tool asks a question like *"where is auth handled?"*
104-
2. dev-agent searches the vector database semantically
105-
3. Returns relevant code with snippets, relationships, and context
106-
4. All processing happens locally — your code never leaves your machine
149+
**Key insight:** dev-agent returns **code snippets with context** — Claude doesn't read entire files. This is why input tokens drop by 99%.
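To see why returning a snippet instead of a whole file shrinks the input, the sketch below compares rough token estimates for a synthetic 400-line file and a 12-line excerpt. The 4-characters-per-token ratio is a coarse heuristic assumed here for illustration, and `estimateTokens` and `extractSnippet` are hypothetical helpers. The sketch does not reproduce the benchmark's measured figures.

```typescript
// Rough heuristic, assumed for illustration only: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Return lines startLine..endLine of a source text (1-indexed, inclusive).
function extractSnippet(source: string, startLine: number, endLine: number): string {
  return source.split("\n").slice(startLine - 1, endLine).join("\n");
}

// Synthetic 400-line file standing in for a real source file.
const fileText = Array.from(
  { length: 400 },
  (_, i) => `const value${i} = compute(${i});`,
).join("\n");

// A 12-line excerpt, as a search result might return.
const snippet = extractSnippet(fileText, 120, 131);

console.log(`full file: ~${estimateTokens(fileText)} tokens`);
console.log(`snippet:   ~${estimateTokens(snippet)} tokens`);
```

The saving grows with file size, because the snippet stays roughly constant while the full-file read scales with the number of lines.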
107150

108151
## Quick Start
109152

@@ -129,68 +172,37 @@ dev mcp install # For Claude Code
129172
```
130173
</Steps>
131174

132-
## Example: What dev_search returns
133-
134-
When Claude asks *"where is rate limiting implemented?"*, dev-agent returns:
135-
136-
```typescript
137-
// dev_search: "rate limiting implementation"
138-
// Found 2 results
139-
140-
// 1. packages/mcp-server/src/server/utils/rate-limiter.ts
141-
// Score: 0.89 | Type: Class
142-
// Callers: AdapterRegistry.executeTool
143-
144-
export class RateLimiter {
145-
private buckets = new Map<string, TokenBucket>();
146-
147-
check(key: string): { allowed: boolean; retryAfter?: number } {
148-
// Token bucket algorithm implementation
149-
}
150-
}
151-
152-
// 2. packages/mcp-server/src/adapters/adapter-registry.ts
153-
// Score: 0.72 | Type: Function
154-
155-
if (this.rateLimiter) {
156-
const result = this.rateLimiter.check(toolName);
157-
if (!result.allowed) return { error: 'Rate limited' };
158-
}
159-
```
160-
161-
Claude gets **code snippets + relationships** in one call. No grep chains needed.
162-
163175
## 9 MCP Tools
164176

165177
| Tool | What it does |
166178
|------|--------------|
167-
| [`dev_search`](/docs/tools/dev-search) | Semantic code search — find by meaning, not keywords |
179+
| [`dev_search`](/docs/tools/dev-search) | Semantic code search — returns snippets, not just paths |
180+
| [`dev_plan`](/docs/tools/dev-plan) | **Context bundling** — issue + code + commits in one call |
168181
| [`dev_refs`](/docs/tools/dev-refs) | Find callers/callees of any function |
169182
| [`dev_map`](/docs/tools/dev-map) | Codebase structure with change frequency |
170183
| [`dev_history`](/docs/tools/dev-history) | Semantic search over git commits |
171-
| [`dev_plan`](/docs/tools/dev-plan) | Assemble context for GitHub issues |
172184
| [`dev_explore`](/docs/tools/dev-explore) | Find similar code, trace relationships |
173185
| [`dev_gh`](/docs/tools/dev-gh) | Search GitHub issues/PRs semantically |
174186
| [`dev_status`](/docs/tools/dev-status) | Repository indexing status |
175187
| [`dev_health`](/docs/tools/dev-health) | Server health checks |
176188

177189
## When to use it
178190

179-
| Scenario | dev-agent? | Why |
180-
|----------|------------|-----|
181-
| Large/unfamiliar codebase | ✅ Yes | Semantic search beats grep for conceptual queries |
182-
| Implementation tasks | ✅ Yes | Finds existing code to reuse |
183-
| Reducing API costs | ✅ Yes | 44% cost reduction measured |
184-
| Small codebase you know | ❌ Skip | Your mental model is faster |
185-
| Deep debugging | ⚠️ Maybe | May need more file reads than dev-agent provides |
186-
| Thoroughness over speed | ⚠️ Maybe | Baseline Claude reads more files |
191+
| Scenario | dev-agent? | Expected Savings |
192+
|----------|------------|------------------|
193+
| Debugging unfamiliar code | ✅ Yes | **42% cost** |
194+
| Exploring large codebase | ✅ Yes | **44% cost** |
195+
| Implementing GitHub issues | ✅ Yes | **29% cost** |
196+
| Small codebase you know | ❌ Skip | ~0% |
197+
| Need exhaustive file reads | ⚠️ Maybe | Trade speed for thoroughness |
187198

188199
## Features
189200

190-
- **100% Local** — Code never leaves your machine. No API keys needed.
191-
- **TypeScript/JS/Markdown** — Full support today. More languages planned.
192-
- **Sub-second Search** — Fast even on large repos with LanceDB.
193-
- **1300+ Tests** — Production-grade reliability.
201+
- **Context Bundling**: `dev_plan` replaces 5-10 tool calls with one
202+
- **Code Snippets** — Search returns code, not just file paths
203+
- **100% Local** — Your code never leaves your machine
204+
- **Sub-second Search** — Fast even on large repos with LanceDB
205+
- **1379+ Tests** — Production-grade reliability
194206

195207
---
196208
