diff --git a/.gitignore b/.gitignore
index b8143ee..8967b28 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,3 +67,6 @@ temp/
# Work in progress packages (not ready for commit)
packages/benchmark/
+
+# Benchmark studies and session logs
+studies/
diff --git a/PLAN.md b/PLAN.md
index b104b8f..4785d52 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -181,7 +181,35 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
---
-## Current: Extended Git Intelligence (v0.5.0)
+## Current: Quality & Thoroughness (v0.4.x)
+
+> Addressing gaps identified in a benchmark study comparing dev-agent with baseline Claude Code.
+
+**Context:** Benchmarks showed that dev-agent cuts cost by 44% and response time by 19%, but with quality trade-offs. The improvements below target those trade-offs.
+
+### Benchmark-Driven Improvements
+
+| Task | Gap Identified | Priority | Status |
+|------|----------------|----------|--------|
+| Diagnostic command suggestions | Baseline provided shell commands for debugging; dev-agent didn't | 🔴 High | 🔲 Todo |
+| Test file inclusion hints | Baseline read test files; dev-agent skipped them | 🔴 High | 🔲 Todo |
+| Code example extraction | Baseline included more code snippets in responses | 🟡 Medium | 🔲 Todo |
+| Exhaustive mode for debugging | Option for thorough exploration vs fast satisficing | 🟡 Medium | 🔲 Todo |
+| Related files suggestions | "You might also want to check: X, Y, Z" | 🟡 Medium | 🔲 Todo |
+
+### Tool Description Refinements (Done in v0.4.2)
+
+| Task | Status |
+|------|--------|
+| Improved dev_search description ("USE THIS FIRST") | ✅ Done |
+| Improved dev_map description (vs list_dir) | ✅ Done |
+| Improved dev_explore description (workflow hints) | ✅ Done |
+| Improved dev_refs description (specific symbols) | ✅ Done |
+| All 9 adapters registered in CLI | ✅ Done |
+
+---
+
+## Next: Extended Git Intelligence (v0.5.0)
> Building on git history with deeper insights.
@@ -286,6 +314,24 @@ How we know dev-agent is working:
4. **Daily use:** We actually use it ourselves (dogfooding)
5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
+### Benchmark Results (v0.4.2)
+
+Measured against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Improvement |
+|--------|----------|----------------|-------------|
+| Cost per session | $1.82 | $1.02 | **-44%** |
+| Time per session | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files examined | 23 | 15 | **-35%** |
+
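The improvement column follows from the baseline and dev-agent figures as a relative change. Below is a minimal sketch of that arithmetic, using only the rounded values in the table above; the function and variable names are illustrative and not part of dev-agent. Computed this way, the time change comes out at about -18.4%, so the published -19% presumably reflects unrounded measurements.

```typescript
// Relative change between a baseline and a new measurement, in percent.
// Negative values mean a reduction.
function percentChange(baseline: number, withTool: number): number {
  if (baseline === 0) {
    throw new Error("baseline must be non-zero");
  }
  return ((withTool - baseline) / baseline) * 100;
}

// Rounded figures copied from the benchmark table: [label, baseline, with dev-agent].
const benchmarkRows: Array<[string, number, number]> = [
  ["Cost per session ($)", 1.82, 1.02],
  ["Time per session (min)", 14.1, 11.5],
  ["Tool calls", 69, 40],
  ["Files examined", 23, 15],
];

for (const [label, baseline, withTool] of benchmarkRows) {
  console.log(`${label}: ${percentChange(baseline, withTool).toFixed(1)}%`);
}
```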
+**Trade-offs identified:**
+- Less thorough for debugging (missing diagnostic commands)
+- Fewer code examples in responses
+- Skips test files (baseline reads them)
+
+**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
+
---
## Contributing
diff --git a/README.md b/README.md
index db7744c..3e3020b 100644
--- a/README.md
+++ b/README.md
@@ -4,19 +4,45 @@
[](https://pnpm.io/)
[](LICENSE)
-**Local-first repository context provider for AI tools. Semantic code search, git history, relationship queries, and codebase mapping via MCP.**
+**Local semantic code search for Cursor and Claude Code via MCP.**
-## What is dev-agent?
+## What it does
-dev-agent provides **rich, structured context** to AI assistants like Claude and Cursor. Instead of AI tools reading files one at a time, dev-agent gives them:
+dev-agent indexes your codebase and provides 9 MCP tools to AI assistants. Instead of AI tools grepping through files, they can ask conceptual questions like "where do we handle authentication?"
-- **Semantic search** with code snippets and relationships
-- **Codebase maps** showing structure and change frequency
-- **Relationship queries** (what calls what)
-- **Git history search** (who changed what and why)
-- **Issue context** assembled for planning
+- `dev_search` - Semantic code search by meaning
+- `dev_refs` - Find callers/callees of functions
+- `dev_map` - Codebase structure with change frequency
+- `dev_history` - Semantic search over git commits
+- `dev_plan` - Assemble context for GitHub issues
+- `dev_explore` - Find similar code, trace relationships
+- `dev_gh` - Search GitHub issues/PRs semantically
+- `dev_status` / `dev_health` - Monitoring
-**Philosophy:** Provide data, let LLMs reason. We don't try to be smart with heuristics; we provide comprehensive context so AI assistants can be smart.
+## Measured results
+
+We benchmarked dev-agent against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-offs:** Faster but sometimes less thorough. Best for implementation tasks and codebase exploration. For deep debugging, baseline Claude may read more files.
+
+## When to use it
+
+**Good fit:**
+- Large or unfamiliar codebases
+- Implementation tasks ("add a feature like X")
+- Exploring how code works
+- Reducing AI API costs
+
+**Less useful:**
+- Small codebases you already know well
+- Deep debugging sessions
+- When thoroughness matters more than speed
## Quick Start
diff --git a/website/content/docs/index.mdx b/website/content/docs/index.mdx
index 2fa1b4b..de00484 100644
--- a/website/content/docs/index.mdx
+++ b/website/content/docs/index.mdx
@@ -1,46 +1,48 @@
# Introduction
-**dev-agent** is a local-first repository context provider for AI tools. It gives AI assistants like Cursor and Claude Code deep understanding of your codebase through semantic search, code analysis, and GitHub integration.
+**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
-## The Problem
+We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
-AI coding assistants are powerful, but they struggle with context:
+## What it does
-- They can't search your codebase semantically
-- They don't understand relationships between files
-- They lack awareness of your GitHub issues and PRs
-- They hallucinate about code that doesn't exist
+1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
+2. **Exposes 9 MCP tools** for semantic search, code relationships, git history
+3. **Integrates with GitHub** to search issues and PRs semantically
-## The Solution
+## Measured impact
-dev-agent solves this by:
+We benchmarked dev-agent against baseline Claude Code:
-1. **Indexing your codebase** with local embeddings (all-MiniLM-L6-v2)
-2. **Exposing semantic search** via the Model Context Protocol (MCP)
-3. **Integrating with GitHub** to understand your project's history
-4. **Providing specialized tools** for planning, exploration, and more
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files.
## Key Features
| Feature | Description |
|---------|-------------|
| **Semantic Search** | Find code by meaning, not keywords |
-| **AST Analysis** | Type-aware understanding of your code |
+| **Relationship Queries** | What calls this function? What does it call? |
+| **Git History** | Semantic search over commits |
| **GitHub Integration** | Search issues and PRs semantically |
-| **Local-First** | Your code never leaves your machine |
-| **MCP Native** | Works with Cursor, Claude Code, VS Code |
+| **100% Local** | Your code never leaves your machine |
-## Architecture Overview
+## Architecture
-dev-agent is built as a monorepo with specialized packages:
+dev-agent is a monorepo:
-- **@lytics/dev-agent-core** β Repository scanning, vector storage, GitHub integration
-- **@lytics/dev-agent-core** - Repository scanning, vector storage, GitHub integration
+- **@lytics/dev-agent-core** - Scanning, vector storage, GitHub integration
- **@lytics/dev-agent-cli** - Command-line interface
-- **@lytics/dev-agent-mcp** - MCP server with tool adapters
-- **@lytics/dev-agent-subagents** - Specialized agents (planner, explorer, PR manager)
+- **@lytics/dev-agent-mcp** - MCP server with 9 tool adapters
+- **@lytics/dev-agent-subagents** - Planner, explorer agents
## Next Steps
- [Installation →](/docs/install) - Get dev-agent installed in under 2 minutes
-- [Quickstart →](/docs/quickstart) - From zero to semantic search in 5 minutes
+- [Quickstart →](/docs/quickstart) - Index and search in 5 minutes
diff --git a/website/content/index.mdx b/website/content/index.mdx
index ff4aefd..356930a 100644
--- a/website/content/index.mdx
+++ b/website/content/index.mdx
@@ -2,69 +2,196 @@
title: dev-agent
---
+import { Callout, Steps, Tabs, FileTree } from 'nextra/components'
+
# dev-agent
-Deep code intelligence + AI subagents via MCP. Local-first semantic search for Cursor and Claude Code.
+Local semantic code search for Cursor and Claude Code via MCP.
[Get Started](/docs) · [View on GitHub](https://github.com/lytics/dev-agent)
-## Why dev-agent?
-
-AI coding assistants are only as good as the context they receive. dev-agent gives your AI tools **deep understanding** of your codebase:
+
+ **v0.4.3** - Now with git history search (`dev_history`), change frequency in `dev_map`, and improved tool descriptions. [See what's new →](https://github.com/lytics/dev-agent/releases)
+
+
+
+ **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search. We built it to speed up our own workflow, and measured 44% cost savings.
+
+
+## Same question, different approach
+
+We asked Claude Code: *"Where is rate limiting implemented and how does it work?"*
+
+
+
+ **Claude's approach (8 tool calls):**
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ **Result:** 8 tool calls, 3 files read → **$0.36, 2.1 minutes**
+
+
+ **Claude's approach (2 tool calls):**
+
+
+
+
+
+
+
+
+ **Result:** 2 tool calls, 1 file read → **$0.20, 1.3 minutes**
+
+
+
+
+ **Same answer. 44% cheaper. 38% faster.**
+
+
+## Measured results
+
+We ran 5 task types comparing baseline Claude Code vs. with dev-agent:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files read | 23 | 15 | **-35%** |
+
+
+ **Trade-off:** Faster but sometimes less thorough. Baseline Claude read more files for debugging tasks. dev-agent excels at implementation and exploration.
+
+
+## How it works
+
+```mermaid
+flowchart LR
+ subgraph IDE["Your AI Tool"]
+ A["Cursor / Claude Code"]
+ end
+
+ subgraph Agent["dev-agent"]
+ B["MCP Server"]
+ C["9 Tools"]
+ end
+
+ subgraph Local["Local Storage"]
+ D["Vector DB"]
+ E["Embeddings"]
+ end
+
+ A <-->|"MCP Protocol"| B
+ B --> C
+ C <--> D
+ D <--> E
+```
-| Feature | Description |
-|---------|-------------|
-| **Semantic Search** | Find code by meaning, not keywords. "Where do we handle auth?" actually works. |
-| **Smart Planning** | Generate implementation plans from GitHub issues with accurate estimates. |
-| **GitHub Integration** | Search issues and PRs semantically. Understand project history. |
-| **Code Exploration** | Find patterns, similar code, and trace relationships across your codebase. |
+**The flow:**
+1. Your AI tool asks a question like *"where is auth handled?"*
+2. dev-agent searches the vector database semantically
+3. Returns relevant code with snippets, relationships, and context
+4. All processing happens locally; your code never leaves your machine
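
Step 2 of the flow above can be pictured as ranking indexed chunks by how close their embeddings are to the query embedding. The sketch below is a brute-force illustration under stated assumptions: the `IndexedChunk` shape, the file paths, and the toy 3-dimensional vectors are hypothetical, and a real index would delegate this lookup to the vector database rather than scan every chunk.

```typescript
// Hypothetical shape of an indexed code chunk; not dev-agent's real schema.
interface IndexedChunk {
  path: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the best `limit` matches.
function rankChunks(query: number[], chunks: IndexedChunk[], limit: number) {
  return chunks
    .map((chunk) => ({ path: chunk.path, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy vectors stand in for real embeddings of the code and of the question.
const toyIndex: IndexedChunk[] = [
  { path: "src/auth/login.ts", embedding: [0.9, 0.1, 0.0] },
  { path: "src/utils/format.ts", embedding: [0.0, 0.2, 0.9] },
  { path: "src/auth/session.ts", embedding: [0.7, 0.3, 0.1] },
];
const toyQuery = [1.0, 0.0, 0.0];

console.log(rankChunks(toyQuery, toyIndex, 2));
```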
## Quick Start
+
+### Install
+
```bash
-# Install globally
-npm install -g @lytics/dev-agent
+npm install -g dev-agent
+```
+
+### Index your repository
-# Index your repository
+```bash
cd your-project
dev index .
+```
-# Install in Cursor
-dev mcp install --cursor
+### Connect to your AI tool
+
+```bash
+dev mcp install --cursor # For Cursor
+dev mcp install # For Claude Code
```
+
-That's it. Your AI assistant now has superpowers.
+## Example: What dev_search returns
-## How It Works
+When Claude asks *"where is rate limiting implemented?"*, dev-agent returns:
+```typescript
+// dev_search: "rate limiting implementation"
+// Found 2 results
+
+// 1. packages/mcp-server/src/server/utils/rate-limiter.ts
+// Score: 0.89 | Type: Class
+// Callers: AdapterRegistry.executeTool
+
+export class RateLimiter {
+ private buckets = new Map();
+
+ check(key: string): { allowed: boolean; retryAfter?: number } {
+ // Token bucket algorithm implementation
+ }
+}
+
+// 2. packages/mcp-server/src/adapters/adapter-registry.ts
+// Score: 0.72 | Type: Function
+
+if (this.rateLimiter) {
+ const result = this.rateLimiter.check(toolName);
+ if (!result.allowed) return { error: 'Rate limited' };
+}
```
-┌─────────────────────────────────────────────────────────────┐
-│                        Your AI Tool                         │
-│                   (Cursor / Claude Code)                    │
-└─────────────────────┬───────────────────────────────────────┘
-                      │ MCP Protocol
-┌─────────────────────▼───────────────────────────────────────┐
-│                          dev-agent                          │
-│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐    │
-│  │ dev_search  │ │  dev_plan   │ │     dev_explore     │    │
-│  │ dev_status  │ │   dev_gh    │ │     dev_health      │    │
-│  └─────────────┘ └─────────────┘ └─────────────────────┘    │
-│                           │                                 │
-│  ┌────────────────────────▼────────────────────────────┐    │
-│  │                Local Vector Storage                 │    │
-│  │            (LanceDB + MiniLM embeddings)            │    │
-│  └─────────────────────────────────────────────────────┘    │
-└─────────────────────────────────────────────────────────────┘
-```
+
+Claude gets **code snippets + relationships** in one call. No grep chains needed.
+
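The sample result above elides the body of `check` behind a "token bucket" comment. For readers unfamiliar with the technique, here is a minimal sketch of one way such a limiter can work. The class name, the `capacity` and `refillPerSecond` parameters, and the injectable clock are assumptions made for illustration; this is not dev-agent's actual `RateLimiter`.

```typescript
// Illustrative token bucket limiter; parameters and names are hypothetical.
interface Bucket {
  tokens: number;     // tokens currently available
  lastRefill: number; // timestamp of the last refill, in milliseconds
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  check(key: string): { allowed: boolean; retryAfter?: number } {
    const current = this.now();
    // A key seen for the first time starts with a full bucket.
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: current };

    // Refill in proportion to elapsed time, never exceeding capacity.
    const elapsedSeconds = (current - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefill = current;
    this.buckets.set(key, bucket);

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1; // spend one token for this call
      return { allowed: true };
    }
    // Whole seconds until one full token is available again.
    const retryAfter = Math.ceil((1 - bucket.tokens) / this.refillPerSecond);
    return { allowed: false, retryAfter };
  }
}

// Example: allow bursts of 2 calls per tool, refilling 1 token per second.
const demoLimiter = new TokenBucketLimiter(2, 1);
console.log(demoLimiter.check("dev_search"));
```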
+## 9 MCP Tools
+
+| Tool | What it does |
+|------|--------------|
+| [`dev_search`](/docs/tools/dev-search) | Semantic code search: find by meaning, not keywords |
+| [`dev_refs`](/docs/tools/dev-refs) | Find callers/callees of any function |
+| [`dev_map`](/docs/tools/dev-map) | Codebase structure with change frequency |
+| [`dev_history`](/docs/tools/dev-history) | Semantic search over git commits |
+| [`dev_plan`](/docs/tools/dev-plan) | Assemble context for GitHub issues |
+| [`dev_explore`](/docs/tools/dev-explore) | Find similar code, trace relationships |
+| [`dev_gh`](/docs/tools/dev-gh) | Search GitHub issues/PRs semantically |
+| [`dev_status`](/docs/tools/dev-status) | Repository indexing status |
+| [`dev_health`](/docs/tools/dev-health) | Server health checks |
+
+## When to use it
+
+| Scenario | dev-agent? | Why |
+|----------|------------|-----|
+| Large/unfamiliar codebase | ✅ Yes | Semantic search beats grep for conceptual queries |
+| Implementation tasks | ✅ Yes | Finds existing code to reuse |
+| Reducing API costs | ✅ Yes | 44% cost reduction measured |
+| Small codebase you know | ❌ Skip | Your mental model is faster |
+| Deep debugging | ⚠️ Maybe | May need more file reads than dev-agent provides |
+| Thoroughness over speed | ⚠️ Maybe | Baseline Claude reads more files |
## Features
-- **100% Local** - Your code never leaves your machine
-- **Multi-Language** - TypeScript, JavaScript, Markdown (more coming)
-- **Fast** - Sub-second semantic search across large codebases
-- **MCP Native** - Works with any MCP-compatible AI tool
-- **GitHub Aware** - Understands your issues, PRs, and project history
+- **100% Local** - Code never leaves your machine. No API keys needed.
+- **TypeScript/JS/Markdown** - Full support today. More languages planned.
+- **Sub-second Search** - Fast even on large repos with LanceDB.
+- **1300+ Tests** - Production-grade reliability.
---
-MIT License • Built by [Lytics](https://github.com/lytics)
+MIT License • Built by [prosdev](https://github.com/prosdev)
diff --git a/website/theme.config.tsx b/website/theme.config.tsx
index 27f8baf..9b86163 100644
--- a/website/theme.config.tsx
+++ b/website/theme.config.tsx
@@ -23,10 +23,13 @@ const config = {
-
+
>
),
sidebar: {