docs: update website with honest positioning and benchmark results

prosdev · prosdev · commit e5f2aa42e845 · 2025-11-29T01:37:00.000-08:00
- Add version callout (v0.4.3) with release notes link
- Add 'Built by engineers, for engineers' messaging
- Add before/after comparison tabs showing grep vs dev_search
- Add example output showing what dev_search returns
- Add measured results table (44% cost, 19% time savings)
- Add 'When to use it' guidance with honest trade-offs
- Update docs intro to match new tone
- Add studies/ to .gitignore for benchmark logs
- Update PLAN.md with v0.4.x roadmap and benchmark data
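
The percentages in the results tables can be re-derived from the rounded baseline and dev-agent figures. The sketch below is illustrative only: `percentChange` and `metrics` are hypothetical names, not part of dev-agent, and the inputs are the rounded values shown in the tables. With those inputs the time change comes out near -18.4% rather than -19%, so the reported figure was presumably computed from unrounded measurements.

```typescript
// Relative change from a baseline value to a dev-agent value, in percent.
// Hypothetical helper for illustration; not part of the dev-agent codebase.
function percentChange(baseline: number, withDevAgent: number): number {
  if (baseline === 0) {
    throw new Error("baseline must be non-zero");
  }
  return ((withDevAgent - baseline) / baseline) * 100;
}

// Rounded figures copied from the benchmark tables in this commit.
const metrics: Array<[string, number, number]> = [
  ["Cost per session ($)", 1.82, 1.02],
  ["Time per session (min)", 14.1, 11.5],
  ["Tool calls", 69, 40],
  ["Files examined", 23, 15],
];

for (const [name, baseline, withDevAgent] of metrics) {
  const change = percentChange(baseline, withDevAgent);
  console.log(`${name}: ${change.toFixed(1)}%`);
}
```

Rounding each result to a whole number reproduces the reported -44%, -42%, and -35%; only the time figure differs by a point.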
diff --git a/.gitignore b/.gitignore
@@ -67,3 +67,6 @@ temp/
 
 # Work in progress packages (not ready for commit)
 packages/benchmark/
+
+# Benchmark studies and session logs
+studies/
diff --git a/PLAN.md b/PLAN.md
@@ -181,7 +181,35 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
 
 ---
 
-## Current: Extended Git Intelligence (v0.5.0)
+## Current: Quality & Thoroughness (v0.4.x)
+
+> Addressing gaps identified in a benchmark study comparing dev-agent with baseline Claude Code.
+
+**Context:** Benchmarks showed that dev-agent cuts cost by 44% and time per session by 19%, but with quality trade-offs. The improvements below aim to close that quality gap.
+
+### Benchmark-Driven Improvements
+
+| Task | Gap Identified | Priority | Status |
+|------|----------------|----------|--------|
+| Diagnostic command suggestions | Baseline provided shell commands for debugging; dev-agent didn't | 🔴 High | 🔲 Todo |
+| Test file inclusion hints | Baseline read test files; dev-agent skipped them | 🔴 High | 🔲 Todo |
+| Code example extraction | Baseline included more code snippets in responses | 🟡 Medium | 🔲 Todo |
+| Exhaustive mode for debugging | Option for thorough exploration vs fast satisficing | 🟡 Medium | 🔲 Todo |
+| Related files suggestions | "You might also want to check: X, Y, Z" | 🟡 Medium | 🔲 Todo |
+
+### Tool Description Refinements (Done in v0.4.2)
+
+| Task | Status |
+|------|--------|
+| Improved dev_search description ("USE THIS FIRST") | ✅ Done |
+| Improved dev_map description (vs list_dir) | ✅ Done |
+| Improved dev_explore description (workflow hints) | ✅ Done |
+| Improved dev_refs description (specific symbols) | ✅ Done |
+| All 9 adapters registered in CLI | ✅ Done |
+
+---
+
+## Next: Extended Git Intelligence (v0.5.0)
 
 > Building on git history with deeper insights.
 
@@ -286,6 +314,24 @@ How we know dev-agent is working:
 4. **Daily use:** We actually use it ourselves (dogfooding)
 5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
 
+### Benchmark Results (v0.4.2)
+
+Measured against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Improvement |
+|--------|----------|----------------|-------------|
+| Cost per session | $1.82 | $1.02 | **-44%** |
+| Time per session | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files examined | 23 | 15 | **-35%** |
+
+**Trade-offs identified:**
+- Less thorough for debugging (missing diagnostic commands)
+- Fewer code examples in responses
+- Skips test files (baseline reads them)
+
+**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
+
 ---
 
 ## Contributing
diff --git a/README.md b/README.md
@@ -4,19 +4,45 @@
 [![pnpm](https://img.shields.io/badge/pnpm-8.15.4-orange.svg)](https://pnpm.io/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 
-**Local-first repository context provider for AI tools. Semantic code search, git history, relationship queries, and codebase mapping via MCP.**
+**Local semantic code search for Cursor and Claude Code via MCP.**
 
-## What is dev-agent?
+## What it does
 
-dev-agent provides **rich, structured context** to AI assistants like Claude and Cursor. Instead of AI tools reading files one at a time, dev-agent gives them:
+dev-agent indexes your codebase and provides 9 MCP tools to AI assistants. Instead of grepping through files, AI tools can ask conceptual questions such as "where do we handle authentication?"
 
-- 🔍 **Semantic search** with code snippets and relationships
-- 🗺️ **Codebase maps** showing structure and change frequency
-- 🔗 **Relationship queries** (what calls what)
-- 📜 **Git history search** (who changed what and why)
-- 📋 **Issue context** assembled for planning
+- `dev_search` — Semantic code search by meaning
+- `dev_refs` — Find callers/callees of functions  
+- `dev_map` — Codebase structure with change frequency
+- `dev_history` — Semantic search over git commits
+- `dev_plan` — Assemble context for GitHub issues
+- `dev_explore` — Find similar code, trace relationships
+- `dev_gh` — Search GitHub issues/PRs semantically
+- `dev_status` / `dev_health` — Monitoring
 
-**Philosophy:** Provide data, let LLMs reason. We don't try to be smart with heuristics—we provide comprehensive context so AI assistants can be smart.
+## Measured results
+
+We benchmarked dev-agent against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-offs:** Faster but sometimes less thorough. Best for implementation tasks and codebase exploration. For deep debugging, baseline Claude Code may read more files.
+
+## When to use it
+
+**Good fit:**
+- Large or unfamiliar codebases
+- Implementation tasks ("add a feature like X")
+- Exploring how code works
+- Reducing AI API costs
+
+**Less useful:**
+- Small codebases you already know well
+- Deep debugging sessions
+- When thoroughness matters more than speed
 
 ## Quick Start
 
diff --git a/website/content/docs/index.mdx b/website/content/docs/index.mdx
@@ -1,46 +1,48 @@
 # Introduction
 
-**dev-agent** is a local-first repository context provider for AI tools. It gives AI assistants like Cursor and Claude Code deep understanding of your codebase through semantic search, code analysis, and GitHub integration.
+**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
 
-## The Problem
+We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
 
-AI coding assistants are powerful, but they struggle with context:
+## What it does
 
-- They can't search your codebase semantically
-- They don't understand relationships between files
-- They lack awareness of your GitHub issues and PRs
-- They hallucinate about code that doesn't exist
+1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
+2. **Exposes 9 MCP tools** for semantic search, code relationships, and git history
+3. **Integrates with GitHub** to search issues and PRs semantically
 
-## The Solution
+## Measured impact
 
-dev-agent solves this by:
+We benchmarked dev-agent against baseline Claude Code:
 
-1. **Indexing your codebase** with local embeddings (all-MiniLM-L6-v2)
-2. **Exposing semantic search** via the Model Context Protocol (MCP)
-3. **Integrating with GitHub** to understand your project's history
-4. **Providing specialized tools** for planning, exploration, and more
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude Code may read more files.
 
 ## Key Features
 
 | Feature | Description |
 |---------|-------------|
 | **Semantic Search** | Find code by meaning, not keywords |
-| **AST Analysis** | Type-aware understanding of your code |
+| **Relationship Queries** | What calls this function? What does it call? |
+| **Git History** | Semantic search over commits |
 | **GitHub Integration** | Search issues and PRs semantically |
-| **Local-First** | Your code never leaves your machine |
-| **MCP Native** | Works with Cursor, Claude Code, VS Code |
+| **100% Local** | Your code never leaves your machine |
 
-## Architecture Overview
+## Architecture
 
-dev-agent is built as a monorepo with specialized packages:
+dev-agent is a monorepo:
 
-- **@lytics/dev-agent-core** — Repository scanning, vector storage, GitHub integration
+- **@lytics/dev-agent-core** — Scanning, vector storage, GitHub integration
 - **@lytics/dev-agent-cli** — Command-line interface
-- **@lytics/dev-agent-mcp** — MCP server with tool adapters
-- **@lytics/dev-agent-subagents** — Specialized agents (planner, explorer, PR manager)
+- **@lytics/dev-agent-mcp** — MCP server with 9 tool adapters
+- **@lytics/dev-agent-subagents** — Planner, explorer agents
 
 ## Next Steps
 
 - [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes
-- [Quickstart →](/docs/quickstart) — From zero to semantic search in 5 minutes
+- [Quickstart →](/docs/quickstart) — Index and search in 5 minutes
 
diff --git a/website/content/index.mdx b/website/content/index.mdx
diff --git a/website/theme.config.tsx b/website/theme.config.tsx