Commit e5f2aa4

docs: update website with honest positioning and benchmark results
- Add version callout (v0.4.3) with release notes link
- Add 'Built by engineers, for engineers' messaging
- Add before/after comparison tabs showing grep vs dev_search
- Add example output showing what dev_search returns
- Add measured results table (44% cost, 19% time savings)
- Add 'When to use it' guidance with honest trade-offs
- Update docs intro to match new tone
- Add studies/ to .gitignore for benchmark logs
- Update PLAN.md with v0.4.x roadmap and benchmark data
1 parent 2ce7fa4 commit e5f2aa4

6 files changed, +282 -75 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -67,3 +67,6 @@ temp/
 
 # Work in progress packages (not ready for commit)
 packages/benchmark/
+
+# Benchmark studies and session logs
+studies/

PLAN.md

Lines changed: 47 additions & 1 deletion
@@ -181,7 +181,35 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
 
 ---
 
-## Current: Extended Git Intelligence (v0.5.0)
+## Current: Quality & Thoroughness (v0.4.x)
+
+> Addressing gaps identified in benchmark study comparing dev-agent vs baseline Claude Code.
+
+**Context:** Benchmarks showed dev-agent provides 44% cost savings and 19% faster responses, but with quality trade-offs. These improvements close the gap.
+
+### Benchmark-Driven Improvements
+
+| Task | Gap Identified | Priority | Status |
+|------|----------------|----------|--------|
+| Diagnostic command suggestions | Baseline provided shell commands for debugging; dev-agent didn't | 🔴 High | 🔲 Todo |
+| Test file inclusion hints | Baseline read test files; dev-agent skipped them | 🔴 High | 🔲 Todo |
+| Code example extraction | Baseline included more code snippets in responses | 🟡 Medium | 🔲 Todo |
+| Exhaustive mode for debugging | Option for thorough exploration vs fast satisficing | 🟡 Medium | 🔲 Todo |
+| Related files suggestions | "You might also want to check: X, Y, Z" | 🟡 Medium | 🔲 Todo |
+
+### Tool Description Refinements (Done in v0.4.2)
+
+| Task | Status |
+|------|--------|
+| Improved dev_search description ("USE THIS FIRST") | ✅ Done |
+| Improved dev_map description (vs list_dir) | ✅ Done |
+| Improved dev_explore description (workflow hints) | ✅ Done |
+| Improved dev_refs description (specific symbols) | ✅ Done |
+| All 9 adapters registered in CLI | ✅ Done |
+
+---
+
+## Next: Extended Git Intelligence (v0.5.0)
 
 > Building on git history with deeper insights.
 
@@ -286,6 +314,24 @@ How we know dev-agent is working:
 4. **Daily use:** We actually use it ourselves (dogfooding)
 5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
 
+### Benchmark Results (v0.4.2)
+
+Measured against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Improvement |
+|--------|----------|----------------|-------------|
+| Cost per session | $1.82 | $1.02 | **-44%** |
+| Time per session | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files examined | 23 | 15 | **-35%** |
+
+**Trade-offs identified:**
+- Less thorough for debugging (missing diagnostic commands)
+- Fewer code examples in responses
+- Skips test files (baseline reads them)
+
+**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
+
 ---
 
 ## Contributing
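
The percentages in the PLAN.md benchmark table can be recomputed from the baseline and dev-agent columns. The sketch below is illustrative only: `Metric` and `percentChange` are hypothetical names, not part of dev-agent, and the inputs are the rounded figures from the table rather than raw session logs.

```typescript
// Rounded values copied from the benchmark table in this diff.
// The interface and helper are illustrative, not dev-agent code.
interface Metric {
  label: string;
  baseline: number;
  withDevAgent: number;
}

const metrics: Metric[] = [
  { label: "Cost per session ($)", baseline: 1.82, withDevAgent: 1.02 },
  { label: "Time per session (min)", baseline: 14.1, withDevAgent: 11.5 },
  { label: "Tool calls", baseline: 69, withDevAgent: 40 },
  { label: "Files examined", baseline: 23, withDevAgent: 15 },
];

// Relative change versus baseline, in percent (negative means a reduction).
function percentChange(baseline: number, value: number): number {
  return ((value - baseline) / baseline) * 100;
}

for (const m of metrics) {
  const change = percentChange(m.baseline, m.withDevAgent);
  console.log(`${m.label}: ${change.toFixed(1)}%`);
}
```

With these inputs the cost, tool-call, and file figures come out at about -44.0%, -42.0%, and -34.8%, matching the table. The time figure comes out at about -18.4%, slightly below the published -19%. That may simply mean the published number was computed from unrounded session times; the commit does not say.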

README.md

Lines changed: 35 additions & 9 deletions
@@ -4,19 +4,45 @@
 [![pnpm](https://img.shields.io/badge/pnpm-8.15.4-orange.svg)](https://pnpm.io/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 
-**Local-first repository context provider for AI tools. Semantic code search, git history, relationship queries, and codebase mapping via MCP.**
+**Local semantic code search for Cursor and Claude Code via MCP.**
 
-## What is dev-agent?
+## What it does
 
-dev-agent provides **rich, structured context** to AI assistants like Claude and Cursor. Instead of AI tools reading files one at a time, dev-agent gives them:
+dev-agent indexes your codebase and provides 9 MCP tools to AI assistants. Instead of AI tools grepping through files, they can ask conceptual questions like "where do we handle authentication?"
 
-- 🔍 **Semantic search** with code snippets and relationships
-- 🗺️ **Codebase maps** showing structure and change frequency
-- 🔗 **Relationship queries** (what calls what)
-- 📜 **Git history search** (who changed what and why)
-- 📋 **Issue context** assembled for planning
+- `dev_search` — Semantic code search by meaning
+- `dev_refs` — Find callers/callees of functions
+- `dev_map` — Codebase structure with change frequency
+- `dev_history` — Semantic search over git commits
+- `dev_plan` — Assemble context for GitHub issues
+- `dev_explore` — Find similar code, trace relationships
+- `dev_gh` — Search GitHub issues/PRs semantically
+- `dev_status` / `dev_health` — Monitoring
 
-**Philosophy:** Provide data, let LLMs reason. We don't try to be smart with heuristics—we provide comprehensive context so AI assistants can be smart.
+## Measured results
+
+We benchmarked dev-agent against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-offs:** Faster but sometimes less thorough. Best for implementation tasks and codebase exploration. For deep debugging, baseline Claude may read more files.
+
+## When to use it
+
+**Good fit:**
+- Large or unfamiliar codebases
+- Implementation tasks ("add a feature like X")
+- Exploring how code works
+- Reducing AI API costs
+
+**Less useful:**
+- Small codebases you already know well
+- Deep debugging sessions
+- When thoroughness matters more than speed
 
 ## Quick Start
 
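
The README change describes `dev_refs` as finding the callers and callees of a function. As a rough illustration of that kind of relationship query, the sketch below answers both questions over a hypothetical list of call edges. It is not dev-agent's implementation (this commit does not show its data model), and every function name in the sample data is invented.

```typescript
// A call edge: [caller, callee]. The data below is invented for illustration.
type Edge = readonly [string, string];

const edges: Edge[] = [
  ["handleLogin", "verifyPassword"],
  ["handleLogin", "createSession"],
  ["refreshToken", "createSession"],
];

// Who calls `fn`? Keep edges whose callee matches, return their callers.
function callersOf(edgeList: readonly Edge[], fn: string): string[] {
  return edgeList.filter(([, callee]) => callee === fn).map(([caller]) => caller);
}

// What does `fn` call? Keep edges whose caller matches, return their callees.
function calleesOf(edgeList: readonly Edge[], fn: string): string[] {
  return edgeList.filter(([caller]) => caller === fn).map(([, callee]) => callee);
}

console.log(callersOf(edges, "createSession")); // handleLogin, refreshToken
console.log(calleesOf(edges, "handleLogin")); // verifyPassword, createSession
```

A linear scan per query keeps the example short. A real index would more plausibly precompute both directions so each lookup is constant time.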

website/content/docs/index.mdx

Lines changed: 24 additions & 22 deletions
@@ -1,46 +1,48 @@
 # Introduction
 
-**dev-agent** is a local-first repository context provider for AI tools. It gives AI assistants like Cursor and Claude Code deep understanding of your codebase through semantic search, code analysis, and GitHub integration.
+**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
 
-## The Problem
+We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
 
-AI coding assistants are powerful, but they struggle with context:
+## What it does
 
-- They can't search your codebase semantically
-- They don't understand relationships between files
-- They lack awareness of your GitHub issues and PRs
-- They hallucinate about code that doesn't exist
+1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
+2. **Exposes 9 MCP tools** for semantic search, code relationships, git history
+3. **Integrates with GitHub** to search issues and PRs semantically
 
-## The Solution
+## Measured impact
 
-dev-agent solves this by:
+We benchmarked dev-agent against baseline Claude Code:
 
-1. **Indexing your codebase** with local embeddings (all-MiniLM-L6-v2)
-2. **Exposing semantic search** via the Model Context Protocol (MCP)
-3. **Integrating with GitHub** to understand your project's history
-4. **Providing specialized tools** for planning, exploration, and more
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files.
 
 ## Key Features
 
 | Feature | Description |
 |---------|-------------|
 | **Semantic Search** | Find code by meaning, not keywords |
-| **AST Analysis** | Type-aware understanding of your code |
+| **Relationship Queries** | What calls this function? What does it call? |
+| **Git History** | Semantic search over commits |
 | **GitHub Integration** | Search issues and PRs semantically |
-| **Local-First** | Your code never leaves your machine |
-| **MCP Native** | Works with Cursor, Claude Code, VS Code |
+| **100% Local** | Your code never leaves your machine |
 
-## Architecture Overview
+## Architecture
 
-dev-agent is built as a monorepo with specialized packages:
+dev-agent is a monorepo:
 
-- **@lytics/dev-agent-core** — Repository scanning, vector storage, GitHub integration
+- **@lytics/dev-agent-core** — Scanning, vector storage, GitHub integration
 - **@lytics/dev-agent-cli** — Command-line interface
-- **@lytics/dev-agent-mcp** — MCP server with tool adapters
-- **@lytics/dev-agent-subagents** — Specialized agents (planner, explorer, PR manager)
+- **@lytics/dev-agent-mcp** — MCP server with 9 tool adapters
+- **@lytics/dev-agent-subagents** — Planner, explorer agents
 
 ## Next Steps
 
 - [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes
-- [Quickstart →](/docs/quickstart) — From zero to semantic search in 5 minutes
+- [Quickstart →](/docs/quickstart) — Index and search in 5 minutes
 
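
The docs intro describes indexing with embeddings and searching "by meaning, not keywords". Below is a minimal sketch of that idea, assuming cosine similarity as the ranking score (the commit does not state which metric dev-agent uses). The three-dimensional vectors stand in for real all-MiniLM-L6-v2 embeddings, which are 384-dimensional. The file paths, the `Chunk` shape, and the `rank` helper are all hypothetical.

```typescript
// Toy stand-ins for embedded code chunks. Paths and vectors are invented;
// a real index would store model-generated embeddings instead.
interface Chunk {
  path: string;
  embedding: number[];
}

const chunks: Chunk[] = [
  { path: "src/auth/login.ts", embedding: [0.9, 0.1, 0.0] },
  { path: "src/billing/invoice.ts", embedding: [0.1, 0.9, 0.1] },
  { path: "src/auth/session.ts", embedding: [0.7, 0.2, 0.1] },
];

// Dot product; a missing component in `b` is treated as 0.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
// A zero-length vector gets a score of 0 rather than dividing by zero.
function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Score every chunk against the query vector and keep the best topK.
function rank(query: number[], items: Chunk[], topK: number): { path: string; score: number }[] {
  return items
    .map((c) => ({ path: c.path, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Pretend this vector is the embedding of "where do we handle authentication?".
const queryEmbedding = [1.0, 0.0, 0.0];
for (const hit of rank(queryEmbedding, chunks, 2)) {
  console.log(`${hit.path}  ${hit.score.toFixed(3)}`);
}
```

With this toy data the two auth files rank above the billing file even though the query shares no keyword with their paths, which is the behavior the intro contrasts with grepping.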
