diff --git a/AGENTS.md b/AGENTS.md index e5cb302..a97cb40 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -179,10 +179,13 @@ Specialized agents for development tasks. MCP server with built-in adapters for AI tools. -**Adapters:** +**Adapters (9 tools):** - **SearchAdapter:** Semantic code search (`dev_search`) +- **RefsAdapter:** Relationship queries - callers/callees (`dev_refs`) +- **MapAdapter:** Codebase structure with change frequency (`dev_map`) +- **HistoryAdapter:** Semantic git commit search (`dev_history`) - **StatusAdapter:** Repository status (`dev_status`) -- **PlanAdapter:** Planning from issues (`dev_plan`) +- **PlanAdapter:** Context assembly for issues (`dev_plan`) - **ExploreAdapter:** Code exploration (`dev_explore`) - **GitHubAdapter:** Issue/PR search (`dev_gh`) - **HealthAdapter:** Server health checks (`dev_health`) diff --git a/CLAUDE.md b/CLAUDE.md index 88361a2..b1697df 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -154,16 +154,19 @@ dev mcp install That's it! Claude Code now has access to all dev-agent capabilities. 
-### Available Tools in Claude Code & Cursor - -Once installed, AI tools gain access to these powerful capabilities: - -- **`dev_search`** - Semantic code search across indexed repositories -- **`dev_status`** - Repository indexing status and health information -- **`dev_plan`** - Generate implementation plans from GitHub issues -- **`dev_explore`** - Explore code patterns, find similar code, analyze relationships -- **`dev_gh`** - Search GitHub issues and pull requests with semantic context (auto-reloads on index changes) -- **`dev_health`** - Check MCP server health and component status (vector storage, repository, GitHub index) +### Available Tools in Claude Code & Cursor (9 tools) + +Once installed, AI tools gain access to: + +- **`dev_search`** - Semantic code search (USE THIS FIRST for conceptual queries) +- **`dev_refs`** - Find callers/callees of functions (for specific symbols) +- **`dev_map`** - Codebase structure with component counts and change frequency +- **`dev_history`** - Semantic search over git commits (who changed what and why) +- **`dev_plan`** - Assemble context for GitHub issues (code + history + patterns) +- **`dev_explore`** - Find similar code, trace relationships +- **`dev_gh`** - Search GitHub issues/PRs semantically +- **`dev_status`** - Repository indexing status +- **`dev_health`** - Server health checks ### MCP Command Reference diff --git a/PLAN.md b/PLAN.md index 4785d52..8970494 100644 --- a/PLAN.md +++ b/PLAN.md @@ -213,7 +213,7 @@ Git history is valuable context that LLMs can't easily access. We add intelligen > Building on git history with deeper insights. -### Tasks +### Git Tasks | Task | Priority | Status | |------|----------|--------| @@ -222,6 +222,28 @@ Git history is valuable context that LLMs can't easily access. 
We add intelligen | Contributor expertise mapping | 🟢 Low | 🔲 Todo | | Cross-repo history | 🟢 Low | 🔲 Todo | +### Tool Improvements + +| Task | Rationale | Priority | Status | +|------|-----------|----------|--------| +| Generalize `dev_plan` → `dev_context` | Currently requires GitHub issue; should work with any task description | 🔴 High | 🔲 Todo | +| Freeform context assembly | `dev_context "Add rate limiting"` without needing issue # | 🔴 High | 🔲 Todo | +| Multiple input modes | `--issue 42`, `--file src/auth.ts`, or freeform query | 🟡 Medium | 🔲 Todo | + +**Why:** `dev_plan` is really a context assembler but is tightly coupled to GitHub issues. Generalizing it: +- Works without GitHub +- Easier to benchmark (no real issues needed) +- Name matches function (assembles context, doesn't "plan") +- More useful for ad-hoc implementation tasks + +### Benchmark Improvements + +| Task | Rationale | Priority | Status | +|------|-----------|----------|--------| +| Add implementation task types | Current benchmark only tests exploration; missing `dev_plan`/`dev_gh` coverage | 🟡 Medium | 🔲 Todo | +| Generic implementation patterns | "Add a new adapter similar to X" — tests pattern discovery | 🟡 Medium | 🔲 Todo | +| Snapshotted issue tests | Capture real issues for reproducible `dev_plan` testing | 🟢 Low | 🔲 Todo | + --- ## Future: Extended Intelligence (v0.6+) @@ -314,23 +336,40 @@ How we know dev-agent is working: 4. **Daily use:** We actually use it ourselves (dogfooding) 5. 
**LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent -### Benchmark Results (v0.4.2) +### Benchmark Results (v0.4.3) + +#### By Task Type + +| Task Type | Cost Savings | Time Savings | Why | +|-----------|--------------|--------------|-----| +| **Debugging** | **42%** | 37% | Semantic search beats grep chains | +| **Exploration** | **44%** | 19% | Find code by meaning | +| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` | +| **Simple lookup** | ~0% | ~0% | Both approaches are fast | + +**Key insight:** Savings scale with task complexity. + +#### Why It Saves Money + +| What dev-agent does | Manual equivalent | Impact | +|---------------------|-------------------|--------| +| Returns code snippets in search | Read entire files | 99% fewer input tokens | +| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction | +| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction | -Measured against baseline Claude Code across 5 task types: +#### Token Analysis (Debugging Task) -| Metric | Baseline | With dev-agent | Improvement | -|--------|----------|----------------|-------------| -| Cost per session | $1.82 | $1.02 | **-44%** | -| Time per session | 14.1 min | 11.5 min | **-19%** | -| Tool calls | 69 | 40 | **-42%** | -| Files examined | 23 | 15 | **-35%** | +| Metric | Without dev-agent | With dev-agent | Difference | +|--------|-------------------|----------------|------------| +| Input tokens | 18,800 | 65 | **99.7% less** | +| Output tokens | 12,200 | 6,200 | **49% less** | +| Files read | 10 | 5 | **50% less** | **Trade-offs identified:** -- Less thorough for debugging (missing diagnostic commands) -- Fewer code examples in responses -- Skips test files (baseline reads them) +- Baseline provides more diagnostic shell commands +- Baseline reads more files (sometimes helpful for thoroughness) -**Target users:** Mid-to-senior engineers who value speed over 
exhaustiveness for routine exploration tasks. +**Target users:** Engineers working on complex exploration, debugging, or implementation tasks in large/unfamiliar codebases. --- @@ -347,4 +386,4 @@ pnpm test --- -*Last updated: November 2025* +*Last updated: November 29, 2025 at 02:30 PST* diff --git a/website/content/docs/index.mdx b/website/content/docs/index.mdx index de00484..fc03ea4 100644 --- a/website/content/docs/index.mdx +++ b/website/content/docs/index.mdx @@ -1,35 +1,46 @@ # Introduction -**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP. +**dev-agent** provides semantic code search and context bundling to AI assistants like Cursor and Claude Code via MCP. -We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords. +We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files and reading entire files to find relevant code. dev-agent gives them a faster path: search by meaning, get code snippets, bundle context. ## What it does 1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2) -2. **Exposes 9 MCP tools** for semantic search, code relationships, git history -3. **Integrates with GitHub** to search issues and PRs semantically +2. **Returns code snippets** — not just file paths, reducing input tokens by 99% +3. **Bundles context** — `dev_plan` assembles issue + code + commits in one call +4. 
**Integrates with GitHub** to search issues and PRs semantically ## Measured impact -We benchmarked dev-agent against baseline Claude Code: +We benchmarked dev-agent against baseline Claude Code across different task types: -| Metric | Baseline | With dev-agent | Change | -|--------|----------|----------------|--------| -| Cost | $1.82 | $1.02 | **-44%** | -| Time | 14.1 min | 11.5 min | **-19%** | -| Tool calls | 69 | 40 | **-42%** | +| Task Type | Cost Savings | Time Savings | Why | +|-----------|--------------|--------------|-----| +| **Debugging** | **42%** | 37% | Semantic search beats grep chains | +| **Exploration** | **44%** | 19% | Find code by meaning | +| **Implementation** | **29%** | 22% | Context bundling via `dev_plan` | -**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files. +**Key insight:** Savings scale with task complexity. Simple lookups show no improvement; complex debugging shows 42% cost reduction. + +**Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands. + +## Why it saves money + +| What dev-agent does | Manual equivalent | Impact | +|---------------------|-------------------|--------| +| Returns code snippets in search | Read entire files | 99% fewer input tokens | +| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction | +| Semantic search finds relevant code | grep chains + filtering | 42% cost reduction | ## Key Features | Feature | Description | |---------|-------------| +| **Context Bundling** | `dev_plan` replaces 5-10 tool calls with one | +| **Code Snippets** | Search returns code, not just file paths | | **Semantic Search** | Find code by meaning, not keywords | -| **Relationship Queries** | What calls this function? What does it call? 
| | **Git History** | Semantic search over commits | -| **GitHub Integration** | Search issues and PRs semantically | | **100% Local** | Your code never leaves your machine | ## Architecture @@ -45,4 +56,3 @@ dev-agent is a monorepo: - [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes - [Quickstart →](/docs/quickstart) — Index and search in 5 minutes - diff --git a/website/content/index.mdx b/website/content/index.mdx index 356930a..58fdee3 100644 --- a/website/content/index.mdx +++ b/website/content/index.mdx @@ -15,65 +15,112 @@ Local semantic code search for Cursor and Claude Code via MCP. - **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search. We built it to speed up our own workflow — and measured 44% cost savings. + **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search and context bundling. Savings scale with task complexity — up to 42% on debugging tasks. + + +## Why it saves money + +dev-agent doesn't just search — it **bundles context** so Claude reads less: + +| What dev-agent does | Manual equivalent | Savings | +|---------------------|-------------------|---------| +| Returns code snippets in search | Read entire files | 99% fewer input tokens | +| `dev_plan` bundles issue + code + commits | 5-10 separate tool calls | 29% cost reduction | +| Semantic search finds relevant code | grep chains + manual filtering | 42% cost reduction | + +**The harder the task, the bigger the savings.** + +## Measured results by task type + +| Task Type | Cost Savings | Time Savings | Why | +|-----------|--------------|--------------|-----| +| **Debugging** | **42%** | 37% | Semantic search beats grep for "where is the bug?" 
| +| **Exploration** | **44%** | 19% | Find code by meaning, not keywords | +| **Implementation** | **29%** | 22% | `dev_plan` bundles context in one call | +| **Simple lookup** | ~0% | ~0% | Both approaches are fast | + + + **Trade-off:** Faster but sometimes less thorough. Baseline Claude provides more diagnostic shell commands. dev-agent excels when you need to explore or understand code. ## Same question, different approach -We asked Claude Code: *"Where is rate limiting implemented and how does it work?"* +We asked Claude Code: *"Debug why search returns duplicates"* - **Claude's approach (8 tool calls):** + **Claude's approach:** - - - - - - - - - - - - + + + + + + + + - **Result:** 8 tool calls, 3 files read → **$0.36, 2.1 minutes** + **Result:** 18+ tool calls, 10 files read → **$1.37, 12 minutes** + + *18,800 input tokens consumed* - **Claude's approach (2 tool calls):** + **Claude's approach:** - - - + + + - + + + + + - **Result:** 2 tool calls, 1 file read → **$0.20, 1.3 minutes** + **Result:** 6 tool calls, 5 files read → **$0.79, 7.5 minutes** + + *65 input tokens consumed (99.7% less)* - **Same answer. 44% cheaper. 38% faster.** + **Same root cause identified. 42% cheaper. 37% faster.** -## Measured results +## Context bundling: `dev_plan` -We ran 5 task types comparing baseline Claude Code vs. with dev-agent: +For implementation tasks, `dev_plan` bundles everything in one call: -| Metric | Baseline | With dev-agent | Change | -|--------|----------|----------------|--------| -| Cost | $1.82 | $1.02 | **-44%** | -| Time | 14.1 min | 11.5 min | **-19%** | -| Tool calls | 69 | 40 | **-42%** | -| Files read | 23 | 15 | **-35%** | - - - **Trade-off:** Faster but sometimes less thorough. Baseline Claude read more files for debugging tasks. dev-agent excels at implementation and exploration. 
- + + + **Claude's approach for "Implement issue #61":** + ```bash + gh issue view 61 --json title,body # Fetch issue + grep -r -- "--json" packages/cli # Find existing flags + Read search.ts # Check implementation + Read mcp.ts # Check implementation + Read config.ts # Check file writes + # ... 5+ more tool calls + ``` + + **Result:** $0.55, 5.7 minutes + + + **Claude's approach:** + ```bash + dev_plan --issue 61 + # Returns in ONE call: + # - Issue details + comments + # - Relevant code snippets + # - Related commits (5 found) + # - Codebase patterns + ``` + + **Result:** $0.39, 4.5 minutes (**29% cheaper**) + + ## How it works @@ -99,11 +146,7 @@ flowchart LR D <--> E ``` -**The flow:** -1. Your AI tool asks a question like *"where is auth handled?"* -2. dev-agent searches the vector database semantically -3. Returns relevant code with snippets, relationships, and context -4. All processing happens locally — your code never leaves your machine +**Key insight:** dev-agent returns **code snippets with context** — Claude doesn't read entire files. This is why input tokens drop by 99%. ## Quick Start @@ -129,46 +172,15 @@ dev mcp install # For Claude Code ``` -## Example: What dev_search returns - -When Claude asks *"where is rate limiting implemented?"*, dev-agent returns: - -```typescript -// dev_search: "rate limiting implementation" -// Found 2 results - -// 1. packages/mcp-server/src/server/utils/rate-limiter.ts -// Score: 0.89 | Type: Class -// Callers: AdapterRegistry.executeTool - -export class RateLimiter { - private buckets = new Map(); - - check(key: string): { allowed: boolean; retryAfter?: number } { - // Token bucket algorithm implementation - } -} - -// 2. packages/mcp-server/src/adapters/adapter-registry.ts -// Score: 0.72 | Type: Function - -if (this.rateLimiter) { - const result = this.rateLimiter.check(toolName); - if (!result.allowed) return { error: 'Rate limited' }; -} -``` - -Claude gets **code snippets + relationships** in one call.
No grep chains needed. - ## 9 MCP Tools | Tool | What it does | |------|--------------| -| [`dev_search`](/docs/tools/dev-search) | Semantic code search — find by meaning, not keywords | +| [`dev_search`](/docs/tools/dev-search) | Semantic code search — returns snippets, not just paths | +| [`dev_plan`](/docs/tools/dev-plan) | **Context bundling** — issue + code + commits in one call | | [`dev_refs`](/docs/tools/dev-refs) | Find callers/callees of any function | | [`dev_map`](/docs/tools/dev-map) | Codebase structure with change frequency | | [`dev_history`](/docs/tools/dev-history) | Semantic search over git commits | -| [`dev_plan`](/docs/tools/dev-plan) | Assemble context for GitHub issues | | [`dev_explore`](/docs/tools/dev-explore) | Find similar code, trace relationships | | [`dev_gh`](/docs/tools/dev-gh) | Search GitHub issues/PRs semantically | | [`dev_status`](/docs/tools/dev-status) | Repository indexing status | @@ -176,21 +188,21 @@ Claude gets **code snippets + relationships** in one call. No grep chains needed ## When to use it -| Scenario | dev-agent? | Why | -|----------|------------|-----| -| Large/unfamiliar codebase | ✅ Yes | Semantic search beats grep for conceptual queries | -| Implementation tasks | ✅ Yes | Finds existing code to reuse | -| Reducing API costs | ✅ Yes | 44% cost reduction measured | -| Small codebase you know | ❌ Skip | Your mental model is faster | -| Deep debugging | ⚠️ Maybe | May need more file reads than dev-agent provides | -| Thoroughness over speed | ⚠️ Maybe | Baseline Claude reads more files | +| Scenario | dev-agent? 
| Expected Savings | +|----------|------------|------------------| +| Debugging unfamiliar code | ✅ Yes | **42% cost** | +| Exploring large codebase | ✅ Yes | **44% cost** | +| Implementing GitHub issues | ✅ Yes | **29% cost** | +| Small codebase you know | ❌ Skip | ~0% | +| Need exhaustive file reads | ⚠️ Maybe | Trade speed for thoroughness | ## Features -- **100% Local** — Code never leaves your machine. No API keys needed. -- **TypeScript/JS/Markdown** — Full support today. More languages planned. -- **Sub-second Search** — Fast even on large repos with LanceDB. -- **1300+ Tests** — Production-grade reliability. +- **Context Bundling** — `dev_plan` replaces 5-10 tool calls with one +- **Code Snippets** — Search returns code, not just file paths +- **100% Local** — Your code never leaves your machine +- **Sub-second Search** — Fast even on large repos with LanceDB +- **1379+ Tests** — Production-grade reliability ---
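
The savings percentages in these benchmark tables are relative reductions against the baseline run, computed as (baseline - with dev-agent) / baseline. The sketch below recomputes them from the single "Debug why search returns duplicates" comparison in `website/content/index.mdx` ($1.37 vs $0.79, 12 vs 7.5 minutes, 18,800 vs 65 input tokens). The helper `percentReduction` and the `debugging` object are hypothetical names for illustration only and are not part of dev-agent.

```typescript
// Relative reduction versus the baseline run, as a percentage.
// Hypothetical helper; not part of the dev-agent codebase.
function percentReduction(baseline: number, withTool: number): number {
  if (baseline <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return ((baseline - withTool) / baseline) * 100;
}

// Figures copied from the debugging comparison on the landing page.
const debugging = {
  costUsd: { baseline: 1.37, withDevAgent: 0.79 },
  minutes: { baseline: 12, withDevAgent: 7.5 },
  inputTokens: { baseline: 18_800, withDevAgent: 65 },
};

// Print each metric's reduction rounded to one decimal place.
for (const [metric, { baseline, withDevAgent }] of Object.entries(debugging)) {
  console.log(`${metric}: ${percentReduction(baseline, withDevAgent).toFixed(1)}% lower`);
}
```

On these inputs the script prints 42.3% for cost, 37.5% for time, and 99.7% for input tokens. The pages report these figures as 42%, 37%, and 99.7%.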