diff --git a/.gitignore b/.gitignore
index b8143ee..8967b28 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,3 +67,6 @@ temp/
 
 # Work in progress packages (not ready for commit)
 packages/benchmark/
+
+# Benchmark studies and session logs
+studies/
diff --git a/PLAN.md b/PLAN.md
index b104b8f..4785d52 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -181,7 +181,35 @@ Git history is valuable context that LLMs can't easily access. We add intelligen
 
 ---
 
-## Current: Extended Git Intelligence (v0.5.0)
+## Current: Quality & Thoroughness (v0.4.x)
+
+> Addressing gaps identified in benchmark study comparing dev-agent vs baseline Claude Code.
+
+**Context:** Benchmarks showed dev-agent provides 44% cost savings and 19% faster responses, but with quality trade-offs. These improvements close the gap.
+
+### Benchmark-Driven Improvements
+
+| Task | Gap Identified | Priority | Status |
+|------|----------------|----------|--------|
+| Diagnostic command suggestions | Baseline provided shell commands for debugging; dev-agent didn't | 🔴 High | 🔲 Todo |
+| Test file inclusion hints | Baseline read test files; dev-agent skipped them | 🔴 High | 🔲 Todo |
+| Code example extraction | Baseline included more code snippets in responses | 🟡 Medium | 🔲 Todo |
+| Exhaustive mode for debugging | Option for thorough exploration vs fast satisficing | 🟡 Medium | 🔲 Todo |
+| Related files suggestions | "You might also want to check: X, Y, Z" | 🟡 Medium | 🔲 Todo |
+
+### Tool Description Refinements (Done in v0.4.2)
+
+| Task | Status |
+|------|--------|
+| Improved dev_search description ("USE THIS FIRST") | ✅ Done |
+| Improved dev_map description (vs list_dir) | ✅ Done |
+| Improved dev_explore description (workflow hints) | ✅ Done |
+| Improved dev_refs description (specific symbols) | ✅ Done |
+| All 9 adapters registered in CLI | ✅ Done |
+
+---
+
+## Next: Extended Git Intelligence (v0.5.0)
 
 > Building on git history with deeper insights.
 
@@ -286,6 +314,24 @@ How we know dev-agent is working:
 4. **Daily use:** We actually use it ourselves (dogfooding)
 5. **LLM effectiveness:** Claude/Cursor make better suggestions with dev-agent
 
+### Benchmark Results (v0.4.2)
+
+Measured against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Improvement |
+|--------|----------|----------------|-------------|
+| Cost per session | $1.82 | $1.02 | **-44%** |
+| Time per session | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files examined | 23 | 15 | **-35%** |
+
+**Trade-offs identified:**
+- Less thorough for debugging (missing diagnostic commands)
+- Fewer code examples in responses
+- Skips test files (baseline reads them)
+
+**Target users:** Mid-to-senior engineers who value speed over exhaustiveness for routine exploration tasks.
+
 ---
 
 ## Contributing
diff --git a/README.md b/README.md
index db7744c..3e3020b 100644
--- a/README.md
+++ b/README.md
@@ -4,19 +4,45 @@
 [![pnpm](https://img.shields.io/badge/pnpm-8.15.4-orange.svg)](https://pnpm.io/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 
-**Local-first repository context provider for AI tools. Semantic code search, git history, relationship queries, and codebase mapping via MCP.**
+**Local semantic code search for Cursor and Claude Code via MCP.**
 
-## What is dev-agent?
+## What it does
 
-dev-agent provides **rich, structured context** to AI assistants like Claude and Cursor. Instead of AI tools reading files one at a time, dev-agent gives them:
+dev-agent indexes your codebase and provides 9 MCP tools to AI assistants. Instead of AI tools grepping through files, they can ask conceptual questions like "where do we handle authentication?"
 
-- 🔍 **Semantic search** with code snippets and relationships
-- 🗺️ **Codebase maps** showing structure and change frequency
-- 🔗 **Relationship queries** (what calls what)
-- 📜 **Git history search** (who changed what and why)
-- 📋 **Issue context** assembled for planning
+- `dev_search` — Semantic code search by meaning
+- `dev_refs` — Find callers/callees of functions
+- `dev_map` — Codebase structure with change frequency
+- `dev_history` — Semantic search over git commits
+- `dev_plan` — Assemble context for GitHub issues
+- `dev_explore` — Find similar code, trace relationships
+- `dev_gh` — Search GitHub issues/PRs semantically
+- `dev_status` / `dev_health` — Monitoring
 
-**Philosophy:** Provide data, let LLMs reason. We don't try to be smart with heuristics—we provide comprehensive context so AI assistants can be smart.
+## Measured results
+
+We benchmarked dev-agent against baseline Claude Code across 5 task types:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-offs:** Faster but sometimes less thorough. Best for implementation tasks and codebase exploration. For deep debugging, baseline Claude may read more files.
+
+## When to use it
+
+**Good fit:**
+- Large or unfamiliar codebases
+- Implementation tasks ("add a feature like X")
+- Exploring how code works
+- Reducing AI API costs
+
+**Less useful:**
+- Small codebases you already know well
+- Deep debugging sessions
+- When thoroughness matters more than speed
 
 ## Quick Start
 
diff --git a/website/content/docs/index.mdx b/website/content/docs/index.mdx
index 2fa1b4b..de00484 100644
--- a/website/content/docs/index.mdx
+++ b/website/content/docs/index.mdx
@@ -1,46 +1,48 @@
 # Introduction
 
-**dev-agent** is a local-first repository context provider for AI tools. It gives AI assistants like Cursor and Claude Code deep understanding of your codebase through semantic search, code analysis, and GitHub integration.
+**dev-agent** provides semantic code search to AI assistants like Cursor and Claude Code via MCP.
 
-## The Problem
+We built this for ourselves. When exploring large codebases, we found AI tools spending too much time grepping through files. dev-agent gives them a faster path: search by meaning, not keywords.
 
-AI coding assistants are powerful, but they struggle with context:
+## What it does
 
-- They can't search your codebase semantically
-- They don't understand relationships between files
-- They lack awareness of your GitHub issues and PRs
-- They hallucinate about code that doesn't exist
+1. **Indexes your codebase** locally with embeddings (all-MiniLM-L6-v2)
+2. **Exposes 9 MCP tools** for semantic search, code relationships, git history
+3. **Integrates with GitHub** to search issues and PRs semantically
 
-## The Solution
+## Measured impact
 
-dev-agent solves this by:
+We benchmarked dev-agent against baseline Claude Code:
 
-1. **Indexing your codebase** with local embeddings (all-MiniLM-L6-v2)
-2. **Exposing semantic search** via the Model Context Protocol (MCP)
-3. **Integrating with GitHub** to understand your project's history
-4. **Providing specialized tools** for planning, exploration, and more
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+
+**Trade-off:** Faster but sometimes less thorough. Best for implementation tasks and exploration. For deep debugging, baseline Claude may read more files.
 
 ## Key Features
 
 | Feature | Description |
 |---------|-------------|
 | **Semantic Search** | Find code by meaning, not keywords |
-| **AST Analysis** | Type-aware understanding of your code |
+| **Relationship Queries** | What calls this function? What does it call? |
+| **Git History** | Semantic search over commits |
 | **GitHub Integration** | Search issues and PRs semantically |
-| **Local-First** | Your code never leaves your machine |
-| **MCP Native** | Works with Cursor, Claude Code, VS Code |
+| **100% Local** | Your code never leaves your machine |
 
-## Architecture Overview
+## Architecture
 
-dev-agent is built as a monorepo with specialized packages:
+dev-agent is a monorepo:
 
-- **@lytics/dev-agent-core** — Repository scanning, vector storage, GitHub integration
+- **@lytics/dev-agent-core** — Scanning, vector storage, GitHub integration
 - **@lytics/dev-agent-cli** — Command-line interface
-- **@lytics/dev-agent-mcp** — MCP server with tool adapters
-- **@lytics/dev-agent-subagents** — Specialized agents (planner, explorer, PR manager)
+- **@lytics/dev-agent-mcp** — MCP server with 9 tool adapters
+- **@lytics/dev-agent-subagents** — Planner, explorer agents
 
 ## Next Steps
 
 - [Installation →](/docs/install) — Get dev-agent installed in under 2 minutes
-- [Quickstart →](/docs/quickstart) — From zero to semantic search in 5 minutes
+- [Quickstart →](/docs/quickstart) — Index and search in 5 minutes
diff --git a/website/content/index.mdx b/website/content/index.mdx
index ff4aefd..356930a 100644
--- a/website/content/index.mdx
+++ b/website/content/index.mdx
@@ -2,69 +2,196 @@
 title: dev-agent
 ---
 
+import { Callout, Steps, Tabs, FileTree } from 'nextra/components'
+
 # dev-agent
 
-Deep code intelligence + AI subagents via MCP. Local-first semantic search for Cursor and Claude Code.
+Local semantic code search for Cursor and Claude Code via MCP.
 
 [Get Started](/docs) · [View on GitHub](https://github.com/lytics/dev-agent)
 
-## Why dev-agent?
-
-AI coding assistants are only as good as the context they receive. dev-agent gives your AI tools **deep understanding** of your codebase:
+
+  **v0.4.3** — Now with git history search (`dev_history`), change frequency in `dev_map`, and improved tool descriptions. [See what's new →](https://github.com/lytics/dev-agent/releases)
+
+
+  **Built by engineers, for engineers.** An MCP server that gives your AI tools semantic code search. We built it to speed up our own workflow — and measured 44% cost savings.
+
+
+## Same question, different approach
+
+We asked Claude Code: *"Where is rate limiting implemented and how does it work?"*
+
+
+    **Claude's approach (8 tool calls):**
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    **Result:** 8 tool calls, 3 files read → **$0.36, 2.1 minutes**
+
+
+    **Claude's approach (2 tool calls):**
+
+
+
+
+
+
+
+
+    **Result:** 2 tool calls, 1 file read → **$0.20, 1.3 minutes**
+
+
+
+
+  **Same answer. 44% cheaper. 38% faster.**
+
+
+## Measured results
+
+We ran 5 task types comparing baseline Claude Code vs. with dev-agent:
+
+| Metric | Baseline | With dev-agent | Change |
+|--------|----------|----------------|--------|
+| Cost | $1.82 | $1.02 | **-44%** |
+| Time | 14.1 min | 11.5 min | **-19%** |
+| Tool calls | 69 | 40 | **-42%** |
+| Files read | 23 | 15 | **-35%** |
+
+  **Trade-off:** Faster but sometimes less thorough. Baseline Claude read more files for debugging tasks. dev-agent excels at implementation and exploration.
+
+
+## How it works
+
+```mermaid
+flowchart LR
+    subgraph IDE["Your AI Tool"]
+        A["Cursor / Claude Code"]
+    end
+
+    subgraph Agent["dev-agent"]
+        B["MCP Server"]
+        C["9 Tools"]
+    end
+
+    subgraph Local["Local Storage"]
+        D["Vector DB"]
+        E["Embeddings"]
+    end
+
+    A <-->|"MCP Protocol"| B
+    B --> C
+    C <--> D
+    D <--> E
+```
 
-| Feature | Description |
-|---------|-------------|
-| 🔍 **Semantic Search** | Find code by meaning, not keywords. "Where do we handle auth?" actually works. |
-| 📋 **Smart Planning** | Generate implementation plans from GitHub issues with accurate estimates. |
-| 🐙 **GitHub Integration** | Search issues and PRs semantically. Understand project history. |
-| 🧭 **Code Exploration** | Find patterns, similar code, and trace relationships across your codebase. |
+**The flow:**
+1. Your AI tool asks a question like *"where is auth handled?"*
+2. dev-agent searches the vector database semantically
+3. Returns relevant code with snippets, relationships, and context
+4. All processing happens locally — your code never leaves your machine
 
 ## Quick Start
 
+
+### Install
+
 ```bash
-# Install globally
-npm install -g @lytics/dev-agent
+npm install -g dev-agent
+```
+
+### Index your repository
 
-# Index your repository
+```bash
 cd your-project
 dev index .
+```
 
-# Install in Cursor
-dev mcp install --cursor
+### Connect to your AI tool
+
+```bash
+dev mcp install --cursor  # For Cursor
+dev mcp install           # For Claude Code
 ```
+
 
-That's it. Your AI assistant now has superpowers.
+## Example: What dev_search returns
 
-## How It Works
+When Claude asks *"where is rate limiting implemented?"*, dev-agent returns:
 
+```typescript
+// dev_search: "rate limiting implementation"
+// Found 2 results
+
+// 1. packages/mcp-server/src/server/utils/rate-limiter.ts
+// Score: 0.89 | Type: Class
+// Callers: AdapterRegistry.executeTool
+
+export class RateLimiter {
+  private buckets = new Map();
+
+  check(key: string): { allowed: boolean; retryAfter?: number } {
+    // Token bucket algorithm implementation
+  }
+}
+
+// 2. packages/mcp-server/src/adapters/adapter-registry.ts
+// Score: 0.72 | Type: Function
+
+if (this.rateLimiter) {
+  const result = this.rateLimiter.check(toolName);
+  if (!result.allowed) return { error: 'Rate limited' };
+}
 ```
-┌─────────────────────────────────────────────────────────────┐
-│                        Your AI Tool                         │
-│                   (Cursor / Claude Code)                    │
-└─────────────────────┬───────────────────────────────────────┘
-                      │ MCP Protocol
-┌─────────────────────▼───────────────────────────────────────┐
-│                          dev-agent                          │
-│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐    │
-│  │ dev_search  │ │  dev_plan   │ │    dev_explore      │    │
-│  │ dev_status  │ │   dev_gh    │ │    dev_health       │    │
-│  └─────────────┘ └─────────────┘ └─────────────────────┘    │
-│                           │                                 │
-│  ┌────────────────────────▼──────────────────────────────┐  │
-│  │                 Local Vector Storage                  │  │
-│  │             (LanceDB + MiniLM embeddings)             │  │
-│  └───────────────────────────────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────┘
-```
+
+Claude gets **code snippets + relationships** in one call. No grep chains needed.
+
+## 9 MCP Tools
+
+| Tool | What it does |
+|------|--------------|
+| [`dev_search`](/docs/tools/dev-search) | Semantic code search — find by meaning, not keywords |
+| [`dev_refs`](/docs/tools/dev-refs) | Find callers/callees of any function |
+| [`dev_map`](/docs/tools/dev-map) | Codebase structure with change frequency |
+| [`dev_history`](/docs/tools/dev-history) | Semantic search over git commits |
+| [`dev_plan`](/docs/tools/dev-plan) | Assemble context for GitHub issues |
+| [`dev_explore`](/docs/tools/dev-explore) | Find similar code, trace relationships |
+| [`dev_gh`](/docs/tools/dev-gh) | Search GitHub issues/PRs semantically |
+| [`dev_status`](/docs/tools/dev-status) | Repository indexing status |
+| [`dev_health`](/docs/tools/dev-health) | Server health checks |
+
+## When to use it
+
+| Scenario | dev-agent? | Why |
+|----------|------------|-----|
+| Large/unfamiliar codebase | ✅ Yes | Semantic search beats grep for conceptual queries |
+| Implementation tasks | ✅ Yes | Finds existing code to reuse |
+| Reducing API costs | ✅ Yes | 44% cost reduction measured |
+| Small codebase you know | ❌ Skip | Your mental model is faster |
+| Deep debugging | ⚠️ Maybe | May need more file reads than dev-agent provides |
+| Thoroughness over speed | ⚠️ Maybe | Baseline Claude reads more files |
 
 ## Features
 
-- **100% Local** — Your code never leaves your machine
-- **Multi-Language** — TypeScript, JavaScript, Markdown (more coming)
-- **Fast** — Sub-second semantic search across large codebases
-- **MCP Native** — Works with any MCP-compatible AI tool
-- **GitHub Aware** — Understands your issues, PRs, and project history
+- **100% Local** — Code never leaves your machine. No API keys needed.
+- **TypeScript/JS/Markdown** — Full support today. More languages planned.
+- **Sub-second Search** — Fast even on large repos with LanceDB.
+- **1300+ Tests** — Production-grade reliability.
 
 ---
 
-MIT License • Built by [Lytics](https://github.com/lytics)
+MIT License • Built by [prosdev](https://github.com/prosdev)
diff --git a/website/theme.config.tsx b/website/theme.config.tsx
index 27f8baf..9b86163 100644
--- a/website/theme.config.tsx
+++ b/website/theme.config.tsx
@@ -23,10 +23,13 @@ const config = {
-
+
 ),
 sidebar: {