diff --git a/website/content/_meta.js b/website/content/_meta.js
index e445689..6631c40 100644
--- a/website/content/_meta.js
+++ b/website/content/_meta.js
@@ -1,4 +1,5 @@
 export default {
   index: 'Home',
   docs: 'Documentation',
+  blog: 'Blog',
 };
diff --git a/website/content/blog/10-days-of-vibe-coding.mdx b/website/content/blog/10-days-of-vibe-coding.mdx
new file mode 100644
index 0000000..826b3e6
--- /dev/null
+++ b/website/content/blog/10-days-of-vibe-coding.mdx
@@ -0,0 +1,306 @@
+---
+title: "10 days of vibe coding: what I learned building an MCP server"
+date: 2025-11-29
+description: "How a hackathon project turned into 42% cost savings for AI-assisted development"
+---
+
+import { Callout, Steps, Tabs, FileTree } from 'nextra/components'
+
+# 10 days of vibe coding: what I learned building an MCP server
+
+<Callout>
+This is the story of building [dev-agent](https://github.com/lytics/dev-agent), an MCP server that gives AI tools semantic code search. What started as a hackathon exploration turned into measurable improvements in my daily workflow.
+</Callout>
+
+I was watching Claude read the same file for the third time in a row. It had already found the answer — it just didn't know it.
+
+That's when I decided to spend a week figuring out why.
+
+## What "vibe coding" actually means to me
+
+Vibe coding isn't about letting AI write everything. It's about:
+
+1. Describing intent at a high level
+2. Letting AI handle the boilerplate
+3. Focusing my attention on the hard parts
+
+The problem? AI can't handle the boilerplate if it doesn't understand the codebase. I was spending more time correcting Claude's assumptions than writing code myself.
+
+## The problem: grep chains
+
+Here's what a typical Claude Code session looked like before I built dev-agent:
+
+```
+Task: "Where is rate limiting implemented?"
+
+Claude's approach:
+1. grep "rate limit" → 23 matches across 8 files
+2. Read packages/mcp-server/src/server/rate-limiter.ts (180 lines)
+3. grep "token bucket" → 12 matches
+4. Read packages/mcp-server/src/server/index.ts (340 lines)
+5. Read packages/core/src/utils/retry.ts (95 lines)
+6. ... 5 more file reads
+
+Total: 18 tool calls, 10 files read, ~18,000 input tokens
+Time: 45 seconds
+```
+
+The answer was in lines 45-62 of the first file. Claude read 10 files to find it.
+
+## The premise
+
+I set aside a week to explore this. My question: **Can I make Claude Code understand my codebase better?**
+
+I started with a `PLAN.md` and a monorepo scaffold. The goal wasn't to build a product — it was to learn how AI tools explore codebases and whether I could improve that experience.
+
+## Day 1-2: The foundation
+
+The first two days were about building the core: a repository scanner and vector storage.
+
+### Why local-first mattered
+
+I wanted embeddings stored locally, not sent to a cloud service. My code stays on my machine. This led me to:
+
+- **LanceDB** for vector storage (embedded, no server)
+- **Transformers.js** for embeddings (runs locally, no API calls)
+- **ts-morph** for TypeScript parsing (extracts functions, classes, relationships)
+
+```typescript
+// What the scanner extracts
+interface Component {
+  name: string;
+  type: 'function' | 'class' | 'interface';
+  filePath: string;
+  startLine: number;
+  endLine: number;
+  imports: string[];
+  exports: string[];
+}
+```
+
+By day 2, I had a working CLI:
+
+```bash
+dev index .          # Index the repository
+dev search "auth"    # Semantic search
+```
+
+The scanner hit 94% test coverage on day 1. Not because I'm obsessive about coverage, but because testing edge cases revealed bugs in how I was parsing TypeScript.
+
+## Day 3-4: The subagent architecture
+
+I got ambitious. What if I had specialized agents for different tasks?
+
+- **Explorer** — Find similar code, trace relationships
+- **Planner** — Analyze GitHub issues, break them into tasks
+- **GitHub agent** — Index issues/PRs for semantic search
+
+By day 4, I had 557 tests passing.
The subagent coordinator could route messages between agents, share context, and handle graceful shutdown.
+
+### The decision: context provision, not automation
+
+I originally planned a "PR agent" that would create pull requests automatically. I cut it.
+
+Why? I realized the real value was in **context provision** — giving AI tools better information to work with. Automation can come later. First, solve the information problem.
+
+## Day 5-6: MCP integration
+
+This is where things got interesting.
+
+### Why MCP over HTTP API
+
+My original plan was an HTTP API server. But MCP (Model Context Protocol) was a better fit:
+
+- Works natively with Claude Code and Cursor
+- No server management — just a CLI command
+- Stdio transport is simple and reliable
+
+```bash
+# One command to integrate with Claude Code
+dev mcp install
+```
+
+### The "aha" moment
+
+When I first got semantic search working in Claude Code, I noticed something unexpected. Claude was making **fewer file reads**.
+
+Before: Claude would grep, find file paths, then read entire files.
+
+After: My search returned **code snippets**, not just file paths. Claude could see the relevant code without reading the file.
+
+```typescript
+// What dev_search returns
+// packages/mcp-server/src/server/rate-limiter.ts (score: 0.92)
+// Lines 45-62
+
+export class TokenBucketRateLimiter implements RateLimiter {
+  private tokens: number;
+  private lastRefill: number;
+
+  constructor(private config: RateLimitConfig) {
+    this.tokens = config.bucketSize;
+    this.lastRefill = Date.now();
+  }
+
+  async consume(): Promise<boolean> {
+    this.refill(); // restores tokens based on elapsed time (outside the snippet's line range)
+    if (this.tokens > 0) {
+      this.tokens--;
+      return true;
+    }
+    return false;
+  }
+}
+```
+
+This was the insight that would later show up in benchmarks: **93% fewer input tokens** on that same rate-limiting question, because Claude doesn't need to read entire files.
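To make the snippet-first idea concrete, here is a minimal sketch of how a search response like the one above might be assembled. The `SearchHit` shape and `formatHits` helper are illustrative names I'm inventing for this post, not dev-agent's actual API:

```typescript
// Hypothetical shape of one semantic-search result.
interface SearchHit {
  filePath: string;
  score: number;     // cosine similarity, 0..1
  startLine: number;
  endLine: number;
  snippet: string;   // the matched source code itself
}

// Render hits the way the example above shows them: a comment header
// with path, score, and line range, followed by the code. The model
// reads the relevant lines directly instead of opening the file.
function formatHits(hits: SearchHit[]): string {
  return hits
    .map(
      (hit) =>
        `// ${hit.filePath} (score: ${hit.score.toFixed(2)})\n` +
        `// Lines ${hit.startLine}-${hit.endLine}\n\n` +
        hit.snippet
    )
    .join('\n\n---\n\n');
}
```

Returning the snippet inline, rather than just a path, is what lets the model skip the follow-up file read.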
+## Day 7-8: Richer context
+
+With the foundation working, I added more tools:
+
+- **dev_refs** — Find who calls a function and what it calls
+- **dev_map** — Codebase structure with component counts
+- **dev_history** — Semantic search over git commits
+
+The git history integration was particularly useful. Claude can now search commits by meaning:
+
+```bash
+dev_history query="authentication refactor"
+# Returns commits about auth, even if they don't use that exact word
+```
+
+### Unified indexing
+
+I consolidated everything into one command:
+
+```bash
+dev index .
+# Indexes: code → git history → GitHub issues/PRs
+```
+
+One command, three types of context. This became important for the `dev_plan` tool, which bundles all three into a single response.
+
+## Day 9-10: Measuring it
+
+I'm an engineer. I had to measure it.
+
+I ran the same tasks with and without dev-agent, tracking time, cost, tool calls, and result quality.
+
+### One real example
+
+**Task:** "Where is rate limiting implemented and how does it work?"
+
+<Tabs items={['Without dev-agent', 'With dev-agent']}>
+  <Tabs.Tab>
+    ```
+    Tool calls: 18
+    Files read: 10
+    Input tokens: ~18,000
+    Time: 45 seconds
+
+    Approach: grep → read → grep → read → grep...
+    ```
+  </Tabs.Tab>
+  <Tabs.Tab>
+    ```
+    Tool calls: 3
+    Files read: 2
+    Input tokens: ~1,200
+    Time: 28 seconds
+
+    Approach: dev_search → read 2 files for full context
+    ```
+  </Tabs.Tab>
+</Tabs>
+
+Same answer. **93% fewer input tokens.**
+
+### The results across task types
+
+| Task Type      | Cost Savings | Time Savings |
+|----------------|--------------|--------------|
+| Debugging      | 42%          | 37%          |
+| Exploration    | 44%          | 19%          |
+| Implementation | 29%          | 22%          |
+
+The 42% cost savings wasn't the goal — it was a side effect of returning code snippets instead of file paths.
+
+### When it helps (and when it doesn't)
+
+The data revealed something important: **savings scale with task complexity**.
+
+- **Simple lookups** (find a specific function): ~0% savings. Claude's grep is fine.
+- **Conceptual queries** ("how does auth work"): 44% savings.
Semantic search shines.
+- **Implementation tasks** (GitHub issues): 29% savings. Context bundling helps.
+
+If your tasks are simple, dev-agent won't help much. If you're doing complex exploration or implementation, it adds up.
+
+## Things that didn't work
+
+### Attempt 1: HTTP API server
+
+I spent half a day building an HTTP server before realizing CLI + MCP was simpler. Lesson: don't add infrastructure you don't need.
+
+### Attempt 2: Automatic PR creation
+
+I built a PR agent that would create PRs automatically. Cut it after day 4. Why? The real problem was context, not automation. I was solving the wrong problem.
+
+### Attempt 3: Complex tool descriptions
+
+My first tool descriptions were paragraphs long. Claude ignored them. Shorter, more prescriptive descriptions worked better:
+
+```typescript
+// Before: vague
+description: "Search the codebase"
+
+// After: prescriptive
+description: "USE THIS FIRST for code exploration. Semantic search finds code by meaning, not just keywords. Better than grep for conceptual queries."
+```
+
+### Attempt 4: Too many tools too fast
+
+By day 4, I had 9 tools. That was too many to test properly. I should have started with 3 and added incrementally.
+
+## How my workflow changed
+
+Before dev-agent, vibe coding felt like babysitting. I'd describe what I wanted, watch Claude grep around, then correct its assumptions.
+
+Now it feels more like pair programming. Claude finds the right code faster, which means I spend more time on the interesting decisions and less time on "no, look in *that* file."
+
+The biggest change: **I trust Claude's first answer more often.** When it has the right context, it makes fewer mistakes.
+
+## If you're building an MCP server
+
+1. **Start with one tool.** Don't build 9 tools on day 1.
+2. **Return code snippets, not file paths.** This is the biggest win.
+3. **Test with real tasks, not synthetic benchmarks.** I waited until day 9 — that was too late.
+4. **Tool descriptions matter more than you think.** Be prescriptive.
+5. **Measure early.** If I'd measured on day 3, I would have focused on the code-snippet insight sooner.
+
+## What's next
+
+The project is open source:
+
+```bash
+npm install -g dev-agent
+dev index .
+dev mcp install            # For Claude Code
+dev mcp install --cursor   # For Cursor
+```
+
+I'm using it daily now. The next milestone (v0.5.0) is generalizing `dev_plan` into `dev_context` — a tool that bundles relevant context for any query, not just GitHub issues.
+
+---
+
+## The takeaway
+
+Vibe coding works better when your AI tools have better context. Semantic search, code snippets, and context bundling aren't magic — they're just information retrieval done right.
+
+The 42% cost savings is nice, but the real win is **faster iteration**. When Claude finds the right code on the first try, I spend less time correcting it.
+
+If you're building AI tooling, consider: what context is your tool missing? The answer might be simpler than you think.
+
+---
+
+*Built during a hackathon week in November 2025. [Source code on GitHub](https://github.com/lytics/dev-agent).*
diff --git a/website/content/blog/_meta.js b/website/content/blog/_meta.js
new file mode 100644
index 0000000..d55dc74
--- /dev/null
+++ b/website/content/blog/_meta.js
@@ -0,0 +1,4 @@
+export default {
+  index: 'Blog',
+  '10-days-of-vibe-coding': '10 days of vibe coding',
+};
diff --git a/website/content/blog/index.mdx b/website/content/blog/index.mdx
new file mode 100644
index 0000000..19e16ea
--- /dev/null
+++ b/website/content/blog/index.mdx
@@ -0,0 +1,14 @@
+---
+title: Blog
+---
+
+# Blog
+
+Notes from building dev-agent — an MCP server for semantic code search.
+
+---
+
+## Latest Posts
+
+- **[10 days of vibe coding: what I learned building an MCP server](/blog/10-days-of-vibe-coding)** — How a hackathon project turned into 42% cost savings for AI-assisted development.