---
title: "10 days of vibe coding: what I learned building an MCP server"
date: 2025-11-29
description: "How a hackathon project turned into 42% cost savings for AI-assisted development"
---

import { Callout, Steps, Tabs, FileTree } from 'nextra/components'

# 10 days of vibe coding: what I learned building an MCP server

<Callout type="info">
This is the story of building [dev-agent](https://github.com/lytics/dev-agent), an MCP server that gives AI tools semantic code search. What started as a hackathon exploration turned into measurable improvements in my daily workflow.
</Callout>

I was watching Claude read the same file for the third time in a row. It had already found the answer — it just didn't know it.

That's when I decided to spend a week figuring out why.

## What "vibe coding" actually means to me

Vibe coding isn't about letting AI write everything. It's about:

1. Describing intent at a high level
2. Letting AI handle the boilerplate
3. Focusing my attention on the hard parts

The problem? AI can't handle the boilerplate if it doesn't understand the codebase. I was spending more time correcting Claude's assumptions than writing code myself.

## The problem: grep chains

Here's what a typical Claude Code session looked like before I built dev-agent:

```
Task: "Where is rate limiting implemented?"

Claude's approach:
1. grep "rate limit" → 23 matches across 8 files
2. Read packages/mcp-server/src/server/rate-limiter.ts (180 lines)
3. grep "token bucket" → 12 matches
4. Read packages/mcp-server/src/server/index.ts (340 lines)
5. Read packages/core/src/utils/retry.ts (95 lines)
6. ... 5 more file reads

Total: 18 tool calls, 10 files read, ~18,000 input tokens
Time: 45 seconds
```

The answer was in lines 45-62 of the first file. Claude read 10 files to find it.

## The premise

I set aside a week to explore this. My question: **Can I make Claude Code understand my codebase better?**

I started with a `PLAN.md` and a monorepo scaffold. The goal wasn't to build a product — it was to learn how AI tools explore codebases and whether I could improve that experience.

## Day 1-2: The foundation

The first two days were about building the core: a repository scanner and vector storage.

### Why local-first mattered

I wanted embeddings stored locally, not sent to a cloud service. My code stays on my machine. This led me to:

- **LanceDB** for vector storage (embedded, no server)
- **Transformers.js** for embeddings (runs locally, no API calls)
- **ts-morph** for TypeScript parsing (extracts functions, classes, relationships)

```typescript
// What the scanner extracts
interface Component {
  name: string;
  type: 'function' | 'class' | 'interface';
  filePath: string;
  startLine: number;
  endLine: number;
  imports: string[];
  exports: string[];
}
```
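
To make that stack concrete, here's a minimal sketch of how indexing could work with Transformers.js and LanceDB. The model name, the text that gets embedded, and the table layout are my own illustrative assumptions, not dev-agent's actual implementation:

```typescript
// Illustrative sketch only: embeddings are computed locally, nothing leaves the machine.
import { pipeline } from '@xenova/transformers';
import * as lancedb from '@lancedb/lancedb';

// `Component` is the scanner interface shown above.
async function indexComponents(components: Component[]) {
  const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

  const rows: Record<string, unknown>[] = [];
  for (const c of components) {
    const tensor = await embed(`${c.type} ${c.name} in ${c.filePath}`, {
      pooling: 'mean',
      normalize: true,
    });
    rows.push({ vector: Array.from(tensor.data as Float32Array), ...c });
  }

  // LanceDB is embedded: the "database" is just a directory on disk.
  const db = await lancedb.connect('./.dev-agent/index');
  await db.createTable('components', rows);
}
```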

By day 2, I had a working CLI:

```bash
dev index .        # Index the repository
dev search "auth"  # Semantic search
```

The scanner hit 94% test coverage on day 1. Not because I'm obsessive about coverage, but because testing edge cases revealed bugs in how I was parsing TypeScript.

## Day 3-4: The subagent architecture

I got ambitious. What if I had specialized agents for different tasks?

- **Explorer** — Find similar code, trace relationships
- **Planner** — Analyze GitHub issues, break them into tasks
- **GitHub agent** — Index issues/PRs for semantic search

By day 4, I had 557 tests passing. The subagent coordinator could route messages between agents, share context, and handle graceful shutdown.
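
Roughly, the coordinator reduced to a registry plus message routing. This is a simplified sketch of the shape, not dev-agent's real interfaces:

```typescript
// Simplified sketch; names and signatures are illustrative.
interface SubagentMessage {
  from: string;
  to: string;       // e.g. 'explorer', 'planner', 'github'
  payload: unknown;
}

interface Subagent {
  name: string;
  handle(msg: SubagentMessage): Promise<SubagentMessage | void>;
  shutdown(): Promise<void>; // graceful shutdown hook
}

class SubagentCoordinator {
  private agents = new Map<string, Subagent>();

  register(agent: Subagent) {
    this.agents.set(agent.name, agent);
  }

  // Route a message to its target agent; shared context can ride along in the payload.
  async route(msg: SubagentMessage) {
    const target = this.agents.get(msg.to);
    if (!target) throw new Error(`Unknown subagent: ${msg.to}`);
    return target.handle(msg);
  }

  async shutdown() {
    await Promise.all([...this.agents.values()].map((a) => a.shutdown()));
  }
}
```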

### The decision: context provision, not automation

I originally planned a "PR agent" that would create pull requests automatically. I cut it.

Why? I realized the real value was in **context provision** — giving AI tools better information to work with. Automation can come later. First, solve the information problem.

## Day 5-6: MCP integration

This is where things got interesting.

### Why MCP over HTTP API

My original plan was an HTTP API server. But MCP (Model Context Protocol) was a better fit:

- Works natively with Claude Code and Cursor
- No server management — just a CLI command
- Stdio transport is simple and reliable

```bash
# One command to integrate with Claude Code
dev mcp install
```
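
Under the hood, exposing a tool over stdio is a small amount of code. Here's a sketch using the TypeScript MCP SDK; the schema, server metadata, and the `searchIndex` helper are placeholders I made up for illustration, not dev-agent's code:

```typescript
// Sketch of wiring a semantic-search tool into an MCP server over stdio.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Hypothetical helper that queries the local index and formats code snippets.
declare function searchIndex(query: string, limit: number): Promise<string>;

const server = new McpServer({ name: 'dev-agent', version: '0.0.0' });

server.tool(
  'dev_search',
  'USE THIS FIRST for code exploration. Semantic search finds code by meaning, not just keywords.',
  { query: z.string(), limit: z.number().optional() },
  async ({ query, limit }) => {
    const snippets = await searchIndex(query, limit ?? 5);
    return { content: [{ type: 'text' as const, text: snippets }] };
  }
);

// Stdio transport: the client (Claude Code, Cursor) just spawns this process.
await server.connect(new StdioServerTransport());
```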

### The "aha" moment

When I first got semantic search working in Claude Code, I noticed something unexpected. Claude was making **fewer file reads**.

Before: Claude would grep, find file paths, then read entire files.

After: My search returned **code snippets**, not just file paths. Claude could see the relevant code without reading the file.

```typescript
// What dev_search returns
// packages/mcp-server/src/server/rate-limiter.ts (score: 0.92)
// Lines 45-62

export class TokenBucketRateLimiter implements RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(private config: RateLimitConfig) {
    this.tokens = config.bucketSize;
    this.lastRefill = Date.now();
  }

  async consume(): Promise<boolean> {
    this.refill();
    if (this.tokens > 0) {
      this.tokens--;
      return true;
    }
    return false;
  }
}
```

This was the insight that would later show up in benchmarks: **93% fewer input tokens**, because Claude doesn't need to read entire files.

## Day 7-8: Richer context

With the foundation working, I added more tools:

- **dev_refs** — Find who calls a function and what it calls
- **dev_map** — Codebase structure with component counts
- **dev_history** — Semantic search over git commits

The git history integration was particularly useful. Claude can now search commits by meaning:

```bash
dev_history query="authentication refactor"
# Returns commits about auth, even if they don't use that exact word
```

### Unified indexing

I consolidated everything into one command:

```bash
dev index .
# Indexes: code → git history → GitHub issues/PRs
```

One command, three types of context. This became important for the `dev_plan` tool, which bundles all three into a single response.
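
To make "bundles all three" concrete, a `dev_plan` response is conceptually shaped like this. The field names are my illustration, not the tool's actual schema:

```typescript
// Conceptual shape of a bundled planning context (illustrative field names).
interface PlanContext {
  issue: { number: number; title: string; body: string };
  relatedCode: Array<{
    filePath: string;
    lines: [number, number];
    snippet: string; // the code itself, so the model doesn't have to read the file
    score: number;   // semantic similarity
  }>;
  relatedCommits: Array<{ sha: string; message: string; date: string }>;
  relatedIssues: Array<{ number: number; title: string; state: string }>;
}
```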

## Day 9-10: Measuring it

I'm an engineer. I had to measure it.

I ran the same tasks with and without dev-agent, tracking time, cost, tool calls, and result quality.

### One real example

**Task:** "Where is rate limiting implemented and how does it work?"

<Tabs items={['Without dev-agent', 'With dev-agent']}>
  <Tabs.Tab>
    ```
    Tool calls: 18
    Files read: 10
    Input tokens: ~18,000
    Time: 45 seconds

    Approach: grep → read → grep → read → grep...
    ```
  </Tabs.Tab>
  <Tabs.Tab>
    ```
    Tool calls: 3
    Files read: 2
    Input tokens: ~1,200
    Time: 28 seconds

    Approach: dev_search → read 2 files for full context
    ```
  </Tabs.Tab>
</Tabs>

Same answer. **93% fewer input tokens.**

### The results across task types

| Task Type | Cost Savings | Time Savings |
|-----------|--------------|--------------|
| Debugging | 42% | 37% |
| Exploration | 44% | 19% |
| Implementation | 29% | 22% |

The 42% cost savings wasn't the goal — it was a side effect of returning code snippets instead of file paths.

### When it helps (and when it doesn't)

The data revealed something important: **savings scale with task complexity**.

- **Simple lookups** (find a specific function): ~0% savings. Claude's grep is fine.
- **Conceptual queries** ("how does auth work"): 44% savings. Semantic search shines.
- **Implementation tasks** (GitHub issues): 29% savings. Context bundling helps.

If your tasks are simple, dev-agent won't help much. If you're doing complex exploration or implementation, it adds up.

## Things that didn't work

### Attempt 1: HTTP API server

I spent half a day building an HTTP server before realizing CLI + MCP was simpler. Lesson: don't add infrastructure you don't need.

### Attempt 2: Automatic PR creation

I built a PR agent that would create PRs automatically. Cut it after day 4. Why? The real problem was context, not automation. I was solving the wrong problem.

### Attempt 3: Complex tool descriptions

My first tool descriptions were paragraphs long. Claude ignored them. Shorter, more prescriptive descriptions worked better:

```typescript
// Before: vague
description: "Search the codebase"

// After: prescriptive
description: "USE THIS FIRST for code exploration. Semantic search finds code by meaning, not just keywords. Better than grep for conceptual queries."
```

### Attempt 4: Too many tools too fast

By day 4, I had 9 tools. That was too many to test properly. I should have started with 3 and added incrementally.

## How my workflow changed

Before dev-agent, vibe coding felt like babysitting. I'd describe what I wanted, watch Claude grep around, then correct its assumptions.

Now it feels more like pair programming. Claude finds the right code faster, which means I spend more time on the interesting decisions and less time on "no, look in *that* file."

The biggest change: **I trust Claude's first answer more often.** When it has the right context, it makes fewer mistakes.

## If you're building an MCP server

1. **Start with one tool.** Don't build 9 tools on day 1.
2. **Return code snippets, not file paths.** This is the biggest win.
3. **Test with real tasks, not synthetic benchmarks.** I waited until day 9 — that was too late.
4. **Tool descriptions matter more than you think.** Be prescriptive.
5. **Measure early.** If I'd measured on day 3, I would have focused on the code-snippet insight sooner.

## What's next

The project is open source:

```bash
npm install -g dev-agent
dev index .
dev mcp install           # For Claude Code
dev mcp install --cursor  # For Cursor
```

I'm using it daily now. The next milestone (v0.5.0) is generalizing `dev_plan` into `dev_context` — a tool that bundles relevant context for any query, not just GitHub issues.

---

## The takeaway

Vibe coding works better when your AI tools have better context. Semantic search, code snippets, and context bundling aren't magic — they're just information retrieval done right.

The 42% cost savings is nice, but the real win is **faster iteration**. When Claude finds the right code on the first try, I spend less time correcting it.

If you're building AI tooling, consider: what context is your tool missing? The answer might be simpler than you think.

---

*Built during a hackathon week in November 2025. [Source code on GitHub](https://github.com/lytics/dev-agent).*