1 change: 1 addition & 0 deletions website/content/_meta.js
@@ -1,4 +1,5 @@
export default {
  index: 'Home',
  docs: 'Documentation',
  blog: 'Blog',
};
306 changes: 306 additions & 0 deletions website/content/blog/10-days-of-vibe-coding.mdx
@@ -0,0 +1,306 @@
---
title: "10 days of vibe coding: what I learned building an MCP server"
date: 2025-11-29
description: "How a hackathon project turned into 42% cost savings for AI-assisted development"
---

import { Callout, Steps, Tabs, FileTree } from 'nextra/components'

# 10 days of vibe coding: what I learned building an MCP server

<Callout type="info">
This is the story of building [dev-agent](https://github.com/lytics/dev-agent), an MCP server that gives AI tools semantic code search. What started as a hackathon exploration turned into measurable improvements in my daily workflow.
</Callout>

I was watching Claude read the same file for the third time in a row. It had already found the answer — it just didn't know it.

That's when I decided to spend a week figuring out why.

## What "vibe coding" actually means to me

Vibe coding isn't about letting AI write everything. It's about:

1. Describing intent at a high level
2. Letting AI handle the boilerplate
3. Focusing my attention on the hard parts

The problem? AI can't handle the boilerplate if it doesn't understand the codebase. I was spending more time correcting Claude's assumptions than writing code myself.

## The problem: grep chains

Here's what a typical Claude Code session looked like before I built dev-agent:

```
Task: "Where is rate limiting implemented?"

Claude's approach:
1. grep "rate limit" → 23 matches across 8 files
2. Read packages/mcp-server/src/server/rate-limiter.ts (180 lines)
3. grep "token bucket" → 12 matches
4. Read packages/mcp-server/src/server/index.ts (340 lines)
5. Read packages/core/src/utils/retry.ts (95 lines)
6. ... 5 more file reads

Total: 18 tool calls, 10 files read, ~18,000 input tokens
Time: 45 seconds
```

The answer was in lines 45-62 of the first file. Claude read 10 files to find it.

## The premise

I set aside a week to explore this. My question: **Can I make Claude Code understand my codebase better?**

I started with a `PLAN.md` and a monorepo scaffold. The goal wasn't to build a product — it was to learn how AI tools explore codebases and whether I could improve that experience.

## Days 1-2: The foundation

The first two days were about building the core: a repository scanner and vector storage.

### Why local-first mattered

I wanted embeddings stored locally, not sent to a cloud service. My code stays on my machine. This led me to:

- **LanceDB** for vector storage (embedded, no server)
- **Transformers.js** for embeddings (runs locally, no API calls)
- **ts-morph** for TypeScript parsing (extracts functions, classes, relationships)

```typescript
// What the scanner extracts
interface Component {
  name: string;
  type: 'function' | 'class' | 'interface';
  filePath: string;
  startLine: number;
  endLine: number;
  imports: string[];
  exports: string[];
}
```
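
To keep everything on-disk, embedding and storage both run in-process. Here's a minimal sketch of that flow, assuming Transformers.js with an off-the-shelf MiniLM model and the `@lancedb/lancedb` client; the model, paths, and table names are placeholders, not necessarily what dev-agent uses:

```typescript
import { pipeline } from '@xenova/transformers';
import * as lancedb from '@lancedb/lancedb';

// Load a local embedding model once; weights are cached on disk after the first download.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embedText(text: string): Promise<number[]> {
  // Mean-pool token embeddings into a single normalized vector.
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

// Store component vectors in an embedded LanceDB table (no server to run).
const db = await lancedb.connect('./.dev-agent/index');
const table = await db.createTable('components', [
  { vector: await embedText('export function login(req) { /* ... */ }'), name: 'login', filePath: 'src/auth.ts' },
]);

// Semantic search: embed the query, then find the nearest stored vectors.
const hits = await table.search(await embedText('authentication')).limit(5).toArray();
console.log(hits.map((h) => h.filePath));
```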

By day 2, I had a working CLI:

```bash
dev index . # Index the repository
dev search "auth" # Semantic search
```

The scanner hit 94% test coverage on day 1. Not because I'm obsessive about coverage, but because testing edge cases revealed bugs in how I was parsing TypeScript.

## Days 3-4: The subagent architecture

I got ambitious. What if I had specialized agents for different tasks?

- **Explorer** — Find similar code, trace relationships
- **Planner** — Analyze GitHub issues, break them into tasks
- **GitHub agent** — Index issues/PRs for semantic search

By day 4, I had 557 tests passing. The subagent coordinator could route messages between agents, share context, and handle graceful shutdown.
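
The coordinator itself isn't shown here, but the shape was roughly this: each agent implements a small interface, and the coordinator routes messages and fans out shutdown. Names and fields below are illustrative, not the actual implementation:

```typescript
// Illustrative shape of the subagent coordinator (not dev-agent's actual code).
interface AgentMessage {
  task: string;
  context: Record<string, unknown>; // shared context passed between agents
}

interface AgentResult {
  output: string;
  artifacts?: Record<string, unknown>;
}

interface Agent {
  name: 'explorer' | 'planner' | 'github';
  handle(message: AgentMessage): Promise<AgentResult>;
  shutdown(): Promise<void>;
}

class Coordinator {
  constructor(private agents: Map<string, Agent>) {}

  // Route a message to the named agent and return its result.
  async dispatch(target: string, message: AgentMessage): Promise<AgentResult> {
    const agent = this.agents.get(target);
    if (!agent) throw new Error(`Unknown agent: ${target}`);
    return agent.handle(message);
  }

  // Graceful shutdown: let every agent finish and release its resources.
  async shutdown(): Promise<void> {
    await Promise.all([...this.agents.values()].map((a) => a.shutdown()));
  }
}
```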

### The decision: context provision, not automation

I originally planned a "PR agent" that would create pull requests automatically. I cut it.

Why? I realized the real value was in **context provision** — giving AI tools better information to work with. Automation can come later. First, solve the information problem.

## Days 5-6: MCP integration

This is where things got interesting.

### Why MCP over HTTP API

My original plan was an HTTP API server. But MCP (Model Context Protocol) was a better fit:

- Works natively with Claude Code and Cursor
- No server management — just a CLI command
- Stdio transport is simple and reliable

```bash
# One command to integrate with Claude Code
dev mcp install
```
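
Wiring a tool into an MCP server turns out to be very little code. A minimal sketch using the official TypeScript MCP SDK; the tool name matches dev-agent's `dev_search`, but `searchIndex` is a placeholder for querying the local index:

```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Placeholder for the function that queries the local LanceDB index.
declare function searchIndex(query: string, limit: number): Promise<string>;

const server = new McpServer({ name: 'dev-agent', version: '0.1.0' });

// Register a semantic search tool; the description is what Claude uses to decide when to call it.
server.tool(
  'dev_search',
  'Semantic search over the indexed codebase. Finds code by meaning, not just keywords.',
  { query: z.string(), limit: z.number().optional() },
  async ({ query, limit }) => ({
    content: [{ type: 'text' as const, text: await searchIndex(query, limit ?? 5) }],
  })
);

// Stdio transport: the client spawns this process and talks over stdin/stdout.
await server.connect(new StdioServerTransport());
```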

### The "aha" moment

When I first got semantic search working in Claude Code, I noticed something unexpected. Claude was making **fewer file reads**.

Before: Claude would grep, find file paths, then read entire files.

After: My search returned **code snippets**, not just file paths. Claude could see the relevant code without reading the file.

```typescript
// What dev_search returns
// packages/mcp-server/src/server/rate-limiter.ts (score: 0.92)
// Lines 45-62

export class TokenBucketRateLimiter implements RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(private config: RateLimitConfig) {
    this.tokens = config.bucketSize;
    this.lastRefill = Date.now();
  }

  async consume(): Promise<boolean> {
    this.refill();
    if (this.tokens > 0) {
      this.tokens--;
      return true;
    }
    return false;
  }
}
```

This was the insight that would later show up in benchmarks: **99% fewer input tokens** because Claude doesn't need to read entire files.

## Days 7-8: Richer context

With the foundation working, I added more tools:

- **dev_refs** — Find who calls a function and what it calls
- **dev_map** — Codebase structure with component counts
- **dev_history** — Semantic search over git commits

The git history integration was particularly useful. Claude can now search commits by meaning:

```bash
dev_history query="authentication refactor"
# Returns commits about auth, even if they don't use that exact word
```
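
Under the hood, indexing history can be as simple as walking `git log` and embedding each commit message. A rough sketch, reusing the `embedText` helper from the earlier snippet; the exact format and fields in dev-agent may differ:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const exec = promisify(execFile);

// Same local embedding helper as in the indexing sketch above.
declare function embedText(text: string): Promise<number[]>;

interface CommitRecord {
  sha: string;
  message: string;
  vector: number[];
}

// Walk the repository's history and embed each commit message for semantic search.
async function indexHistory(repoPath: string): Promise<CommitRecord[]> {
  // Use unit/record separators so multi-line commit messages don't break parsing.
  const { stdout } = await exec(
    'git',
    ['log', '--pretty=format:%H%x1f%s %b%x1e'],
    { cwd: repoPath, maxBuffer: 32 * 1024 * 1024 }
  );

  const records: CommitRecord[] = [];
  for (const entry of stdout.split('\x1e')) {
    const [sha, message] = entry.trim().split('\x1f');
    if (!sha || !message) continue;
    records.push({ sha, message, vector: await embedText(message) });
  }
  return records;
}
```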

### Unified indexing

I consolidated everything into one command:

```bash
dev index .
# Indexes: code → git history → GitHub issues/PRs
```

One command, three types of context. This became important for the `dev_plan` tool, which bundles all three into a single response.
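
The shape of that bundled response is roughly this; field names here are guesses for illustration, not dev-agent's actual schema:

```typescript
// Illustrative shape of the context bundle dev_plan assembles for a GitHub issue.
interface PlanContext {
  issue: { number: number; title: string; body: string };
  relatedCode: Array<{ filePath: string; startLine: number; endLine: number; snippet: string }>;
  relatedCommits: Array<{ sha: string; message: string }>;
  relatedIssues: Array<{ number: number; title: string }>;
}
```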

## Days 9-10: Measuring it

I'm an engineer. I had to measure it.

I ran the same tasks with and without dev-agent, tracking time, cost, tool calls, and result quality.

### One real example

**Task:** "Where is rate limiting implemented and how does it work?"

<Tabs items={['Without dev-agent', 'With dev-agent']}>
<Tabs.Tab>
```
Tool calls: 18
Files read: 10
Input tokens: ~18,000
Time: 45 seconds

Approach: grep → read → grep → read → grep...
```
</Tabs.Tab>
<Tabs.Tab>
```
Tool calls: 3
Files read: 2
Input tokens: ~1,200
Time: 28 seconds

Approach: dev_search → read 2 files for full context
```
</Tabs.Tab>
</Tabs>

Same answer. **93% fewer input tokens.**

### The results across task types

| Task Type | Cost Savings | Time Savings |
|-----------|--------------|--------------|
| Debugging | 42% | 37% |
| Exploration | 44% | 19% |
| Implementation | 29% | 22% |

The 42% cost savings wasn't the goal — it was a side effect of returning code snippets instead of file paths.

### When it helps (and when it doesn't)

The data revealed something important: **savings scale with task complexity**.

- **Simple lookups** (find a specific function): ~0% savings. Claude's grep is fine.
- **Conceptual queries** ("how does auth work"): 44% savings. Semantic search shines.
- **Implementation tasks** (GitHub issues): 29% savings. Context bundling helps.

If your tasks are simple, dev-agent won't help much. If you're doing complex exploration or implementation, it adds up.

## Things that didn't work

### Attempt 1: HTTP API server

I spent half a day building an HTTP server before realizing CLI + MCP was simpler. Lesson: don't add infrastructure you don't need.

### Attempt 2: Automatic PR creation

I built a PR agent that would create PRs automatically. Cut it after day 4. Why? The real problem was context, not automation. I was solving the wrong problem.

### Attempt 3: Complex tool descriptions

My first tool descriptions were paragraphs long. Claude ignored them. Shorter, more prescriptive descriptions worked better:

```typescript
// Before: vague
description: "Search the codebase"

// After: prescriptive
description: "USE THIS FIRST for code exploration. Semantic search finds code by meaning, not just keywords. Better than grep for conceptual queries."
```

### Attempt 4: Too many tools too fast

By day 4, I had 9 tools. That was too many to test properly. I should have started with 3 and added incrementally.

## How my workflow changed

Before dev-agent, vibe coding felt like babysitting. I'd describe what I wanted, watch Claude grep around, then correct its assumptions.

Now it feels more like pair programming. Claude finds the right code faster, which means I spend more time on the interesting decisions and less time on "no, look in *that* file."

The biggest change: **I trust Claude's first answer more often.** When it has the right context, it makes fewer mistakes.

## If you're building an MCP server

1. **Start with one tool.** Don't build 9 tools on day 1.
2. **Return code snippets, not file paths.** This is the biggest win (see the sketch after this list).
3. **Test with real tasks, not synthetic benchmarks.** I waited until day 9 — that was too late.
4. **Tool descriptions matter more than you think.** Be prescriptive.
5. **Measure early.** If I'd measured on day 3, I would have focused on the code-snippet insight sooner.
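
To make point 2 concrete: instead of returning paths for the model to go read, slice the matching lines out of the file and put them directly in the tool response. A small sketch with illustrative field names:

```typescript
import { readFileSync } from 'node:fs';

interface SearchHit {
  filePath: string;
  startLine: number; // 1-based, inclusive
  endLine: number;
  score: number;
}

// Turn a search hit into a self-contained snippet so the model never has to read the file itself.
function formatHit(hit: SearchHit): string {
  const lines = readFileSync(hit.filePath, 'utf8').split('\n');
  const snippet = lines.slice(hit.startLine - 1, hit.endLine).join('\n');
  return [
    `// ${hit.filePath} (score: ${hit.score.toFixed(2)})`,
    `// Lines ${hit.startLine}-${hit.endLine}`,
    '',
    snippet,
  ].join('\n');
}
```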

## What's next

The project is open source:

```bash
npm install -g dev-agent
dev index .
dev mcp install # For Claude Code
dev mcp install --cursor # For Cursor
```

I'm using it daily now. The next milestone (v0.5.0) is generalizing `dev_plan` into `dev_context` — a tool that bundles relevant context for any query, not just GitHub issues.

---

## The takeaway

Vibe coding works better when your AI tools have better context. Semantic search, code snippets, and context bundling aren't magic — they're just information retrieval done right.

The 42% cost savings is nice, but the real win is **faster iteration**. When Claude finds the right code on the first try, I spend less time correcting it.

If you're building AI tooling, consider: what context is your tool missing? The answer might be simpler than you think.

---

*Built during a hackathon week in November 2025. [Source code on GitHub](https://github.com/lytics/dev-agent).*
4 changes: 4 additions & 0 deletions website/content/blog/_meta.js
@@ -0,0 +1,4 @@
export default {
  index: 'Blog',
  '10-days-of-vibe-coding': '10 days of vibe coding',
};
14 changes: 14 additions & 0 deletions website/content/blog/index.mdx
@@ -0,0 +1,14 @@
---
title: Blog
---

# Blog

Notes from building dev-agent — an MCP server for semantic code search.

---

## Latest Posts

- **[10 days of vibe coding: what I learned building an MCP server](/blog/10-days-of-vibe-coding)** — How a hackathon project turned into 42% cost savings for AI-assisted development.