README.md

# Advanced Context Engineering for Coding Agents

*Getting AI to solve complex problems in brownfield codebases*

<div align="center">
<a href="https://hlyr.dev/ace">
<img src="https://img.shields.io/badge/YC_Talk-17m-orange" alt="YC Talk"></a>
<a href="https://hlyr.dev/12fa">
<img src="https://img.shields.io/badge/12--factor_agents-context-blue" alt="12-factor agents"></a>
<a href="https://hlyr.dev/he-yt">
<img src="https://img.shields.io/badge/live_coding-BAML_fix-red" alt="Live Coding Session"></a>
</div>

<p></p>

*This guide shares what we learned taking context engineering from agent design to practical coding workflows: why spec-driven development is the future, and how we shipped 35k LOC in 7 hours.*

> [!TIP]
> Prefer video? [Watch the YC talk](https://hlyr.dev/ace) this guide is based on.
>
> Want to see it in action? [Watch us fix a bug in a 300k LOC Rust codebase](https://hlyr.dev/he-yt)

<img width="1326" height="751" alt="Stanford study results" src="https://github.com/user-attachments/assets/06f03232-f9d9-4a92-a182-37056bf877a4" />

Hi, I'm dex. You might remember me from [12-factor agents](https://hlyr.dev/12fa), coining "context engineering," or [the AI Engineer talk](https://www.youtube.com/watch?v=8kMaTybvDUw).

**I've been obsessed** with making AI coding agents actually work in production codebases. Not demos. Not greenfield projects. Real, messy, complex brownfield code.

**I've discovered** that the secret isn't waiting for smarter models. It's being intentional about context management.

**I've shipped** 6 PRs in a day without opening a single non-markdown file in an editor. Our intern shipped 10 PRs on day 8. We fixed complex race conditions in Go and added major features to a 300k LOC Rust codebase we'd never seen before.

So, I set out to document:

> ### **How do we engineer context to make AI coding agents solve complex problems in brownfield codebases with zero slop?**

Welcome to Advanced Context Engineering for Coding Agents. Buckle up.

*Special thanks to [@vaibhav](https://github.com/vaibhav), [@sundeep](https://github.com/sundeep), [@geoffreyhuntley](https://github.com/geoffreyhuntley), [@simonfarshid](https://github.com/simonfarshid), [@boundaryml](https://github.com/boundaryml), and everyone who's suffered through early versions of these ideas.*

## The Short Version: The Core Concepts

Even as models [plateau in capability](content/01-from-12factor-to-context-engineering.md#maybe-someday-when-models-are-smarter), there are engineering techniques that make AI coding dramatically more reliable, scalable, and maintainable.

- [From 12-Factor Agents to Context Engineering for Coding](content/01-from-12factor-to-context-engineering.md)
- [The Stanford Study & Sean Grove's Revelation](content/02-stanford-study-and-specs.md)
- [Our Weird Journey to Spec-Driven Development](content/03-our-weird-journey.md)
- [The Naive Way: Chat Until You Apologize](content/04-the-naive-way.md)
- [Intentional Compaction: Your First Power Move](content/05-intentional-compaction.md)
- [What Exactly Are We Compacting?](content/06-what-are-we-compacting.md)
- [Why Obsess Over Context?](content/07-why-obsess-over-context.md)
- [Subagents: Context Control, Not Role Play](content/08-subagents-context-control.md)
- [Frequent Intentional Compaction: The Game Changer](content/09-frequent-intentional-compaction.md)
- [Research, Plan, Implement: The Three-Step Dance](content/10-research-plan-implement.md)
- [Real World: Fixing BAML in 300k LOC](content/11-real-world-baml.md)
- [Human Leverage: Where to Focus Your Attention](content/12-human-leverage.md)
- [Code Review in the Age of AI](content/13-code-review-mental-alignment.md)
- [What's Coming: The Post-IDE World](content/14-whats-coming.md)

## Why This Matters

The general vibe on AI coding for hard stuff tends to be:

> Maybe someday, when models are smarter…

Meanwhile, teams using these techniques are:
- Shipping 2000-line PRs of complex systems code
- Fixing bugs in codebases they've never seen
- Maintaining mental alignment while AI writes 99% of the code
- Spending $12k/month on Opus and loving it

## The Problem We're Solving

Current AI coding tools have fundamental issues:

- **"Too much slop"** - Generated code that technically works but creates tech debt
- **"Doesn't work in big repos"** - Context windows explode, agents get lost
- **"Doesn't work for complex systems"** - Race conditions, distributed systems, etc.
- **"Tech debt factory"** - Rework outweighs productivity gains

## The Solution: Advanced Context Engineering

### What We Achieved

- **Works in Brownfield Codebases** - 300k LOC Rust, complex Go systems
- **Solves Complex Problems** - Race conditions, WASM support, cancellation
- **No Slop** - PRs merged by maintainers who didn't know it was AI
- **Maintains Mental Alignment** - Team stays in sync despite 10x velocity

### The Core Insight

<img width="1320" height="235" alt="Context equation" src="https://github.com/user-attachments/assets/a6ea98a6-665b-48af-983b-a1cb2c45e44c" />

> **Context Window Quality = (Correctness × Completeness) / Noise**

At any given point, a coding agent turn is a stateless function call. Context in, next step out. The ONLY lever you have is context quality.

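To make that concrete, here's a minimal sketch in Go of the mental model (the types are hypothetical, not any real SDK): a turn is a pure function of whatever you assemble into the window, so the only way to improve the output is to improve the input.

```go
package main

import "fmt"

// Context is everything the model sees on this turn: system prompt, prior
// messages, tool results, file contents. There is no other state.
type Context struct {
	Messages []string
}

// Turn models one agent step as a stateless function: same context in,
// same distribution over next steps out. (A real implementation would
// call an LLM; the signature is the point.)
func Turn(ctx Context) string {
	return fmt.Sprintf("next action derived from %d messages", len(ctx.Messages))
}

func main() {
	ctx := Context{Messages: []string{"research.md", "plan.md", "current test failure"}}
	fmt.Println(Turn(ctx)) // the only way to change the output is to change ctx
}
```
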
## The Three-Step Workflow

### 1. Research
Understand the codebase, find relevant files, trace information flow. [See our research prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/research_codebase.md).

### 2. Plan
Outline exact steps, files to edit, and the testing approach. [See our planning prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/create_plan.md).

### 3. Implement
Execute the plan phase by phase, compacting progress back into the plan. [See our implementation prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/implement_plan.md).

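Stitched together, the three phases look roughly like the sketch below. Everything here is illustrative (`runAgent` and the artifact file names stand in for launching a fresh agent session); the point is that each phase starts with a clean context window and reads only the compact artifact the previous phase left behind.

```go
package main

import (
	"fmt"
	"os"
)

// runAgent stands in for launching a coding agent session with a fresh
// context window, seeded by a prompt plus prior artifacts. Illustrative only.
func runAgent(prompt string, inputs ...string) string {
	return fmt.Sprintf("output of %q given %v\n", prompt, inputs)
}

func main() {
	// 1. Research: burn a whole window exploring; emit a compact summary.
	research := runAgent("research_codebase", "how does cancellation flow through the daemon?")
	_ = os.WriteFile("research.md", []byte(research), 0o644)

	// 2. Plan: a fresh window reads research.md, not the raw exploration.
	plan := runAgent("create_plan", "research.md")
	_ = os.WriteFile("plan.md", []byte(plan), 0o644)

	// 3. Implement: execute phase by phase, compacting progress back into
	// the plan so the next fresh session can pick up mid-stream.
	updated := runAgent("implement_plan", "plan.md")
	_ = os.WriteFile("plan.md", []byte(updated), 0o644)
}
```
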
## Key Principles

### 1. Frequent Intentional Compaction
Keep context utilization at 40-60%. Design your ENTIRE workflow around context management.

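As a rough sketch of what that means mechanically (the threshold check below is our rule of thumb, and the token accounting is hypothetical, not something a real SDK reports this way):

```go
package main

import "fmt"

const window = 170_000 // tokens; the README's ~170k rule of thumb

// shouldCompact flags a session once utilization drifts past the 40-60%
// band where agents stay sharp. usedTokens would come from your harness's
// token accounting (assumed here).
func shouldCompact(usedTokens int) bool {
	return float64(usedTokens)/window > 0.6
}

func main() {
	for _, used := range []int{50_000, 90_000, 120_000} {
		pct := 100 * float64(used) / window
		if shouldCompact(used) {
			fmt.Printf("%d tokens (%.0f%%): write progress.md, kill the session, start fresh\n", used, pct)
		} else {
			fmt.Printf("%d tokens (%.0f%%): keep going\n", used, pct)
		}
	}
}
```
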
### 2. Human Leverage at the Right Layer
- A bad line of code = one bad line of code
- A bad line of a plan = hundreds of bad lines
- A bad line of research = thousands

<img width="1331" height="745" alt="Human leverage pyramid" src="https://github.com/user-attachments/assets/305d3716-cb5c-4c1d-bb2b-bc035b35540b" />

### 3. Subagents Are About Context, Not Roles
[Don't play house](https://x.com/dexhorthy/status/1950288431122436597). Use subagents for context isolation and compaction.

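To illustrate (with made-up types, not any real framework's API): the subagent burns its own fresh window on search and file reads, and only a compact finding crosses back into the parent's context.

```go
package main

import "fmt"

// Finding is the only thing that crosses back into the parent's context.
type Finding struct {
	Question string
	Answer   string // compact: file paths, line numbers, a one-paragraph summary
}

// spawnSubagent gives the child its own context window to spend on grep
// output and file reads, all discarded once the answer is distilled.
func spawnSubagent(question string) Finding {
	// Imagine thousands of tokens of exploration happening here.
	return Finding{
		Question: question,
		Answer:   "subscription teardown lives in hld/daemon; see the integration test",
	}
}

func main() {
	f := spawnSubagent("where do we tear down JSON-RPC subscriptions?")
	fmt.Printf("parent pays for one finding, not one transcript: %+v\n", f)
}
```
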
### 4. This Is Not Magic
You MUST engage deeply. There's no magic prompt; the gains come from building high-leverage human review into the pipeline.

## Real Results

- **Our team**: 3 engineers averaging $12k/month on Opus
- **Intern**: 2 PRs on day 1, 10 PRs on day 8
- **BAML bug**: Fixed in a 300k LOC Rust codebase; PR merged the same day
- **Complex features**: 35k LOC shipped in 7 hours (cancellation + WASM)

## Getting Started

### For Individual Developers
Start with intentional compaction. When context fills up, write your progress to a markdown file and start fresh.

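For example, a progress file might look like the one this sketch writes out. The layout is just a shape we find useful, not a required format, and the scenario details are invented for illustration:

```go
package main

import "os"

func main() {
	// A compaction artifact: enough for a fresh session (or a teammate)
	// to resume without replaying the dead session's transcript.
	progress := `# Progress: flaky subscription test

## Done
- Traced the subscription lifecycle; teardown races the final event send.

## Current state
- Repro: integration test fails roughly 1 in 20 runs under -race.

## Next steps
- Guard the send on the closed channel, then re-run the race detector.
`
	if err := os.WriteFile("progress.md", []byte(progress), 0o644); err != nil {
		panic(err)
	}
}
```
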
### For Teams
1. Adopt the research/plan/implement workflow
2. Review plans, not just code
3. Use specs as the source of truth
4. Build shared context through markdown artifacts

### For Leaders
The hard part isn't the tech; it's the transformation. Everything about collaboration changes when AI writes 99% of the code. If you don't figure this out, you'll get lapped by someone who did.

## What's Next

We're building tools to make this easier. Join the waitlist for CodeLayer, our "Superhuman for Claude Code": [https://hlyr.dev/code](https://hlyr.dev/code)

For engineering leaders ready to 10x productivity: we're forward-deploying to help teams make the culture/process/tech shift.

## Related Resources

- [12-Factor Agents](https://hlyr.dev/12fa) - The foundation for context engineering
- [AI That Works](https://github.com/ai-that-works/ai-that-works) - Weekly live coding sessions
- [Ralph Wiggum Technique](https://ghuntley.com/ralph/) - Hilariously simple context management
- [Sean Grove: Specs Are The New Code](https://www.youtube.com/watch?v=8rABwKRsec4)
- [Stanford Study on AI Developer Productivity](https://www.youtube.com/watch?v=tbDDYKRFjhk)
- [BAML](https://github.com/boundaryml/baml) - Where we tested these techniques

## Contributors

This guide exists because smart people shared their workflows, challenged assumptions, and pushed boundaries.

Thanks to everyone making AI coding actually work in production.

---

*Remember: the name of the game is the ~170k context window. Use it wisely.*

content/01-from-12factor-to-context-engineering.md

[← Back to README](../README.md)

## From 12-Factor Agents to Context Engineering for Coding

You may remember me from April's [12-factor agents](https://hlyr.dev/12fa) post, as the coiner of the term "context engineering," or from the [AI Engineer talk on the topic](https://www.youtube.com/watch?v=8kMaTybvDUw).

Since then, we've been deep in the trenches figuring out how to make AI coding agents actually work in production. Not demos. Not greenfield projects. Real, messy, complex brownfield code.

### The Two Talks That Changed Everything

I have two favorite talks from AI Engineer 2025 (incidentally, the only two AIE talks with [more views than 12-factor agents](https://www.youtube.com/@aiDotEngineer/videos)):

1. **[Sean Grove's "Specs are the new code"](https://www.youtube.com/watch?v=8rABwKRsec4)** - The idea that chatting with an AI for hours, then throwing away the prompts while committing only the code, is like compiling a JAR and checking in the binary while throwing away the source.

2. **[The Stanford study on AI's impact on developer productivity](https://www.youtube.com/watch?v=tbDDYKRFjhk)** - Analyzed commits from 100k developers and found that AI tools often lead to rework that erodes the apparent gains.

### The Problem Everyone's Hitting

<img width="1326" height="751" alt="Stanford study results" src="https://github.com/user-attachments/assets/06f03232-f9d9-4a92-a182-37056bf877a4" />

This matched what I heard from founders everywhere:

* "Too much slop."
* "Tech debt factory."
* "Doesn't work in big repos."
* "Doesn't work for complex systems."

The general vibe on AI coding for hard stuff tends to be:

> Maybe someday, when models are smarter…

Even [Amjad](https://x.com/amasad) was on [Lenny's podcast](https://www.lennysnewsletter.com/p/behind-the-product-replit-amjad-masad) talking about how PMs use Replit Agent to prototype, then hand off to engineers for production.

### The Context Engineering Answer

Whenever I hear "maybe someday, when models are smarter," I leap to exclaim: **that's what context engineering is all about** - getting the most out of *today's* models.

While 12-factor agents focused on building reliable agent systems, we discovered something even more powerful: applying the same principles to how we USE coding agents.

The transformation from "maybe someday" to "shipping 35k LOC in 7 hours" isn't about waiting for GPT-5. It's about being intentional with context.

[← Back to README](../README.md) | [Stanford Study & Specs →](02-stanford-study-and-specs.md)

content/02-stanford-study-and-specs.md

[← Back to README](../README.md)

## The Stanford Study & Sean Grove's Revelation

### Sean Grove: "Specs Are The New Code"

Sean argued that we're all *vibe coding wrong*.

Think about it: you chat with an AI agent for two hours, carefully specifying what you want, iterating on the approach, refining the solution... then you throw away all those prompts and commit only the final code.

That's like a Java developer compiling a JAR and checking in the compiled binary while throwing away the source.

> In two years, you'll be opening Python files in your IDE with about the same frequency that, today, you might open up a hex editor to read assembly (which, for most of us, is never).

Sean proposes that in the AI future, **the specs will become the real code**. The implementation is just a build artifact.

### The Stanford Study: Why AI Tools Fail

Yegor Denisov-Blanch's team at Stanford analyzed commits from 100,000 developers and found sobering results:

1. **AI tools often lead to massive rework** - The perceived productivity gains are often illusory when you factor in the fixes and refactoring needed later.

2. **AI tools work well for greenfield, fail for brownfield** - Starting fresh? Great. Working in an existing codebase? Good luck.

The data shows what we all feel: current approaches to AI coding aren't cutting it for real work.

### The Gap We Need to Bridge

What teams need:
* AI that works well in brownfield codebases
* AI that solves complex problems
* No slop
* Mental alignment across the team

What they're getting:
* Demos that work in toy repos
* Solutions that create more problems
* Tech debt factories
* Teams losing track of their own codebases

The gap isn't going to be bridged by waiting for smarter models. It's going to be bridged by being smarter about how we use today's models.

[← From 12-Factor to Context Engineering](01-from-12factor-to-context-engineering.md) | [Our Weird Journey →](03-our-weird-journey.md)


content/03-our-weird-journey.md

[← Back to README](../README.md)

## Our Weird Journey to Spec-Driven Development

### The 2000-Line PR Problem

I was working with one of the most productive AI coders I've ever met. Every few days they'd drop **2000-line Go PRs**.

And this wasn't a NextJS app or a CRUD API. This was complex, [race-prone systems code](https://github.com/humanlayer/humanlayer/blob/main/hld/daemon/daemon_subscription_integration_test.go#L45) that:
- Managed JSON-RPC over Unix sockets
- Handled streaming stdio from forked Unix processes
- Dealt with Claude Code SDK process management
- Had all the fun concurrency bugs you'd expect

### The Breaking Point

The idea of carefully reading 2,000 lines of complex Go code every few days was simply not sustainable.

I had two choices:
1. Slow everything down
2. Fundamentally change how we work

Slowing down wasn't really an option, so we had no choice but to adopt **spec-driven development**.

### The Uncomfortable Transformation

It took about 8 weeks. It was incredibly uncomfortable for everyone involved, especially me.

I had to learn to:
- Let go of reading every line of PR code
- Trust specs as the source of truth
- Focus my attention on tests and interfaces
- Review plans instead of implementations

### The Results

But now we're flying:
- I shipped 6 PRs in a day
- Our intern shipped 10 PRs on day 8
- I can count on one hand the number of times I've opened a non-markdown file in an editor in the last two months

The transformation wasn't just about productivity. It was about sustainability. We went from drowning in code reviews to surfing on specifications.

### What This Means

When you can't possibly review all the code being produced, you have to move up a level of abstraction. You have to review the plans, the specs, the research.

This isn't giving up on code quality. It's recognizing that code quality comes from clear thinking, and clear thinking is best reviewed at the specification level.

[← Stanford Study & Specs](02-stanford-study-and-specs.md) | [The Naive Way →](04-the-naive-way.md)


content/04-the-naive-way.md

[← Back to README](../README.md)

## The Naive Way: Chat Until You Apologize

Most of us start by using a coding agent like a chatbot. You talk (or shout) back and forth with it, vibing your way through a problem until you:
- Run out of context
- Give up
- Notice the agent has started apologizing

<img width="1328" height="741" alt="Naive chat approach" src="https://github.com/user-attachments/assets/51a46854-c542-4515-afbb-a2fe26970809" />

### The Slightly Smarter Restart

A slightly smarter approach is to just start over when you get off track. Discard your session and start fresh, perhaps with more steering:

> [original prompt], but make sure not to use XYZ approach, that won't work

<img width="1331" height="744" alt="Restart with steering" src="https://github.com/user-attachments/assets/c96f9b42-0801-428a-b366-af871d1f97af" />

### Why This Doesn't Scale

The naive approach has fundamental problems:

1. **Context Pollution** - Every failed attempt, every "oops", every correction stays in context
2. **Trajectory Lock-in** - Once the agent starts down a path, it's hard to redirect
3. **No Learning Transfer** - When you restart, you lose everything you learned
4. **Cognitive Load** - You're simultaneously debugging the problem AND managing the agent

The chat paradigm is great for exploration but terrible for execution. It's like using a REPL in production.

### The Missing Piece

What's missing is intentionality. When you're just chatting, you're not being deliberate about:
- What information goes into context
- When to preserve vs. discard context
- How to structure information for reuse
- Where human review adds the most value

The solution isn't to chat better. It's to stop chatting and start engineering.

[← Our Weird Journey](03-our-weird-journey.md) | [Intentional Compaction →](05-intentional-compaction.md)
