README.md

# Advanced Context Engineering for Coding Agents

*Getting AI to solve complex problems in brownfield codebases*

<div align="center">
<a href="https://hlyr.dev/ace">
<img src="https://img.shields.io/badge/YC_Talk-17m-orange" alt="YC Talk"></a>
<a href="https://hlyr.dev/12fa">
<img src="https://img.shields.io/badge/12--factor_agents-context-blue" alt="12-factor agents"></a>
<a href="https://hlyr.dev/he-yt">
<img src="https://img.shields.io/badge/live_coding-BAML_fix-red" alt="Live Coding Session"></a>
</div>

<p></p>

*This guide shares what we learned taking context engineering from agent design to practical coding workflows: why spec-driven development is the future, and how we shipped 35k LOC in 7 hours.*

> [!TIP]
> Prefer video? [Watch the YC talk](https://hlyr.dev/ace) this guide is based on.
>
> Want to see it in action? [Watch us fix a bug in a 300k LOC Rust codebase](https://hlyr.dev/he-yt)

<img width="1326" height="751" alt="Stanford study results" src="https://github.com/user-attachments/assets/06f03232-f9d9-4a92-a182-37056bf877a4" />

Hi, I'm dex. You might remember me from [12-factor agents](https://hlyr.dev/12fa), coining "context engineering," or [the AI Engineer talk](https://www.youtube.com/watch?v=8kMaTybvDUw).

**I've been obsessed** with making AI coding agents actually work in production codebases. Not demos. Not greenfield projects. Real, messy, complex brownfield code.

**I've discovered** that the secret isn't waiting for smarter models. It's being intentional about context management.

**I've shipped** 6 PRs in a day without opening a single non-markdown file in an editor. Our intern shipped 10 PRs on day 8. We fixed complex race conditions in Go and added major features to a 300k LOC Rust codebase we'd never seen before.

So, I set out to document:

> ### **How do we engineer context to make AI coding agents solve complex problems in brownfield codebases with zero slop?**

Welcome to Advanced Context Engineering for Coding Agents. Buckle up.

*Special thanks to [@vaibhav](https://github.com/vaibhav), [@sundeep](https://github.com/sundeep), [@geoffreyhuntley](https://github.com/geoffreyhuntley), [@simonfarshid](https://github.com/simonfarshid), [@boundaryml](https://github.com/boundaryml), and everyone who's suffered through early versions of these ideas.*

## The Short Version: The Core Concepts

Even as models [plateau in capability](content/01-from-12factor-to-context-engineering.md#maybe-someday-when-models-are-smarter), there are engineering techniques that make AI coding dramatically more reliable, scalable, and maintainable.

- [From 12-Factor Agents to Context Engineering for Coding](content/01-from-12factor-to-context-engineering.md)
- [The Stanford Study & Sean Grove's Revelation](content/02-stanford-study-and-specs.md)
- [Our Weird Journey to Spec-Driven Development](content/03-our-weird-journey.md)
- [The Naive Way: Chat Until You Apologize](content/04-the-naive-way.md)
- [Intentional Compaction: Your First Power Move](content/05-intentional-compaction.md)
- [What Exactly Are We Compacting?](content/06-what-are-we-compacting.md)
- [Why Obsess Over Context?](content/07-why-obsess-over-context.md)
- [Subagents: Context Control, Not Role Play](content/08-subagents-context-control.md)
- [Frequent Intentional Compaction: The Game Changer](content/09-frequent-intentional-compaction.md)
- [Research, Plan, Implement: The Three-Step Dance](content/10-research-plan-implement.md)
- [Real World: Fixing BAML in 300k LOC](content/11-real-world-baml.md)
- [Human Leverage: Where to Focus Your Attention](content/12-human-leverage.md)
- [Code Review in the Age of AI](content/13-code-review-mental-alignment.md)
- [What's Coming: The Post-IDE World](content/14-whats-coming.md)

## Why This Matters

The general vibe on AI coding for hard stuff tends to be:

> Maybe someday, when models are smarter…

Meanwhile, teams using these techniques are:
- Shipping 2000-line PRs of complex systems code
- Fixing bugs in codebases they've never seen
- Maintaining mental alignment while AI writes 99% of the code
- Spending $12k/month on Opus and loving it

## The Problem We're Solving

Current AI coding tools have fundamental issues:

- **"Too much slop"** - Generated code that technically works but creates tech debt
- **"Doesn't work in big repos"** - Context windows explode, agents get lost
- **"Doesn't work for complex systems"** - Race conditions, distributed systems, etc.
- **"Tech debt factory"** - Rework outweighs productivity gains

## The Solution: Advanced Context Engineering

### What We Achieved

- **Works in Brownfield Codebases** - 300k LOC Rust, complex Go systems
- **Solves Complex Problems** - Race conditions, WASM support, cancellation
- **No Slop** - PRs merged by maintainers who didn't know it was AI
- **Maintains Mental Alignment** - Team stays in sync despite 10x velocity

### The Core Insight

<img width="1320" height="235" alt="Context equation" src="https://github.com/user-attachments/assets/a6ea98a6-665b-48af-983b-a1cb2c45e44c" />

> **Context Window Quality = (Correctness × Completeness) / Noise**

At any given point, a coding agent turn is a stateless function call. Context in, next step out. The ONLY lever you have is context quality.

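To make that concrete, here's a minimal sketch in Go of the mental model (the types are hypothetical, not any real SDK): a turn is a pure function of whatever you assemble into the window, so the only way to improve the output is to improve the input.

```go
package main

import "fmt"

// Context is everything the model sees on this turn: system prompt, prior
// messages, tool results, file contents. There is no other state.
type Context struct {
	Messages []string
}

// Turn models one agent step as a stateless function: same context in,
// same distribution over next steps out. (A real implementation would
// call an LLM; the signature is the point.)
func Turn(ctx Context) string {
	return fmt.Sprintf("next action derived from %d messages", len(ctx.Messages))
}

func main() {
	ctx := Context{Messages: []string{"research.md", "plan.md", "current test failure"}}
	fmt.Println(Turn(ctx)) // the only way to change the output is to change ctx
}
```
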
## The Three-Step Workflow

### 1. Research
Understand the codebase, find relevant files, trace information flow. [See our research prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/research_codebase.md).

### 2. Plan
Outline exact steps, files to edit, and the testing approach. [See our planning prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/create_plan.md).

### 3. Implement
Execute the plan phase by phase, compacting progress back into the plan. [See our implementation prompt](https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/implement_plan.md).

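Stitched together, the three phases look roughly like the sketch below. Everything here is illustrative (`runAgent` and the artifact file names stand in for launching a fresh agent session); the point is that each phase starts with a clean context window and reads only the compact artifact the previous phase left behind.

```go
package main

import (
	"fmt"
	"os"
)

// runAgent stands in for launching a coding agent session with a fresh
// context window, seeded by a prompt plus prior artifacts. Illustrative only.
func runAgent(prompt string, inputs ...string) string {
	return fmt.Sprintf("output of %q given %v\n", prompt, inputs)
}

func main() {
	// 1. Research: burn a whole window exploring; emit a compact summary.
	research := runAgent("research_codebase", "how does cancellation flow through the daemon?")
	_ = os.WriteFile("research.md", []byte(research), 0o644)

	// 2. Plan: a fresh window reads research.md, not the raw exploration.
	plan := runAgent("create_plan", "research.md")
	_ = os.WriteFile("plan.md", []byte(plan), 0o644)

	// 3. Implement: execute phase by phase, compacting progress back into
	// the plan so the next fresh session can pick up mid-stream.
	updated := runAgent("implement_plan", "plan.md")
	_ = os.WriteFile("plan.md", []byte(updated), 0o644)
}
```
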
## Key Principles

### 1. Frequent Intentional Compaction
Keep context utilization at 40-60%. Design your ENTIRE workflow around context management.

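As a rough sketch of what that means mechanically (the threshold check below is our rule of thumb, and the token accounting is hypothetical, not something a real SDK reports this way):

```go
package main

import "fmt"

const window = 170_000 // tokens; the README's ~170k rule of thumb

// shouldCompact flags a session once utilization drifts past the 40-60%
// band where agents stay sharp. usedTokens would come from your harness's
// token accounting (assumed here).
func shouldCompact(usedTokens int) bool {
	return float64(usedTokens)/window > 0.6
}

func main() {
	for _, used := range []int{50_000, 90_000, 120_000} {
		pct := 100 * float64(used) / window
		if shouldCompact(used) {
			fmt.Printf("%d tokens (%.0f%%): write progress.md, kill the session, start fresh\n", used, pct)
		} else {
			fmt.Printf("%d tokens (%.0f%%): keep going\n", used, pct)
		}
	}
}
```
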
### 2. Human Leverage at the Right Layer
- A bad line of code = one bad line of code
- A bad line of a plan = hundreds of bad lines
- A bad line of research = thousands

<img width="1331" height="745" alt="Human leverage pyramid" src="https://github.com/user-attachments/assets/305d3716-cb5c-4c1d-bb2b-bc035b35540b" />

### 3. Subagents Are About Context, Not Roles
[Don't play house](https://x.com/dexhorthy/status/1950288431122436597). Use subagents for context isolation and compaction.

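To illustrate (with made-up types, not any real framework's API): the subagent burns its own fresh window on search and file reads, and only a compact finding crosses back into the parent's context.

```go
package main

import "fmt"

// Finding is the only thing that crosses back into the parent's context.
type Finding struct {
	Question string
	Answer   string // compact: file paths, line numbers, a one-paragraph summary
}

// spawnSubagent gives the child its own context window to spend on grep
// output and file reads, all discarded once the answer is distilled.
func spawnSubagent(question string) Finding {
	// Imagine thousands of tokens of exploration happening here.
	return Finding{
		Question: question,
		Answer:   "subscription teardown lives in hld/daemon; see the integration test",
	}
}

func main() {
	f := spawnSubagent("where do we tear down JSON-RPC subscriptions?")
	fmt.Printf("parent pays for one finding, not one transcript: %+v\n", f)
}
```
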
### 4. This Is Not Magic
You MUST engage deeply. There's no magic prompt; the gains come from building high-leverage human review into the pipeline.

## Real Results

- **Our team**: 3 engineers averaging $12k/month on Opus
- **Intern**: 2 PRs on day 1, 10 PRs on day 8
- **BAML bug**: Fixed in a 300k LOC Rust codebase; PR merged the same day
- **Complex features**: 35k LOC shipped in 7 hours (cancellation + WASM)

## Getting Started

### For Individual Developers
Start with intentional compaction. When context fills up, write your progress to a markdown file and start fresh.

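For example, a progress file might look like the one this sketch writes out. The layout is just a shape we find useful, not a required format, and the scenario details are invented for illustration:

```go
package main

import "os"

func main() {
	// A compaction artifact: enough for a fresh session (or a teammate)
	// to resume without replaying the dead session's transcript.
	progress := `# Progress: flaky subscription test

## Done
- Traced the subscription lifecycle; teardown races the final event send.

## Current state
- Repro: integration test fails roughly 1 in 20 runs under -race.

## Next steps
- Guard the send on the closed channel, then re-run the race detector.
`
	if err := os.WriteFile("progress.md", []byte(progress), 0o644); err != nil {
		panic(err)
	}
}
```
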
### For Teams
1. Adopt the research/plan/implement workflow
2. Review plans, not just code
3. Use specs as the source of truth
4. Build shared context through markdown artifacts

### For Leaders
The hard part isn't the tech; it's the transformation. Everything about collaboration changes when AI writes 99% of the code. If you don't figure this out, you'll get lapped by someone who did.

## What's Next

We're building tools to make this easier. Join the waitlist for CodeLayer, our "Superhuman for Claude Code": [https://hlyr.dev/code](https://hlyr.dev/code)

For engineering leaders ready to 10x productivity: we're forward-deploying to help teams make the culture/process/tech shift.

## Related Resources

- [12-Factor Agents](https://hlyr.dev/12fa) - The foundation for context engineering
- [AI That Works](https://github.com/ai-that-works/ai-that-works) - Weekly live coding sessions
- [Ralph Wiggum Technique](https://ghuntley.com/ralph/) - Hilariously simple context management
- [Sean Grove: Specs Are The New Code](https://www.youtube.com/watch?v=8rABwKRsec4)
- [Stanford Study on AI Developer Productivity](https://www.youtube.com/watch?v=tbDDYKRFjhk)
- [BAML](https://github.com/boundaryml/baml) - Where we tested these techniques

## Contributors

This guide exists because smart people shared their workflows, challenged assumptions, and pushed boundaries.

Thanks to everyone making AI coding actually work in production.

---

*Remember: the name of the game is the ~170k context window. Use it wisely.*

content/01-from-12factor-to-context-engineering.md

[← Back to README](../README.md)

## From 12-Factor Agents to Context Engineering for Coding

You may remember me from April's [12-factor agents](https://hlyr.dev/12fa) post, as the coiner of the term "context engineering," or from the [AI Engineer talk on the topic](https://www.youtube.com/watch?v=8kMaTybvDUw).

Since then, we've been deep in the trenches figuring out how to make AI coding agents actually work in production. Not demos. Not greenfield projects. Real, messy, complex brownfield code.

### The Two Talks That Changed Everything

I have two favorite talks from AI Engineer 2025 (incidentally, the only two AIE talks with [more views than 12-factor agents](https://www.youtube.com/@aiDotEngineer/videos)):

1. **[Sean Grove's "Specs are the new code"](https://www.youtube.com/watch?v=8rABwKRsec4)** - The idea that chatting with an AI for hours, then throwing away the prompts while committing only the code, is like compiling a JAR and checking in the binary while throwing away the source.

2. **[The Stanford study on AI's impact on developer productivity](https://www.youtube.com/watch?v=tbDDYKRFjhk)** - Analyzed commits from 100k developers and found that AI tools often lead to rework that erodes the apparent gains.

### The Problem Everyone's Hitting

<img width="1326" height="751" alt="Stanford study results" src="https://github.com/user-attachments/assets/06f03232-f9d9-4a92-a182-37056bf877a4" />

This matched what I heard from founders everywhere:

* "Too much slop."
* "Tech debt factory."
* "Doesn't work in big repos."
* "Doesn't work for complex systems."

The general vibe on AI coding for hard stuff tends to be:

> Maybe someday, when models are smarter…

Even [Amjad](https://x.com/amasad) was on [Lenny's podcast](https://www.lennysnewsletter.com/p/behind-the-product-replit-amjad-masad) talking about how PMs use Replit Agent to prototype, then hand off to engineers for production.

### The Context Engineering Answer

Whenever I hear "maybe someday, when models are smarter," I leap to exclaim: **that's what context engineering is all about** - getting the most out of *today's* models.

While 12-factor agents focused on building reliable agent systems, we discovered something even more powerful: applying the same principles to how we USE coding agents.

The transformation from "maybe someday" to "shipping 35k LOC in 7 hours" isn't about waiting for GPT-5. It's about being intentional with context.

[← Back to README](../README.md) | [Stanford Study & Specs →](02-stanford-study-and-specs.md)

content/02-stanford-study-and-specs.md

[← Back to README](../README.md)

## The Stanford Study & Sean Grove's Revelation

### Sean Grove: "Specs Are The New Code"

Sean argued that we're all *vibe coding wrong*.

Think about it: you chat with an AI agent for two hours, carefully specifying what you want, iterating on the approach, refining the solution... then you throw away all those prompts and commit only the final code.

That's like a Java developer compiling a JAR and checking in the compiled binary while throwing away the source.

> In two years, you'll be opening Python files in your IDE with about the same frequency that, today, you might open up a hex editor to read assembly (which, for most of us, is never).

Sean proposes that in the AI future, **the specs will become the real code**. The implementation is just a build artifact.

### The Stanford Study: Why AI Tools Fail

Yegor Denisov-Blanch's team at Stanford analyzed commits from 100,000 developers and found sobering results:

1. **AI tools often lead to massive rework** - The perceived productivity gains are often illusory when you factor in the fixes and refactoring needed later.

2. **AI tools work well for greenfield, fail for brownfield** - Starting fresh? Great. Working in an existing codebase? Good luck.

The data shows what we all feel: current approaches to AI coding aren't cutting it for real work.

### The Gap We Need to Bridge

What teams need:
* AI that works well in brownfield codebases
* AI that solves complex problems
* No slop
* Mental alignment across the team

What they're getting:
* Demos that work in toy repos
* Solutions that create more problems
* Tech debt factories
* Teams losing track of their own codebases

The gap isn't going to be bridged by waiting for smarter models. It's going to be bridged by being smarter about how we use today's models.

[← From 12-Factor to Context Engineering](01-from-12factor-to-context-engineering.md) | [Our Weird Journey →](03-our-weird-journey.md)


content/03-our-weird-journey.md

[← Back to README](../README.md)

## Our Weird Journey to Spec-Driven Development

### The 2000-Line PR Problem

I was working with one of the most productive AI coders I've ever met. Every few days they'd drop **2000-line Go PRs**.

And this wasn't a NextJS app or a CRUD API. This was complex, [race-prone systems code](https://github.com/humanlayer/humanlayer/blob/main/hld/daemon/daemon_subscription_integration_test.go#L45) that:
- Managed JSON-RPC over Unix sockets
- Handled streaming stdio from forked Unix processes
- Dealt with Claude Code SDK process management
- Had all the fun concurrency bugs you'd expect

### The Breaking Point

The idea of carefully reading 2,000 lines of complex Go code every few days was simply not sustainable.

I had two choices:
1. Slow everything down
2. Fundamentally change how we work

Slowing down wasn't really an option, so we had no choice but to adopt **spec-driven development**.

### The Uncomfortable Transformation

It took about 8 weeks. It was incredibly uncomfortable for everyone involved, especially me.

I had to learn to:
- Let go of reading every line of PR code
- Trust specs as the source of truth
- Focus my attention on tests and interfaces
- Review plans instead of implementations

### The Results

But now we're flying:
- I shipped 6 PRs in a day
- Our intern shipped 10 PRs on day 8
- I can count on one hand the number of times I've opened a non-markdown file in an editor in the last two months

The transformation wasn't just about productivity. It was about sustainability. We went from drowning in code reviews to surfing on specifications.

### What This Means

When you can't possibly review all the code being produced, you have to move up a level of abstraction. You have to review the plans, the specs, the research.

This isn't giving up on code quality. It's recognizing that code quality comes from clear thinking, and clear thinking is best reviewed at the specification level.

[← Stanford Study & Specs](02-stanford-study-and-specs.md) | [The Naive Way →](04-the-naive-way.md)


content/04-the-naive-way.md

[← Back to README](../README.md)

## The Naive Way: Chat Until You Apologize

Most of us start by using a coding agent like a chatbot. You talk (or shout) back and forth with it, vibing your way through a problem until you:
- Run out of context
- Give up
- Notice the agent has started apologizing

<img width="1328" height="741" alt="Naive chat approach" src="https://github.com/user-attachments/assets/51a46854-c542-4515-afbb-a2fe26970809" />

### The Slightly Smarter Restart

A slightly smarter approach is to just start over when you get off track. Discard your session and start fresh, perhaps with more steering:

> [original prompt], but make sure not to use XYZ approach, that won't work

<img width="1331" height="744" alt="Restart with steering" src="https://github.com/user-attachments/assets/c96f9b42-0801-428a-b366-af871d1f97af" />

### Why This Doesn't Scale

The naive approach has fundamental problems:

1. **Context Pollution** - Every failed attempt, every "oops", every correction stays in context
2. **Trajectory Lock-in** - Once the agent starts down a path, it's hard to redirect
3. **No Learning Transfer** - When you restart, you lose everything you learned
4. **Cognitive Load** - You're simultaneously debugging the problem AND managing the agent

The chat paradigm is great for exploration but terrible for execution. It's like using a REPL in production.

### The Missing Piece

What's missing is intentionality. When you're just chatting, you're not being deliberate about:
- What information goes into context
- When to preserve vs. discard context
- How to structure information for reuse
- Where human review adds the most value

The solution isn't to chat better. It's to stop chatting and start engineering.

[← Our Weird Journey](03-our-weird-journey.md) | [Intentional Compaction →](05-intentional-compaction.md)
