ace-fca.md: 28 additions & 26 deletions
@@ -63,7 +63,7 @@ I still read the tests pretty carefully, but the specs became our source of trut
 The transformation took about 8 weeks.
 It was incredibly uncomfortable for everyone involved, not least of all for me.
 But now we're flying. A few weeks back, I shipped 6 PRs in a day.
-I can count on one hand the number of times I've opened a non-markdown file in an editor in the last two months.
+I can count on one hand the number of times I've edited a non-markdown file by hand in the last two months.

 ## Advanced Context Engineering for Coding Agents

@@ -91,14 +91,14 @@ Most of us start by using a coding agent like a chatbot. You talk (or shout) bac

 A slightly smarter way is to just start over when you get off track, discarding your session and starting a new one, perhaps with a little more steering in the prompt.

-> [original prompt], but make sure not to use XYZ approach, that won't work
+> [original prompt], but make sure you use XYZ approach, because ABC approach won't work

 <img width="1331" height="744" alt="Screenshot 2025-08-29 at 11 08 55 AM" src="https://github.com/user-attachments/assets/c96f9b42-0801-428a-b366-af871d1f97af" />


 ### Slightly Smarter: Intentional Compaction

-You have probably done something I've come to call "intentional compaction". Whether you're on track or not, as your context starts to fill up, you probably want to pause your work and start over with a fresh context window. To do this, you might drop a prompt like
+You have probably done something I've come to call "intentional compaction". Whether you're on track or not, as your context starts to fill up, you probably want to pause your work and start over with a fresh context window. To do this, you might use a prompt like

 > "Write everything we did so far to progress.md, ensure to note the end goal, the approach we're taking, the steps we've done so far, and the current failure we're working on"

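The intentional-compaction prompt in this hunk asks the agent to record four things before the session is thrown away: the end goal, the approach, the steps done so far, and the current failure.

Below is a minimal sketch of that handoff file, assuming you wanted to script it yourself instead of letting the agent write it. It renders those four fields into a `progress.md` that a fresh context window can read first. `CompactionState`, `render_progress`, and `compact` are hypothetical names for illustration only. They are not part of any tool mentioned in the post.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CompactionState:
    """Hypothetical record of the four items the compaction prompt asks for."""
    end_goal: str
    approach: str
    steps_done: list[str] = field(default_factory=list)
    current_failure: str = ""


def render_progress(state: CompactionState) -> str:
    """Render the state as markdown that a fresh session can read first."""
    steps = "\n".join(f"- {step}" for step in state.steps_done) or "- (none yet)"
    return (
        "# Progress\n\n"
        f"## End goal\n{state.end_goal}\n\n"
        f"## Approach\n{state.approach}\n\n"
        f"## Steps completed\n{steps}\n\n"
        f"## Current failure\n{state.current_failure or '(none)'}\n"
    )


def compact(state: CompactionState, path: Path = Path("progress.md")) -> Path:
    """Write the progress file; the long-running session can then be discarded."""
    path.write_text(render_progress(state), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values for illustration only.
    state = CompactionState(
        end_goal="Example goal",
        approach="Example approach",
        steps_done=["Read the relevant modules", "Wrote a failing test"],
        current_failure="Example failing test",
    )
    print(render_progress(state))
```

The point of the fixed section layout is that the next session starts from a short, structured summary rather than the whole previous transcript.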
@@ -230,13 +230,13 @@ I will skip this part for brevity but feel free to launch a claude session in [h

 ### Putting this into practice

-I do a [weekly live-coding session](https://github.com/ai-that-works/ai-that-works) with @vaibhav where we whiteboard and code up a solution to an advanced AI Engineering problem. It's one of the highlights of my week.
+I do a [weekly live-coding session](https://github.com/ai-that-works/ai-that-works) with [@vaibhav](https://www.linkedin.com/in/vaigup/) where we whiteboard and code up a solution to an advanced AI Engineering problem. It's one of the highlights of my week.

-A few weeks ago, I [decided to share some more about the process](https://hlyr.dev/he-gh), curious if our in-house techniques could one-shot a fix to a 300k LOC Rust codebase for BAML, a programming language for working with LLMs. I picked out [an (admittedly small-ish) bug](https://github.com/BoundaryML/baml/issues/1252) from the @BoundaryML repo and got to work.
+Several weeks ago, I [decided to share some more about the process](https://hlyr.dev/he-gh), curious if our in-house techniques could one-shot a fix to a 300k LOC Rust codebase for BAML, a programming language for working with LLMs. I picked out [an (admittedly small-ish) bug](https://github.com/BoundaryML/baml/issues/1252) from the @BoundaryML repo and got to work.

 You can [watch the episode](https://hlyr.dev/he-yt) to learn more about the process, but to outline it:

-Worth noting: I am at best an amateur Rust dev, and I have never worked in the BAML codebase before.
+**Worth noting**: I am at best an amateur Rust dev, and I have never worked in the BAML codebase before.

 #### The research

@@ -247,13 +247,13 @@ Worth noting: I am at best an amateur Rust dev, and I have never worked in the B
 #### The plans

 - While the research was running, I got impatient and kicked off a plan, with no research, to see if claude could go straight to an implementation plan - [you can see it here](https://github.com/ai-that-works/ai-that-works/blob/main/2025-08-05-advanced-context-engineering-for-coding-agents/thoughts/shared/plans/fix-assert-syntax-validation-no-research.md)
-- When the research was done, I kicked off another implementation plan - [you can see it here](https://github.com/ai-that-works/ai-that-works/blob/main/2025-08-05-advanced-context-engineering-for-coding-agents/thoughts/shared/plans/baml-test-assertion-validation-with-research.md)
+- When the research was done, I kicked off another implementation plan that used the research results - [you can see it here](https://github.com/ai-that-works/ai-that-works/blob/main/2025-08-05-advanced-context-engineering-for-coding-agents/thoughts/shared/plans/baml-test-assertion-validation-with-research.md)

-The plans are both fairly short, but they differ significantly. They fix the issue in different ways, and have different testing approaches. Without going too much into detail, they both "would have worked" but the one built with research fixed the problem in the *best* place and prescribed testing more in line with the codebase conventions.
+The plans are both fairly short, but they differ significantly. They fix the issue in different ways, and have different testing approaches. Without going too much into detail, they both "would have worked" but the one built with research fixed the problem in the *best* place and prescribed testing that was in line with the codebase conventions.

 #### The implementation

-- This was all happening the night before the podcast recording. I ran both plans in parallel and submitted both as PRs before heading to bed.
+- This was all happening the night before the podcast recording. I ran both plans in parallel and submitted both as PRs before signing off for the night.

 By the time we were on the show at 10am PT the next day, [the PR from the plan with the research was already approved by @aaron](https://github.com/BoundaryML/baml/pull/2259#issuecomment-3155883849), who didn't even know I was doing a bit for a podcast 🙂. We [closed the other one](https://github.com/BoundaryML/baml/pull/2258/files).

@@ -266,13 +266,14 @@ So out of our original 4 goals, we hit:

 ### Solving complex problems

-Vaibhav was still skeptical, and I wanted to see if we could solve a more complex problem.
-So a few weeks later, the two of us spent 7 hours (3 hours on research/plans, 4 hours on implementation) and shipped 35k LOC to add cancellation and wasm support to BAML.
-The [cancelation PR just got merged last week](https://github.com/BoundaryML/baml/pull/2357).
+Vaibhav was still skeptical, and I wanted to see if we could solve a more complex problem.

-✅ So we can solve complex problems too.
+So a few weeks later, the two of us spent 7 hours (3 hours on research/plans, 4 hours on implementation) and shipped 35k LOC to add cancellation and wasm support to BAML.
+The [cancellation PR just got merged last week](https://github.com/BoundaryML/baml/pull/2357). [The WASM one is still open](https://github.com/BoundaryML/baml/pull/2330), but has a working demo of calling the wasm-compiled rust runtime from a JS app in the browser.

+While the cancellation PR required a little more love to take things over the line, we got incredible progress in just a day. Vaibhav estimated that each of these PRs would have been 3-5 days of work for a senior engineer on the BAML team to complete.

+✅ So we can solve complex problems too.


 ### This is not Magic
@@ -281,24 +282,23 @@ Remember that part in the example where I read the research and threw it out cau

 There's a certain type of person who is always looking for the one magic prompt that will solve all their problems. It doesn't exist.

-Frequent Intentional Compaction via a research/plan/implement flow will make your performance better, but what makes it good is that you build high-leverage human review into your pipeline.
+Frequent Intentional Compaction via a research/plan/implement flow will make your performance **better**, but what makes it **good enough for hard problems** is that you build high-leverage human review into your pipeline.

 <img width="1331" height="748" alt="Screenshot 2025-08-29 at 11 16 08 AM" src="https://github.com/user-attachments/assets/f12a10e2-7ffe-44c5-9d9a-b6e42ff5251e" />

 ### Eggs on Faces

-Last week, @blakesmith and I sat down for 7 hours and [tried to remove hadoop dependencies from parquet java]() - the deep dive on everything that went wrong and my theories as to why, I'll save for another post, suffice it to say that it did not go well.
-
-There are big hard problems you cannot just prompt your way through in 7 hours, and I'm still curiously and excitedly hacking on pushing the boundaries
+A few weeks back, [@blakesmith](https://www.linkedin.com/in/bhsmith/) and I sat down for 7 hours and [tried to remove hadoop dependencies from parquet java](https://github.com/dexhorthy/parquet-java/blob/remove-hadoop/thoughts/shared/plans/remove-hadoop-dependencies.md). I'll save the deep dive on everything that went wrong, and my theories as to why, for another post; suffice it to say that it did not go well. The tl;dr is that the research steps didn't go deep enough through the dependency tree, and assumed classes could be moved upstream without introducing deeply nested hadoop dependencies.

+There are big, hard problems you cannot just prompt your way through in 7 hours, and we're still curiously and excitedly hacking on pushing the boundaries with friends and partners. I think the other learning here is that you probably need at least one person who is an expert in the codebase, and in this case, that was neither of us.

 ### On Human Leverage

 If there's one thing you take away from all this, let it be this:

 A bad line of code is… a bad line of code.
-But a bad line of **plan** = hundreds of bad lines of code.
-And a bad line of **research** = thousands.
+But a bad line of a **plan** could lead to hundreds of bad lines of code.
+And a bad line of **research**, a misunderstanding of how the codebase works or where certain functionality is located, could land you with thousands of bad lines of code.

 <img width="1333" height="746" alt="Screenshot 2025-08-29 at 11 17 00 AM" src="https://github.com/user-attachments/assets/54a09c99-b177-41b2-a43d-04d6b94bc56e" />

@@ -307,7 +307,7 @@ So you want to **focus human effort and attention** on the HIGHEST LEVERAGE part
 <img width="1331" height="745" alt="Screenshot 2025-08-29 at 11 17 13 AM" src="https://github.com/user-attachments/assets/305d3716-cb5c-4c1d-bb2b-bc035b35540b" />


-When you review the research and the plans, you get more leverage than you do when you review the code. (By the way, one of our primary focuses @ [humanlayer](https://hlyr.dev/code) is helping teams build and leverage high-quality workflow prompts, and in creating great collaboration workflows for ai-generated code and specs).
+When you review the research and the plans, you get more leverage than you do when you review the code. (By the way, one of our primary focuses @ [humanlayer](https://hlyr.dev/code) is helping teams build and leverage high-quality workflow prompts and crafting great collaboration workflows for ai-generated code and specs).

 ### What is code review for?

@@ -317,7 +317,8 @@ I prefer [Blake Smith's framing in Code Review Essentials for Software Teams](ht
-Remember those 2k line golang PRs? I cared about them being correct and well designed, but the biggest source of internal unrest and frustration on the team was the lack of mental alignment. I was starting to lose touch with what our product was and how it worked.
+Remember those 2k line golang PRs? I cared about them being correct and well designed, but the biggest source of internal unrest and frustration on the team was the lack of mental alignment. **I was starting to lose touch with what our product was and how it worked.**
+
 I would expect that anyone who's worked with a very productive AI coder has had this experience.

 This is actually the most important part of research/plan/implement to us.
@@ -345,10 +346,11 @@ Basically we got everything we needed.

 (oh, and yeah, our team of three is averaging about $12k on opus per month)

-So you don't think I'm just another [hyped up mustachio'd sales guy](https://www.youtube.com/watch?v=IS_y40zY-hc&lc=UgzFldRM6LU5unLuFn54AaABAg.AMKlTmJAT5ZAMKrOOAMw3I), I'll note that this does not work perfectly for every problem.
-In August the whole team spent 2 weeks spinning circles on a really tricky race condition that spiraled into a rabbit hole of issues with MCP sHTTP keepalives in golang and a whole bunch of other race-y nonsense.
+So you don't think I'm just another [hyped up mustachio'd sales guy](https://www.youtube.com/watch?v=IS_y40zY-hc&lc=UgzFldRM6LU5unLuFn54AaABAg.AMKlTmJAT5ZAMKrOOAMw3I), I'll note that this does not work perfectly for every problem (we'll be back for another round soon, parquet-java).
+
+In August the whole team spent 2 weeks spinning in circles on a really tricky race condition that spiraled into a rabbit hole of issues with MCP sHTTP keepalives in golang and a whole bunch of other dead ends.

-But that's the exception now. In general, this works well for us. Our intern shipped 2 PRs on his first day, and 10 on his 8th day. I was genuinely skeptical that it would work for anyone else, but me and Vaibhav shipped 35k LOC of working BAML code in 7 hours. (And if you haven't met Vaibhav, he's one of the most meticulous engineers I know when it comes to code design and quality)
+But that's the exception now. In general, this works well for us. Our intern shipped 2 PRs on his first day, and 10 on his 8th day. I was genuinely skeptical that it would work for anyone else, but Vaibhav and I shipped 35k LOC of working BAML code in 7 hours. (And if you haven't met Vaibhav, he's one of the most meticulous engineers I know when it comes to code design and quality.)


 ### What's coming

@@ -368,10 +370,10 @@ Today, we're launching CodeLayer, our new "post-IDE IDE" in private beta - think

 ## For Engineering Leaders

-If you or someone you know is an engineering leader that wants to 10x their team's productivity with AI, we're forward-deploying with everyone from 6-person startups to 1000-employee public companies to help teams make the culture/process/tech shift to transition to the ai-first coding world.
+If you or someone you know is an engineering leader that wants to 10x their team's productivity with AI, we're forward-deploying with ~10-25 person eng orgs to help teams make the culture/process/tech shift needed to transition to the ai-first coding world.

 ### Thanks

-- Thanks to all the founders who've listened through early ramble-y versions of this post @adamsusskin @joshpurtell @andrewchurchill, many many more
+- Thanks to all the founders who've listened through early ramble-y versions of this post - Adam, Josh, Andrew, and many many more
 - Thanks Sundeep for weathering this wacky storm
 - Thanks Allison, Geoff, and Gerred for dragging us kicking and screaming into the future