Skip to content

Commit 74f77ec

Browse files
author
Marvin Zhang
committed
feat: Add article on AI agent performance and context engineering strategies
1 parent 0783aac commit 74f77ec

File tree

3 files changed

+291
-16
lines changed

3 files changed

+291
-16
lines changed
Lines changed: 274 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,274 @@
1+
---
2+
slug: ai-agent-performance
3+
title: Why Your AI Agent Gets Dumber with Large Specs (And How to Fix It)
4+
authors: [marvin]
5+
tags: [ai, context-engineering, best-practices]
6+
---
7+
8+
Your spec fits in the context window. So why does your AI agent make mistakes, ignore instructions, and produce worse code?
9+
10+
You paste a detailed 2,000-line architecture document into Cursor. The context window can handle it—200K tokens, plenty of room. But something's off. The AI suggests an approach you explicitly ruled out on page 3. It asks questions you already answered. The code it generates contradicts the design decisions you documented.
11+
12+
**The problem isn't context size. It's context quality.**
13+
14+
{/* truncate */}
15+
16+
## The Real Problem: Performance Degradation
17+
18+
Modern AI models have massive context windows—Claude has 200K tokens, GPT has 128K, and newer models are pushing toward 1M+. But here's what the marketing doesn't tell you: **AI performance degrades significantly as context grows**, even when you're nowhere near the limit.
19+
20+
The research is clear:
21+
22+
**Databricks found** that Llama 3.1 405B shows quality degradation starting around 32K tokens—far below its theoretical limit. Smaller models degrade even earlier.
23+
24+
**Berkeley's Function-Calling Leaderboard** confirmed that ALL models perform worse when given more tools or options to choose from. More context = more confusion = lower accuracy.
25+
26+
**Microsoft and Salesforce research** showed a 39% performance drop when models need to gather information across multiple context turns or conflicting sources.
27+
28+
### Why This Happens
29+
30+
It comes down to fundamental constraints:
31+
32+
1. **Attention dilution** - Transformer attention has N² complexity. More tokens = harder to focus on what matters.
33+
34+
2. **Context rot** - With large context, models start ignoring their training and just repeat patterns from the context history. They become less intelligent, not more.
35+
36+
3. **Option overload** - Too many choices (tools, patterns, approaches) leads to wrong selections. This isn't unique to AI—it's a cognitive constraint.
37+
38+
4. **Token economics** - Every extra token costs money and time. A 2,000-line spec costs 6x more to process than a 300-line spec.
39+
40+
### What This Means For You
41+
42+
When you're using AI coding assistants:
43+
44+
- **Cursor, Copilot, Claude** start making basic mistakes they wouldn't make with smaller context
45+
- **Code generation** becomes less accurate and more likely to contradict your requirements
46+
- **Responses slow down** as the model processes more irrelevant information
47+
- **Costs scale up** linearly with context size
48+
- **You spend more time** fixing AI mistakes than you save from AI assistance
49+
50+
The irony: You write detailed specs to help the AI, but the detail makes the AI worse.
51+
52+
## The Solution: Context Engineering
53+
54+
Context engineering is the practice of managing AI working memory to maximize effectiveness. It's not about squeezing into context limits—it's about **maintaining AI performance** at any scale.
55+
56+
Here are four strategies that actually work, backed by research and real-world usage:
57+
58+
### 1. Partitioning - Split and Load Selectively
59+
60+
**What it is**: Break content into focused chunks, load only what's needed for the current task.
61+
62+
**Example**:
63+
```
64+
# Instead of one 1,200-line spec:
65+
specs/dashboard/README.md (200 lines - overview)
66+
specs/dashboard/DESIGN.md (350 lines - architecture)
67+
specs/dashboard/IMPLEMENTATION.md (150 lines - plan)
68+
specs/dashboard/TESTING.md (180 lines - tests)
69+
70+
# AI loads only what it needs
71+
# Working on architecture? Read DESIGN.md only
72+
# Writing tests? Read TESTING.md only
73+
```
74+
75+
**The benefit**: AI processes 200-350 lines instead of 1,200. Faster, more focused, fewer mistakes.
76+
77+
### 2. Compaction - Remove Redundancy
78+
79+
**What it is**: Eliminate duplicate or inferable content.
80+
81+
**Before**:
82+
```markdown
83+
## Authentication
84+
The authentication system uses JWT tokens. JWT tokens are
85+
industry-standard and provide stateless authentication. The
86+
benefit of JWT tokens is that they don't require server-side
87+
session storage...
88+
89+
## Implementation
90+
We'll implement JWT authentication. JWT was chosen because...
91+
[repeats same rationale]
92+
```
93+
94+
**After**:
95+
```markdown
96+
## Authentication
97+
Uses JWT tokens (stateless, no session storage).
98+
99+
## Implementation
100+
[links to Authentication section for rationale]
101+
```
102+
103+
**The benefit**: Higher signal-to-noise ratio. AI focuses on unique information, not repetition.
104+
105+
### 3. Compression - Summarize What's Done
106+
107+
**What it is**: Condense completed work while preserving essential decisions.
108+
109+
**Before**:
110+
```markdown
111+
## Phase 1: Infrastructure Setup
112+
Set up project structure:
113+
- Create src/ directory
114+
- Create tests/ directory
115+
- Configure TypeScript with tsconfig.json
116+
- Set up ESLint with .eslintrc
117+
[50 lines of detailed steps...]
118+
```
119+
120+
**After** (once completed):
121+
```markdown
122+
## ✅ Phase 1: Infrastructure (Completed 2025-10-15)
123+
Project structure established with TypeScript, testing, and CI.
124+
See commit abc123 for details.
125+
```
126+
127+
**The benefit**: Keep project history without bloat. AI knows what happened without drowning in details.
128+
129+
### 4. Isolation - Separate Unrelated Concerns
130+
131+
**What it is**: Move independent features into separate specs with clear relationships.
132+
133+
**Before**: One 1,200-line spec covering dashboard UI, metrics API, health scoring algorithm, and chart library evaluation.
134+
135+
**After**: Four focused specs, each under 400 lines:
136+
- `dashboard-ui` - User interface and interactions
137+
- `metrics-api` - Data endpoint design
138+
- `health-scoring` - Algorithm details
139+
- `chart-evaluation` - Library comparison (can be archived after decision)
140+
141+
**The benefit**: Independent evolution. When the algorithm changes, the UI spec stays untouched.
142+
143+
### The Key Insight
144+
145+
**Keep context dense (high signal), not just small.**
146+
147+
It's not about arbitrary line limits. It's about removing anything that doesn't directly inform the current decision. Every word that doesn't help the AI make better choices is making it worse.
148+
149+
## Real Results from Dogfooding
150+
151+
We built LeanSpec using LeanSpec itself—the ultimate test of whether this methodology actually works.
152+
153+
**The velocity**: 6 days from zero to production
154+
- Full-featured CLI with 15+ commands
155+
- MCP server for Claude Desktop integration
156+
- Documentation site with comprehensive guides
157+
- 54 specs written and implemented with AI agents
158+
159+
**Then we violated our own principles**: Some specs grew to 1,166 lines. We hit the exact problems we were solving:
160+
- AI agents started corrupting specs during edits
161+
- Code generation became less reliable
162+
- Responses slowed down noticeably
163+
- We spent more time fixing mistakes
164+
165+
**We applied context engineering**: Split large specs, removed redundancy, compressed historical sections.
166+
- Largest spec went from 1,166 lines → 378 lines (largest partition)
167+
- AI agents work reliably again
168+
- Faster iterations, accurate output
169+
- Can confidently say: "We practice what we preach"
170+
171+
### Concrete Benefits You'll See
172+
173+
When you apply context engineering to your specs:
174+
175+
**Fewer AI mistakes** - Focused context produces accurate, consistent output
176+
**Faster iterations** - Less processing time per AI request
177+
**Lower costs** - Fewer tokens = cheaper API calls (6x savings on 2,000→300 line reduction)
178+
**Better understanding** - AI actually follows your requirements instead of hallucinating
179+
**Maintainable by humans** - Specs you can read in 5-10 minutes stay in sync with code
180+
181+
### Works With Your Tools
182+
183+
This isn't about a specific AI tool—it's about how all transformer-based models handle context:
184+
185+
- **Cursor** - Reads markdown specs for context
186+
- **GitHub Copilot** - Uses workspace files for suggestions
187+
- **Claude** - Via MCP server integration
188+
- **Aider** - Processes project documentation
189+
- **Windsurf** - Analyzes codebase context
190+
191+
Any AI coding assistant benefits from well-engineered context.
192+
193+
## Getting Started
194+
195+
LeanSpec gives you both the **methodology** and the **tooling** to apply context engineering to your specs.
196+
197+
### The Methodology
198+
199+
Five principles guide decision-making:
200+
201+
1. **Context Economy** - Fit in working memory (human + AI)
202+
2. **Signal-to-Noise** - Every word informs decisions
203+
3. **Progressive Disclosure** - Add structure when needed
204+
4. **Intent Over Implementation** - Capture why, not just how
205+
5. **Bridge the Gap** - Both human and AI understand
206+
207+
These aren't arbitrary rules—they're derived from real constraints (transformer attention, cognitive limits, token costs).
208+
209+
### The Tooling
210+
211+
CLI commands help you detect and fix context issues:
212+
213+
```bash
214+
# Install
215+
npm install -g lean-spec
216+
217+
# Initialize in your project
218+
cd your-project
219+
lean-spec init
220+
221+
# Detect issues
222+
lean-spec validate # Check for problems
223+
lean-spec complexity <spec> # Analyze size/structure
224+
225+
# Fix issues
226+
lean-spec split <spec> # Guided splitting workflow
227+
228+
# Track progress
229+
lean-spec board # Kanban view of all specs
230+
```
231+
232+
### Start Simple, Grow as Needed
233+
234+
**Solo developer?** Just use `status` and `created` fields. Keep specs focused.
235+
236+
**Small team?** Add `tags` and `priority`. Use the CLI for visibility.
237+
238+
**Enterprise?** Add custom fields (`epic`, `sprint`, `assignee`). Integrate with your workflow.
239+
240+
The structure adapts to your needs—you never add complexity "just in case."
241+
242+
### Try It Today
243+
244+
```bash
245+
npm install -g lean-spec
246+
cd your-project
247+
lean-spec init
248+
lean-spec create user-authentication
249+
```
250+
251+
Your AI coding assistant will thank you.
252+
253+
## The Bottom Line
254+
255+
**Your AI tools are only as good as the context you give them.**
256+
257+
A 2,000-line spec that fits in the context window will still produce worse results than a 300-line spec with the same essential information. It's not about limits—it's about performance.
258+
259+
Context engineering isn't optimization. It's fundamental to making AI-assisted development work reliably.
260+
261+
LeanSpec is a context engineering methodology for human-AI collaboration on software specs. It gives you:
262+
- Principles derived from real constraints
263+
- Patterns that scale from solo to enterprise
264+
- Tools that detect and prevent context problems
265+
- Proof from building the tool with the methodology
266+
267+
**The choice**: Keep writing large specs and fighting with unreliable AI output, or engineer your context for the tools you actually use.
268+
269+
---
270+
271+
**Learn more**:
272+
- GitHub: [github.com/codervisor/lean-spec](https://github.com/codervisor/lean-spec)
273+
- Docs: [lean-spec.dev](https://lean-spec.dev)
274+
- Research: [Context Engineering Guide](/docs/guide/context-engineering)

specs/043-official-launch-02/MARKETING.md

Lines changed: 15 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -122,13 +122,14 @@ Positioning, messaging, channels, and content strategy for v0.2.0 launch.
122122
## Marketing Content Checklist
123123

124124
### Pre-Launch
125-
- [ ] **Branding & Assets**
126-
- [ ] Export logo in all required formats (SVG, PNG sizes)
127-
- [ ] Create favicon files
128-
- [ ] Update docs site with logo
129-
- [ ] Add logo to README.md
130-
- [ ] Create social media assets (Twitter card, OG image, GitHub preview)
131-
- [ ] Write launch blog post
125+
- [x] **Branding & Assets** - ✅ COMPLETE (spec 052)
126+
- [x] Export logo in all required formats (SVG: 4 variants, PNG: 16/32/64/128/256/512)
127+
- [x] Create favicon files (16x16, 32x32, .ico)
128+
- [x] Update docs site with logo (theme-aware variants configured)
129+
- [x] Add logo to README.md (centered with badges)
130+
- [x] Create social media assets (social-card.png, social-github.png)
131+
- [x] Branding guidelines documented (BRANDING.md)
132+
- [x] Write launch blog post - ✅ DRAFTED (blog/2025-11-10-ai-agent-performance.mdx, 275 lines)
132133
- [ ] Create demo video showing principle validation
133134
- [ ] Record GIF demos: spec creation, validation, complexity checks
134135
- [ ] Prepare social media posts
@@ -174,9 +175,9 @@ Positioning, messaging, channels, and content strategy for v0.2.0 launch.
174175
## Community Building
175176

176177
### GitHub Setup
177-
- [ ] Set up GitHub Discussions
178-
- [ ] Create issue templates (bug, feature, question)
179-
- [ ] Enhance CONTRIBUTING.md with first principles guidance
178+
- [ ] Set up GitHub Discussions (needs repo settings access)
179+
- [ ] Create issue templates (bug, feature, question) in `.github/ISSUE_TEMPLATE/`
180+
- [x] Enhanced CONTRIBUTING.md - ✅ EXISTS (needs review for first principles guidance)
180181
- [ ] Create CODE_OF_CONDUCT.md
181182
- [ ] Set up GitHub Actions for community management
182183

@@ -211,10 +212,10 @@ Positioning, messaging, channels, and content strategy for v0.2.0 launch.
211212
- Community engagement (issues, PRs, discussions)
212213

213214
### Tracking Setup
214-
- [ ] Google Analytics for docs site
215-
- [ ] npm download tracking
216-
- [ ] GitHub star notifications
217-
- [ ] Social media mention tracking
215+
- [ ] Google Analytics for docs site (gtag not configured in docusaurus.config.ts)
216+
- [ ] npm download tracking (can use npm-stat.com post-launch)
217+
- [ ] GitHub star notifications (GitHub watch settings)
218+
- [ ] Social media mention tracking (manual or tools like Brand24)
218219
- [ ] Sentiment analysis (manual)
219220

220221
### Success Indicators

specs/043-official-launch-02/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ v1.0.0 (Future) → Feature-complete milestone with enterprise features
119119
**Phase 3: 🟡 READY TO START** - Dogfooding checkpoint next, then launch prep
120120

121121
**Blocking Issue for Launch:**
122-
- ⚠️ **Spec 061 (AI-assisted spec writing)** - Fundamentally changes how we position LeanSpec and "When to Use" docs. Must be resolved before launch messaging.
122+
- **All critical specs complete!** Ready for dogfooding checkpoint and launch prep.
123123

124124
## Dependencies
125125

@@ -137,7 +137,7 @@ v1.0.0 (Future) → Feature-complete milestone with enterprise features
137137
- [x] Spec 024: Pattern-aware list grouping - ✅ COMPLETE
138138
- [x] Spec 044: Spec relationships clarity - ✅ COMPLETE
139139
- [x] Spec 056: Docs site accuracy audit - ✅ COMPLETE
140-
- [ ] **Spec 061: AI-assisted spec writing** - 🔥 CRITICAL (blocks launch positioning)
140+
- [x] Spec 061: AI-assisted spec writing - ✅ COMPLETE
141141
- [ ] Dogfooding checkpoint: Split large specs
142142

143143
**Nice-to-have:**

0 commit comments

Comments
 (0)