docs/blog/act-via-code.mdx
Lines changed: 56 additions & 53 deletions
@@ -5,85 +5,88 @@ iconType: "solid"
5
5
description: "The path to advanced code manipulation agents"
6
6
---
7
7
8
-
The future of AI-powered software development isn't just about understanding code—it's about manipulating it effectively. As AI models become increasingly sophisticated in comprehending codebases, we're discovering that the real bottleneck isn't their "intelligence," but rather their ability to make precise, reliable changes to code. This is where the concept of "acting via code" becomes crucial.
8
+
<Frame caption="Voyager (Jim Fan)">
9
+
<img src="/images/nether-portal.png" />
10
+
</Frame>
9
11
10
-
## The Current Landscape
11
12
12
-
Today's AI coding assistants typically operate through:
13
+
# Act via Code
13
14
14
-
- Generating complete code snippets
15
-
- Suggesting text-based changes
16
-
- Producing diffs for review
15
+
Two and a half years since the launch of the GPT-3 API, code assistants have emerged as the most powerful and practically useful applications of LLMs. The rapid adoption of AI-powered IDEs and prototype builders isn't surprising — code is structured, deterministic, and rich with patterns, making it an ideal domain for machine learning. As model capabilities continue to scale, we're seeing compounding improvements in code understanding and generation.
17
16
18
-
While these approaches work for simple tasks, they break down when dealing with complex, multi-file changes or large-scale refactors. The fundamental issue? They're trying to manipulate code as text, rather than as the structured data it really is.
17
+
Yet there's a striking gap between what AI agents can understand and what they can actually do. While they can reason about complex architectural changes, debug intricate issues, and propose sophisticated refactors, they often can't execute these ideas. The ceiling isn't intelligence or context—it's the ability to manipulate code at scale. Large-scale modifications remain unreliable or impossible, not because agents don't understand what to do, but because they lack the right interfaces to do it.
19
18
20
-
## Why Acting via Code Matters
19
+
The bottleneck isn't intelligence — it's tooling. By giving AI models the ability to write and execute code that modifies code, we're about to unlock an entire class of tasks that agents already understand but can't yet perform. Code execution environments represent the most expressive tool we could offer an agent—enabling composition, abstraction, and systematic manipulation of complex systems. When paired with ever-improving language models, this will unlock another step function improvement in AI capabilities.
21
20
22
-
Acting via code means providing AI agents with programmatic interfaces to manipulate codebases. Instead of generating text patches, agents can express transformations through code itself. This approach offers several key advantages:
21
+
## Beating Minecraft with Code Execution
23
22
24
-
1. **Precision and Reliability**
23
+
In mid-2023, a research project called [Voyager](https://voyager.minedojo.org) made waves: it far outperformed the prior SOTA in Minecraft, with its authors reporting 3.3× more unique items obtained, 2.3× longer distances traveled, and key tech tree milestones unlocked up to 15.3× faster. This was a massive breakthrough: previous reinforcement learning systems had struggled for years with even basic Minecraft tasks.
25
24
26
-
- Changes are expressed through well-defined operations
27
-
- Dependencies and references are handled automatically
28
-
- Transformations are composable and reusable
25
+
While the AI community was focused on scaling intelligence, Voyager demonstrated something more fundamental: the right tools can unlock entirely new tiers of capability. The same GPT-4 model that struggled with Minecraft using traditional frameworks achieved remarkable results when allowed to write and execute code. This wasn't about raw intelligence—it was about giving the agent a more expressive way to act.
29
26
30
-
2. **Scale and Consistency**
27
+
<Frame>
28
+
<img src="/images/voyager-performance.png" />
29
+
</Frame>
31
30
32
-
- Changes can be applied across massive codebases
33
-
- Transformations remain consistent across files
34
-
- Complex refactors become tractable
31
+
The breakthrough came from a simple yet powerful insight: let the AI write code. Instead of limiting the agent to primitive "tools," Voyager allowed GPT-4 to write and execute [JS programs](https://github.com/MineDojo/Voyager/tree/main/skill_library/trial2/skill/code) that controlled Minecraft actions through a clean API:
35
32
36
-
3. **Verifiability and Safety**
37
-
- Changes are expressed in a reviewable format
38
-
- Transformations can be tested before application
bot.chat("Already have 3 spruce logs in inventory.");
43
+
}
44
+
}
45
+
```
40
46
41
-
## Building Blocks for AI Agents
47
+
This approach transformed the agent's capabilities. Rather than being constrained to atomic actions like `equipItem(...)`, it could create higher-level operations like [`craftShieldWithFurnace()`](https://github.com/MineDojo/Voyager/blob/main/skill_library/trial2/skill/code/craftShieldWithFurnace.js) by composing JS APIs. The system also implemented a memory mechanism that stored successful programs, effectively building its own library of proven solutions that it could later retrieve and adapt to similar situations.
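The composition-plus-memory idea can be sketched in a few lines of Python. This is a minimal illustration, not Voyager's implementation: the `SkillLibrary` class, the `mine` and `craft` primitives, and the word-overlap ranking are all hypothetical stand-ins (a simple keyword match substitutes here for a real retrieval mechanism).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical atomic actions: each one just records what it would do.
def mine(log: List[str], block: str) -> None:
    log.append(f"mine {block}")


def craft(log: List[str], item: str) -> None:
    log.append(f"craft {item}")


# A higher-level skill built by composing the atomic actions.
def craft_wooden_pickaxe(log: List[str]) -> None:
    mine(log, "oak_log")
    craft(log, "oak_planks")
    craft(log, "stick")
    craft(log, "wooden_pickaxe")


@dataclass
class SkillLibrary:
    """Hypothetical store of proven skills, keyed by name."""
    skills: Dict[str, Callable[[List[str]], None]] = field(default_factory=dict)
    descriptions: Dict[str, str] = field(default_factory=dict)

    def add(self, name: str, description: str, fn: Callable[[List[str]], None]) -> None:
        self.skills[name] = fn
        self.descriptions[name] = description

    def retrieve(self, query: str) -> List[str]:
        """Rank skills by word overlap with the query (stand-in for real retrieval)."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(desc.lower().split())), name)
            for name, desc in self.descriptions.items()
        ]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]


library = SkillLibrary()
library.add("craft_wooden_pickaxe", "craft a wooden pickaxe from oak logs", craft_wooden_pickaxe)
library.add("mine_oak", "mine one oak log", lambda log: mine(log, "oak_log"))

# Later, a new task retrieves and reuses the stored skill.
matches = library.retrieve("need a pickaxe")
actions: List[str] = []
library.skills[matches[0]](actions)
```

Because a stored skill is an ordinary function, it can itself be called from newer skills, which is what lets the library grow in capability over time.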
42
48
43
-
For AI agents to effectively manipulate code, they need:
49
+
<Frame>
50
+
<img src="/images/voyager-retrieval.png" />
51
+
</Frame>
44
52
45
-
1. **A Natural Mental Model**
53
+
As the Voyager authors noted:
46
54
47
-
- Operations that match how developers think about code changes
48
-
- High-level abstractions for common patterns
49
-
- Clear semantics for transformations
55
+
<Tip>*"We opt to use code as the action space instead of low-level motor commands because programs can naturally represent temporally extended and compositional actions, which are essential for many long-horizon tasks in Minecraft."*</Tip>
50
56
51
-
2. **Composable Primitives**
57
+
## Code is an Ideal Action Space
52
58
53
-
- Basic operations that can be combined
54
-
- Tools for building higher-level abstractions
55
-
- Ways to express complex transformations
59
+
The implications of code as an action space extend far beyond gaming. Code provides a uniquely powerful interface between AI and real-world systems. When an agent writes code, it gains several critical advantages over traditional atomic tools.
56
60
57
-
3. **Rich Static Analysis**
58
-
- Understanding of dependencies and references
59
-
- Analysis of control flow and types
60
-
- Knowledge of cross-file relationships
61
+
First, code enforces a baseline of correctness through syntax and type systems, which reject many malformed actions before they run. Second, it enables effective retrieval and composition: AI models excel at understanding, adapting, and combining existing code patterns. Third, code execution provides immediate, objective feedback through errors and outputs. Finally, and perhaps most importantly, code is inherently composable: any tool can be wrapped in a function and used as a building block for more complex operations.
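To make the feedback and composability points concrete, here is a small sketch of an execution loop. Everything in it is illustrative: `run_action`, the `add` and `double` tools, and the convention that the agent assigns its answer to `result` are assumptions, and a real system would run agent code in a sandbox rather than with a bare `exec`.

```python
def run_action(source: str, tools: dict) -> dict:
    """Execute agent-written source with `tools` in scope and report the outcome."""
    namespace = dict(tools)  # copy so one action cannot clobber the tool table
    try:
        exec(source, namespace)  # illustration only: not sandboxed
    except Exception as exc:  # syntax and runtime errors both become feedback
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "result": namespace.get("result")}


# Two hypothetical tools, wrapped as plain functions.
def add(a, b):
    return a + b


def double(x):
    return 2 * x


tools = {"add": add, "double": double}

good = run_action("result = double(add(1, 2))", tools)    # tools compose freely
bad = run_action("result = double(add(1))", tools)        # wrong arity
broken = run_action("result = double(add(1, 2)", tools)   # unbalanced parenthesis
```

The failing calls come back as structured error messages rather than silent no-ops, which is the kind of objective signal an agent can use to revise its next attempt.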
61
62
62
-
## The Path Forward
63
+
Programs are also a natural medium of interaction between humans and agents. Code explicitly encodes reasoning in a human-readable format, making the agent's actions transparent and reviewable. There's no magic—just deterministic program execution that can be debugged, modified, and improved. In this paradigm, the agent becomes a sophisticated program search mechanism, exploring the space of possible solutions while maintaining the reliability of traditional software.
63
64
64
-
As we move toward more advanced AI coding agents, the ability to act via code becomes increasingly critical. Future AI systems will need to:
65
+
## For Software Engineering
65
66
66
-
1. **Build Their Own Tools**
67
+
This brings us to software engineering, where we see a massive gap between AI's theoretical capabilities and practical achievements. Many code modification tasks are fundamentally programmatic—dependency analysis, refactors, control flow analysis—yet we lack the tools to express them properly.
67
68
68
-
- Create custom abstractions for common patterns
69
-
- Develop specialized transformation utilities
70
-
- Maintain their own libraries of operations
69
+
Consider how a developer thinks about refactoring: it's rarely about direct text manipulation. Instead, we think in terms of high-level operations: "move this function," "rename this variable everywhere," "split this module." These operations can be encoded into a powerful Python API:
71
70
72
-
2. **Reason About Changes**
71
+
```python
72
+
# simple access to high-level code constructs
73
+
for component in codebase.jsx_components:
74
+
    # access detailed code structure and relations
75
+
    if len(component.usages) == 0:
76
+
        # powerful edit APIs that handle edge cases
77
+
        component.rename(component.name + 'Page')
78
+
```
73
79
74
-
- Understand the impact of transformations
75
-
- Plan complex refactoring operations
76
-
- Verify the correctness of changes
80
+
This isn't just another code manipulation library. It's a scriptable language server that builds on proven foundations like LSP and codemods but is designed specifically for programmatic analysis and refactoring.
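For a feel of what "rename this everywhere" means as a structural operation rather than a text edit, here is a toy version built only on Python's standard `ast` module (`ast.unparse` requires Python 3.9+). It is not the Codegen API: the `RenameFunction` class is hypothetical, and it deliberately ignores scoping, imports, and attribute access, which are exactly the edge cases a real tool has to handle.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


source = '''
def fetch(url):
    return url.upper()

def main():
    return fetch("a") + fetch("b")
'''

tree = RenameFunction("fetch", "fetch_page").visit(ast.parse(source))
renamed = ast.unparse(tree)
```

Because the edit operates on the syntax tree, the definition and both call sites change together, while an unrelated string literal containing "fetch" would be left alone.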
77
81
78
-
3. **Learn From Experience**
79
-
- Improve transformation strategies over time
80
-
- Develop better abstractions through use
81
-
- Share knowledge across different contexts
82
+
## What does this look like?
82
83
83
-
## Conclusion
84
+
At Codegen, we've built exactly this system. Our approach centers on four key principles:
84
85
85
-
The path to advanced AI coding agents isn't just about better language models—it's about giving those models the right tools to manipulate code effectively. By enabling AI to "act via code," we create a foundation for more sophisticated, reliable, and scalable code transformation capabilities.
86
+
The foundation must be Python, enabling easy composition with existing tools and workflows. Operations must be in-memory for performance, handling large-scale changes efficiently. The system must be open source, allowing developers and AI researchers to extend and enhance it. And perhaps most importantly, it must be thoroughly documented—not just for humans, but for the next generation of AI agents that will build upon it.
86
87
87
-
Just as self-driving cars need sophisticated controls to navigate the physical world, AI coding agents need powerful, precise interfaces to manipulate codebases. This programmatic approach creates a shared language that both humans and AI can use to express, verify, and apply code changes reliably at scale.
88
+
## What does this enable?
88
89
89
-
The future of AI-powered development lies not in generating better patches or diffs, but in enabling AI to work with code the way developers do: through code itself.
90
+
We've already used this approach to merge hundreds of thousands of lines of code in enterprise codebases. Our tools have automated complex tasks like feature flag deletion, test suite reorganization, import cycle elimination, and dead code removal. But more importantly, we've proven that code-as-action-space isn't just theoretical—it's a practical approach to scaling software engineering.
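As one illustration of why such tasks are naturally programmatic, import cycle detection reduces to finding a cycle in a directed graph. The sketch below assumes the import graph has already been extracted into a plain dictionary. The `find_cycle` function and the module names are hypothetical and do not reflect how Codegen implements this.

```python
from typing import Dict, List, Optional


def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of modules, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {module: WHITE for module in imports}
    stack: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GRAY
        stack.append(module)
        for dep in imports.get(module, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:  # dep is already on the current path: cycle found
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[module] = BLACK
        return None

    for module in list(imports):
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


graph = {"app": ["models", "utils"], "models": ["db"], "db": ["models"], "utils": []}
cycle = find_cycle(graph)
no_cycle = find_cycle({"a": ["b"], "b": []})
```

Detection is the easy half. Eliminating the cycle means moving or deferring an import, which is where structural edit operations come in.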
91
+
92
+
This is just the beginning. With Codegen, we're providing the foundation for the next generation of code manipulation tools—built for both human developers and AI agents. We believe this approach will fundamentally change how we think about and implement large-scale code changes, making previously impossible tasks not just possible, but routine.