Commit d1067be

Add Codex Agent cookbook documentation
- Add new Cookbooks section with Codex Agent guide
- Include complete example for local and hub modes
- Fix shell tool root user check and error handling
1 parent a7de04d commit d1067be

File tree

4 files changed: +755 −4 lines


docs/cookbooks/codex-coding.mdx

Lines changed: 368 additions & 0 deletions
---
title: "Codex Agent"
description: "Build coding agents with OpenAI's native shell and apply_patch tools"
icon: "code"
---

HUD provides native support for OpenAI's coding tools (`shell` and `apply_patch`), so you can build coding agents that create, modify, and execute code.

<Card
  title="Example Code"
  icon="github"
  href="https://github.com/hud-evals/hud-python/blob/main/examples/06_codex_coding_agent.py"
>
  Follow along with the full working example on GitHub.
</Card>

## Overview

OpenAI's Responses API includes specialized tools for coding tasks:

| Tool          | Purpose                                                | HUD Implementation                      |
| ------------- | ------------------------------------------------------ | --------------------------------------- |
| `shell`       | Execute shell commands in a persistent bash session    | `hud.tools.shell.ShellTool`             |
| `apply_patch` | Create, update, and delete files using V4A diff format | `hud.tools.apply_patch.ApplyPatchTool`  |

When you register tools named `shell` or `apply_patch` in your environment, the `OpenAIAgent` automatically converts them to OpenAI's native tool types for optimal performance.

## Two Modes

HUD supports two execution modes for coding agents:

| Mode                  | Tools Run On | Inference Via   | API Keys Required |
| --------------------- | ------------ | --------------- | ----------------- |
| **Local** (`--local`) | Your machine | OpenAI directly | `OPENAI_API_KEY`  |
| **Hub** (default)     | HUD Cloud    | HUD Gateway     | `HUD_API_KEY`     |

Both modes support traces on hud.ai if `HUD_API_KEY` is set.

## Quick Start

### Local Mode (No Docker)

Run coding agents directly on your machine without any infrastructure:

```python
import hud
from hud.agents.openai import OpenAIAgent
from hud.tools.shell import ShellTool
from hud.tools.apply_patch import ApplyPatchTool

# Create environment with coding tools
env = hud.Environment("coding")
shell_tool = ShellTool()
apply_patch_tool = ApplyPatchTool(base_path="/path/to/workspace")

@env.tool()
async def shell(commands: list[str], timeout_ms: int | None = None):
    """Execute shell commands."""
    result = await shell_tool(commands=commands, timeout_ms=timeout_ms)
    return result.to_dict()

@env.tool()
async def apply_patch(type: str, path: str, diff: str | None = None):
    """Apply file patches."""
    result = await apply_patch_tool(type=type, path=path, diff=diff)
    return result.to_dict()

# Run with OpenAI agent (calls OpenAI directly)
agent = OpenAIAgent.create(model="gpt-5.1")

async with hud.eval(env(), name="coding-task") as ctx:
    result = await agent.run(ctx, max_steps=20)
```

### Hub Mode (Cloud Execution)

Connect to HUD Hub for full cloud execution and telemetry:

```python
import hud
from hud.agents.openai import OpenAIAgent
from hud.settings import settings
from openai import AsyncOpenAI

# Connect to HUD Hub environment
env = hud.Environment()
env.connect_hub("codex_sandbox_environment")

# Use HUD Gateway for inference (full telemetry)
model_client = AsyncOpenAI(
    base_url=settings.hud_gateway_url,
    api_key=settings.api_key,
)
agent = OpenAIAgent.create(
    model="gpt-5.1",
    model_client=model_client,
    validate_api_key=False,
)

async with hud.eval(env(), name="coding-task") as ctx:
    result = await agent.run(ctx, max_steps=20)
```

<Note>
  The first request may take a few seconds while the environment spins up in the
  cloud. Subsequent requests will be faster.
</Note>

## Tool Specifications

### Shell Tool

The `ShellTool` provides a persistent bash session for executing commands.

**Features:**

- Auto-restart on error (the bash session restarts automatically if it dies)
- Dynamic timeout via the `timeout_ms` parameter
- Persistent environment (exported variables, working directory)
- Concurrent command execution support

**Input Schema:**

```python
{
    "commands": ["ls -la", "cat file.py"],  # List of commands
    "timeout_ms": 30000,                    # Optional timeout per command
    "max_output_length": 10000              # Optional output limit
}
```

**Output Format:**

```python
{
    "output": [
        {
            "stdout": "file1.py\nfile2.py",
            "stderr": "",
            "outcome": {"type": "exit", "exit_code": 0}
        }
    ]
}
```

### Apply Patch Tool

The `ApplyPatchTool` creates, updates, and deletes files using OpenAI's V4A diff format.

**Operations:**

| Operation     | Description          | Diff Required |
| ------------- | -------------------- | ------------- |
| `create_file` | Create a new file    | Yes           |
| `update_file` | Modify existing file | Yes           |
| `delete_file` | Remove a file        | No            |

**Input Schema:**

```python
{
    "type": "update_file",
    "path": "src/main.py",
    "diff": "..."  # V4A diff content
}
```
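
For `delete_file`, the `diff` field is omitted; the path alone identifies the target (path shown is illustrative):

```python
{
    "type": "delete_file",
    "path": "src/old_module.py"
}
```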

**V4A Diff Format Example:**

```diff
@@ def hello():
-    print("Hello")
+    print("Hello, World!")
```
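
For `create_file`, the diff supplies the entire new file body as added lines; this is an illustrative sketch, and OpenAI's apply_patch documentation is authoritative for the exact V4A format:

```diff
+def hello():
+    print("Hello, World!")
```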

**Output Format:**

```python
{
    "status": "completed",  # or "failed"
    "output": "Updated src/main.py"
}
```

## Agent Integration

The `OpenAIAgent` automatically detects `shell` and `apply_patch` tools and converts them to OpenAI's native types:

```python
# In hud/agents/openai.py
def _to_openai_tool(self, tool):
    if tool.name == "shell":
        return FunctionShellToolParam(type="shell")
    if tool.name == "apply_patch":
        return ApplyPatchToolParam(type="apply_patch")
    # ... regular function tools
```

This means:

1. The model sees native `shell` and `apply_patch` tools
2. Responses include `shell_call` and `apply_patch_call` output types
3. The agent routes these back to your registered tools
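
The routing in step 3 can be sketched without HUD at all: a hypothetical dispatcher keyed on the response item's `type` (the item and argument shapes here are simplified for illustration, not the exact Responses API payloads):

```python
import asyncio

async def route_item(item: dict, tools: dict) -> dict:
    """Dispatch a native tool-call item back to the matching registered tool."""
    handlers = {"shell_call": "shell", "apply_patch_call": "apply_patch"}
    name = handlers.get(item["type"])
    if name is None:
        raise ValueError(f"unhandled item type: {item['type']}")
    return await tools[name](**item["arguments"])

# Demo with a stub standing in for the real shell tool
async def fake_shell(commands: list[str], **kwargs) -> dict:
    return {"output": [{"stdout": f"ran {len(commands)} command(s)"}]}

item = {"type": "shell_call", "arguments": {"commands": ["ls"]}}
result = asyncio.run(route_item(item, {"shell": fake_shell}))
print(result["output"][0]["stdout"])  # → ran 1 command(s)
```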

## Complete Example

Here's the full local mode example with a working directory:

```python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()  # Load .env file

import hud
from hud.agents.openai import OpenAIAgent
from hud.tools.shell import ShellTool
from hud.tools.apply_patch import ApplyPatchTool


async def main():
    # Set up working directory
    work_dir = "./codex_output"
    os.makedirs(work_dir, exist_ok=True)
    base_path = os.path.abspath(work_dir)

    # Initialize tools
    shell_tool = ShellTool()
    apply_patch_tool = ApplyPatchTool(base_path=base_path)

    # Create environment with local tools
    env = hud.Environment("local-codex")

    @env.tool()
    async def shell(
        commands: list[str],
        timeout_ms: int | None = None,
        max_output_length: int | None = None,
    ) -> dict:
        """Execute shell commands in a bash session."""
        # Change to working directory before executing
        prefixed_commands = [f"cd {base_path} && {cmd}" for cmd in commands]
        result = await shell_tool(
            commands=prefixed_commands,
            timeout_ms=timeout_ms,
            max_output_length=max_output_length,
        )
        return result.to_dict()

    @env.tool()
    async def apply_patch(
        type: str,
        path: str,
        diff: str | None = None,
    ) -> dict:
        """Apply file operations using V4A diff format."""
        result = await apply_patch_tool(type=type, path=path, diff=diff)
        return result.to_dict()

    # Define scenario
    @env.scenario("coding_task")
    async def coding_task_scenario(task_description: str):
        yield f"""You are a skilled software developer. Complete the following task:

{task_description}

Use the available tools:
- `shell` to run commands (ls, cat, python, etc.)
- `apply_patch` to create or modify files

Work in the current directory. When done, verify your work runs correctly."""

        yield 1.0

    # Create agent
    agent = OpenAIAgent.create(model="gpt-5.1", verbose=True)

    # Run the task
    task = "Create a Python script called main.py that prints Hello World"
    eval_task = env("coding_task", task_description=task)

    async with hud.eval(eval_task, name="codex-coding-local") as ctx:
        await agent.run(ctx, max_steps=20)

    print(f"Reward: {ctx.reward}")
    print(f"Files created in: {base_path}")

    # Show created files
    for f in os.listdir(base_path):
        print(f"  - {f}")


asyncio.run(main())
```

## CLI Usage

### Setting Up API Keys

Create a `.env` file in your project root:

```bash
# For local mode (calls OpenAI directly)
OPENAI_API_KEY=sk-...

# For hub mode OR traces (recommended)
HUD_API_KEY=sk-hud-...
```

Get your keys:

- **HUD_API_KEY**: [hud.ai/project/api-keys](https://hud.ai/project/api-keys)
- **OPENAI_API_KEY**: [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

<Tip>
  If you have both keys set, you get local execution with cloud traces: the
  best of both worlds!
</Tip>

### Running the Example

```bash
# Local mode - tools run on your machine
uv run python examples/06_codex_coding_agent.py --local

# Local mode with persistent output directory
uv run python examples/06_codex_coding_agent.py --local --work-dir ./codex_output

# Hub mode - full cloud execution (default)
uv run python examples/06_codex_coding_agent.py

# Custom task
uv run python examples/06_codex_coding_agent.py --local \
  --task "Create a Python script that prints the Fibonacci sequence up to 10 numbers"

# Verbose output
uv run python examples/06_codex_coding_agent.py --local --verbose
```

### CLI Options

| Flag          | Default            | Description                                        |
| ------------- | ------------------ | -------------------------------------------------- |
| `--local`     | Off                | Run locally (tools on your machine, OpenAI direct) |
| `--task`      | Hello World script | The coding task to complete                        |
| `--model`     | `gpt-5.1`          | Codex-capable model (`gpt-5.1`, `gpt-5.1-codex`)   |
| `--work-dir`  | Temp directory     | Working directory (local mode only)                |
| `--max-steps` | `20`               | Maximum agent steps                                |
| `--verbose`   | Off                | Enable verbose output                              |

## Security Considerations

<Warning>
  The shell and apply_patch tools can execute arbitrary commands and modify
  files. Use them in sandboxed environments for untrusted tasks.
</Warning>

## See Also

- [Codex-capable models](https://platform.openai.com/docs/guides/tools-shell#supported-models) - OpenAI models that support native shell and apply_patch tools
- [Tools Reference](/reference/tools) - Complete tool documentation
- [OpenAI Agent](/reference/agents#openaiagent) - Agent configuration options
- [Integrations](/guides/integrations) - Using HUD with other frameworks
- [Sandboxing](/guides/sandboxing) - Running agents safely

docs/docs.json

Lines changed: 7 additions & 1 deletion
```diff
@@ -60,6 +60,12 @@
         "migration"
       ]
     },
+    {
+      "group": "Cookbooks",
+      "pages": [
+        "cookbooks/codex-coding"
+      ]
+    },
     {
       "group": "Advanced",
       "pages": [
@@ -231,4 +237,4 @@
     "twitter:description": "OSS Evaluations and RL Environments SDK"
   }
 }
-}
+}
```
