---
title: "Codex Agent"
description: "Build coding agents with OpenAI's native shell and apply_patch tools"
icon: "code"
---

HUD provides native support for OpenAI's coding tools (`shell` and `apply_patch`), enabling you to build powerful coding agents that can create, modify, and execute code.

<Card
  title="Example Code"
  icon="github"
  href="https://github.com/hud-evals/hud-python/blob/main/examples/06_codex_coding_agent.py"
>
  Follow along with the full working example on GitHub.
</Card>

## Overview

OpenAI's Responses API includes specialized tools for coding tasks:

| Tool          | Purpose                                                | HUD Implementation                     |
| ------------- | ------------------------------------------------------ | -------------------------------------- |
| `shell`       | Execute shell commands in a persistent bash session    | `hud.tools.shell.ShellTool`            |
| `apply_patch` | Create, update, and delete files using V4A diff format | `hud.tools.apply_patch.ApplyPatchTool` |

When you register tools named `shell` or `apply_patch` in your environment, the `OpenAIAgent` automatically converts them to OpenAI's native tool types for optimal performance.

## Two Modes

HUD supports two execution modes for coding agents:

| Mode                  | Tools Run On | Inference Via   | API Keys Required |
| --------------------- | ------------ | --------------- | ----------------- |
| **Local** (`--local`) | Your machine | OpenAI directly | `OPENAI_API_KEY`  |
| **Hub** (default)     | HUD Cloud    | HUD Gateway     | `HUD_API_KEY`     |

Both modes send traces to hud.ai when `HUD_API_KEY` is set.
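
As a quick check of which mode a run can use, you can inspect the two keys (names match the `.env` setup in the CLI section below); this is just an illustrative sketch:

```python
import os

# OPENAI_API_KEY enables local mode (direct OpenAI inference);
# HUD_API_KEY enables hub mode and/or traces on hud.ai.
has_openai = bool(os.getenv("OPENAI_API_KEY"))
has_hud = bool(os.getenv("HUD_API_KEY"))

print("local mode available:", has_openai)
print("hub mode / traces available:", has_hud)
```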

## Quick Start

### Local Mode (No Docker)

Run coding agents directly on your machine without any infrastructure:

```python
import hud
from hud.agents.openai import OpenAIAgent
from hud.tools.shell import ShellTool
from hud.tools.apply_patch import ApplyPatchTool

# Create environment with coding tools
env = hud.Environment("coding")
shell_tool = ShellTool()
apply_patch_tool = ApplyPatchTool(base_path="/path/to/workspace")

@env.tool()
async def shell(commands: list[str], timeout_ms: int | None = None):
    """Execute shell commands."""
    result = await shell_tool(commands=commands, timeout_ms=timeout_ms)
    return result.to_dict()

@env.tool()
async def apply_patch(type: str, path: str, diff: str | None = None):
    """Apply file patches."""
    result = await apply_patch_tool(type=type, path=path, diff=diff)
    return result.to_dict()

# Run with OpenAI agent (calls OpenAI directly)
agent = OpenAIAgent.create(model="gpt-5.1")

async with hud.eval(env(), name="coding-task") as ctx:
    result = await agent.run(ctx, max_steps=20)
```

### Hub Mode (Cloud Execution)

Connect to HUD Hub for full cloud execution and telemetry:

```python
import hud
from hud.agents.openai import OpenAIAgent
from hud.settings import settings
from openai import AsyncOpenAI

# Connect to HUD Hub environment
env = hud.Environment()
env.connect_hub("codex_sandbox_environment")

# Use HUD Gateway for inference (full telemetry)
model_client = AsyncOpenAI(
    base_url=settings.hud_gateway_url,
    api_key=settings.api_key,
)
agent = OpenAIAgent.create(
    model="gpt-5.1",
    model_client=model_client,
    validate_api_key=False,
)

async with hud.eval(env(), name="coding-task") as ctx:
    result = await agent.run(ctx, max_steps=20)
```

<Note>
  The first request may take a few seconds while the environment spins up in the
  cloud. Subsequent requests will be faster.
</Note>

## Tool Specifications

### Shell Tool

The `ShellTool` provides a persistent bash session for executing commands.

**Features:**

- Automatic session restart on error
- Dynamic timeout via `timeout_ms` parameter
- Persistent environment (exported variables, working directory)
- Concurrent command execution support

**Input Schema:**

```python
{
    "commands": ["ls -la", "cat file.py"],  # List of commands
    "timeout_ms": 30000,                    # Optional timeout per command
    "max_output_length": 10000              # Optional output limit
}
```

**Output Format:**

```python
{
    "output": [
        {
            "stdout": "file1.py\nfile2.py",
            "stderr": "",
            "outcome": {"type": "exit", "exit_code": 0}
        }
    ]
}
```
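
For a quick standalone check, here is a minimal sketch of calling `ShellTool` directly (outside any agent loop), using only the constructor, call signature, and output shape shown above:

```python
import asyncio

from hud.tools.shell import ShellTool


async def demo() -> None:
    shell_tool = ShellTool()
    # Run two commands in one call; the result follows the format documented above:
    # {"output": [{"stdout": ..., "stderr": ..., "outcome": {...}}, ...]}
    result = await shell_tool(commands=["echo hello", "ls"], timeout_ms=10_000)
    print(result.to_dict())


asyncio.run(demo())
```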

### Apply Patch Tool

The `ApplyPatchTool` creates, updates, and deletes files using OpenAI's V4A diff format.

**Operations:**

| Operation     | Description          | Diff Required |
| ------------- | -------------------- | ------------- |
| `create_file` | Create a new file    | Yes           |
| `update_file` | Modify existing file | Yes           |
| `delete_file` | Remove a file        | No            |

**Input Schema:**

```python
{
    "type": "update_file",
    "path": "src/main.py",
    "diff": "..."  # V4A diff content
}
```
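
The other two operations follow the same shape. These payloads are illustrative; the `create_file` diff is assumed to be the new file's contents with each line prefixed by `+`, in line with the V4A format shown next:

```python
{
    "type": "create_file",
    "path": "src/new_module.py",
    "diff": "+def main():\n+    print(\"hello\")\n"  # assumed: full file contents, one "+" line each
}

{
    "type": "delete_file",
    "path": "src/old_module.py"  # no diff needed
}
```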

**V4A Diff Format Example:**

```diff
@@ def hello():
- print("Hello")
+ print("Hello, World!")
```

**Output Format:**

```python
{
    "status": "completed",  # or "failed"
    "output": "Updated src/main.py"
}
```
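
As a sanity check outside an agent loop, here is a hedged sketch of calling `ApplyPatchTool` directly with a `create_file` operation. The constructor and call signature come from the examples above; the exact diff payload for `create_file` (full contents as `+`-prefixed lines) and the requirement that `base_path` already exists are assumptions:

```python
import asyncio

from hud.tools.apply_patch import ApplyPatchTool


async def demo() -> None:
    tool = ApplyPatchTool(base_path="./workspace")  # assumes ./workspace exists
    result = await tool(
        type="create_file",
        path="hello.py",
        diff='+print("Hello, World!")\n',  # assumed V4A payload for a new file
    )
    # Expect {"status": "completed", "output": ...} on success, per the format above
    print(result.to_dict())


asyncio.run(demo())
```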

## Agent Integration

The `OpenAIAgent` automatically detects `shell` and `apply_patch` tools and converts them to OpenAI's native types:

```python
# In hud/agents/openai.py
def _to_openai_tool(self, tool):
    if tool.name == "shell":
        return FunctionShellToolParam(type="shell")
    if tool.name == "apply_patch":
        return ApplyPatchToolParam(type="apply_patch")
    # ... regular function tools
```

This means:

1. The model sees native `shell` and `apply_patch` tools
2. Responses include `shell_call` and `apply_patch_call` output types
3. The agent routes these back to your registered tools

## Complete Example

Here's the full local mode example with a working directory:

```python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()  # Load .env file

import hud
from hud.agents.openai import OpenAIAgent
from hud.tools.shell import ShellTool
from hud.tools.apply_patch import ApplyPatchTool


async def main():
    # Set up working directory
    work_dir = "./codex_output"
    os.makedirs(work_dir, exist_ok=True)
    base_path = os.path.abspath(work_dir)

    # Initialize tools
    shell_tool = ShellTool()
    apply_patch_tool = ApplyPatchTool(base_path=base_path)

    # Create environment with local tools
    env = hud.Environment("local-codex")

    @env.tool()
    async def shell(
        commands: list[str],
        timeout_ms: int | None = None,
        max_output_length: int | None = None,
    ) -> dict:
        """Execute shell commands in a bash session."""
        # Change to working directory before executing
        prefixed_commands = [f"cd {base_path} && {cmd}" for cmd in commands]
        result = await shell_tool(
            commands=prefixed_commands,
            timeout_ms=timeout_ms,
            max_output_length=max_output_length,
        )
        return result.to_dict()

    @env.tool()
    async def apply_patch(
        type: str,
        path: str,
        diff: str | None = None,
    ) -> dict:
        """Apply file operations using V4A diff format."""
        result = await apply_patch_tool(type=type, path=path, diff=diff)
        return result.to_dict()

    # Define scenario
    @env.scenario("coding_task")
    async def coding_task_scenario(task_description: str):
        yield f"""You are a skilled software developer. Complete the following task:

{task_description}

Use the available tools:
- `shell` to run commands (ls, cat, python, etc.)
- `apply_patch` to create or modify files

Work in the current directory. When done, verify your work runs correctly."""

        yield 1.0

    # Create agent
    agent = OpenAIAgent.create(model="gpt-5.1", verbose=True)

    # Run the task
    task = "Create a Python script called main.py that prints Hello World"
    eval_task = env("coding_task", task_description=task)

    async with hud.eval(eval_task, name="codex-coding-local") as ctx:
        await agent.run(ctx, max_steps=20)

    print(f"Reward: {ctx.reward}")
    print(f"Files created in: {base_path}")

    # Show created files
    for f in os.listdir(base_path):
        print(f"  - {f}")


asyncio.run(main())
```

## CLI Usage

### Setting Up API Keys

Create a `.env` file in your project root:

```bash
# For local mode (calls OpenAI directly)
OPENAI_API_KEY=sk-...

# For hub mode OR traces (recommended)
HUD_API_KEY=sk-hud-...
```

Get your keys:

- **HUD_API_KEY**: [hud.ai/project/api-keys](https://hud.ai/project/api-keys)
- **OPENAI_API_KEY**: [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

<Tip>
  If you have both keys set, you get local execution with cloud traces: the
  best of both worlds.
</Tip>

### Running the Example

```bash
# Local mode - tools run on your machine
uv run python examples/06_codex_coding_agent.py --local

# Local mode with persistent output directory
uv run python examples/06_codex_coding_agent.py --local --work-dir ./codex_output

# Hub mode - full cloud execution (default)
uv run python examples/06_codex_coding_agent.py

# Custom task
uv run python examples/06_codex_coding_agent.py --local \
  --task "Create a Python script that prints the Fibonacci sequence up to 10 numbers"

# Verbose output
uv run python examples/06_codex_coding_agent.py --local --verbose
```

### CLI Options

| Flag          | Default            | Description                                        |
| ------------- | ------------------ | -------------------------------------------------- |
| `--local`     | Off                | Run locally (tools on your machine, OpenAI direct) |
| `--task`      | Hello World script | The coding task to complete                        |
| `--model`     | `gpt-5.1`          | Codex-capable model (`gpt-5.1`, `gpt-5.1-codex`)   |
| `--work-dir`  | Temp directory     | Working directory (local mode only)                |
| `--max-steps` | `20`               | Maximum agent steps                                |
| `--verbose`   | Off                | Enable verbose output                              |
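
The flags combine as you would expect; for example, a longer local run with the Codex-tuned model and verbose logging (flag names taken from the table above):

```bash
uv run python examples/06_codex_coding_agent.py --local \
  --model gpt-5.1-codex \
  --work-dir ./codex_output \
  --max-steps 30 \
  --verbose
```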

## Security Considerations

<Warning>
  The `shell` and `apply_patch` tools can execute arbitrary commands and modify
  files. Use them in sandboxed environments for untrusted tasks.
</Warning>
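
For local runs, one lightweight precaution is to confine file edits to a throwaway directory via `base_path` (sketch below). Note that `shell` commands still execute on the host, so this is not a substitute for a real sandbox or hub mode:

```python
import tempfile

from hud.tools.apply_patch import ApplyPatchTool

# Confine apply_patch edits to a temporary directory that is removed afterwards.
# Shell commands are NOT confined by this; use a container or hub mode for
# untrusted tasks.
with tempfile.TemporaryDirectory() as sandbox_dir:
    apply_patch_tool = ApplyPatchTool(base_path=sandbox_dir)
    # ... register tools and run the agent as in the Quick Start above
```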

## See Also

- [Codex-capable models](https://platform.openai.com/docs/guides/tools-shell#supported-models) - OpenAI models that support native shell and apply_patch tools
- [Tools Reference](/reference/tools) - Complete tool documentation
- [OpenAI Agent](/reference/agents#openaiagent) - Agent configuration options
- [Integrations](/guides/integrations) - Using HUD with other frameworks
- [Sandboxing](/guides/sandboxing) - Running agents safely