
Commit 971f613

feat: add multi-agent orchestration pattern
- Add multi-agent example with conductor and sub-agents
- Add skills documentation (overview, authoring guide, orchestration)
- Fix codex model routing to check only matched ID
- Remove unsafe eval() from docs example
1 parent 227ea8a commit 971f613

File tree

5 files changed

+716
-1
lines changed

docs/cookbooks/multi-agent.mdx

Lines changed: 398 additions & 0 deletions
@@ -0,0 +1,398 @@
---
title: "Multi-Agent Orchestration"
description: "Compose specialized agents into coordinated multi-agent systems"
icon: "diagram-project"
---

Multi-agent systems let you combine specialized agents, each with its own environment, tools, and model, into a coordinated workflow. A "conductor" agent orchestrates the specialists, dispatching tasks and synthesizing results.

<Card
  title="Example Code"
  icon="github"
  href="https://github.com/hud-evals/hud-python/blob/main/examples/07_multi_agent.py"
>
  Follow along with the full working example on GitHub.
</Card>

## Overview

The multi-agent pattern solves a common problem: as agent capabilities grow, a single agent with 50+ tools becomes unwieldy. By splitting responsibilities across specialized agents, each one stays focused and effective.

```mermaid
flowchart TD
    subgraph orch["Coordinator (Conductor)"]
        O["2 sub-agent tools"]
    end

    subgraph browser["Browser Agent"]
        B1["navigate"]
        B2["click"]
        B3["extract_text"]
    end

    subgraph coding["Coding Agent"]
        C1["shell"]
        C2["apply_patch"]
        C3["read_file"]
    end

    O --> browser
    O --> coding
```

The conductor sees only 2 tools, one per specialist. Each specialist has a focused toolset for its domain.

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Coordinator** | An Environment with sub-agents registered as tools |
| **AgentTool** | Wraps an environment + scenario as a callable tool |
| **Conductor** | The agent that runs the coordinator (makes decisions) |
| **Sub-agent** | A specialized agent wrapped as a tool |
| **Eval-only params** | Parameters hidden from conductor but available for evaluation |

## Quick Start

### Prerequisites

```bash
export HUD_API_KEY="sk-hud-..."
```

Get your API key at [hud.ai/project/api-keys](https://hud.ai/project/api-keys).

<Note>
**Prerequisites**: You must deploy two hub environments before running this example:

1. **Remote Browser**: Go to [hud-evals/hud-remote-browser](https://github.com/hud-evals/hud-remote-browser) → Fork to your GitHub → [hud.ai](https://hud.ai) → **New** → **Environment** → Import from your repo. Set required browser provider API keys (e.g., `ANCHOR_API_KEY`).

2. **Codex Sandbox**: Go to [hud.ai](https://hud.ai) → **New** → **Environment** → Import from [hud-evals/codex_environment_sandbox](https://github.com/hud-evals/codex_environment_sandbox).

Once deployed, update the `connect_hub()` calls to use your environment slugs (e.g., `my-org/remote-browser`).
</Note>

### Running the Example

```bash
# Default task: research and save to markdown
uv run python examples/07_multi_agent.py

# Custom research task
uv run python examples/07_multi_agent.py \
  --task "Find current prices of Bitcoin and Ethereum and save to crypto.md"

# Verbose mode
uv run python examples/07_multi_agent.py --verbose
```

## Building a Multi-Agent System

The pattern is simple:
1. Create `AgentTool`s that wrap environments + models
2. Register them on a coordinator `Environment`
3. Run a "conductor" agent that dispatches work to sub-agents

### Step 1: Create Sub-Agent Environments

Each sub-agent is an `Environment` with its own tools and scenario. Connect to HUD Hub environments or define local tools:

```python
from hud import Environment
from hud.tools.agent import AgentTool


def create_browser_agent() -> AgentTool:
    """Create a browser sub-agent for web research."""
    env = Environment("browser")
    env.connect_hub("hud-remote-browser-2")

    @env.scenario()
    async def web_research(
        task: str,
        start_url: str | None = None,
        expected_outcome: str | None = None,  # Eval-only (hidden from conductor)
    ):
        """Research information on the web."""
        prompt = f"""You are a web research agent with browser access.

Research Task: {task}
"""
        if start_url:
            prompt += f"\nStart URL: {start_url}"

        prompt += """

Find relevant information, extract key data, and return structured findings."""

        yield prompt
        yield 1.0

    return AgentTool(
        env("web_research"),
        model="claude-sonnet-4-5",  # Good at browser navigation
        name="web_research",
        description="Research information on the web. Use for finding articles, "
        "scraping data, comparing prices, and extracting structured information.",
    )
```

### Step 2: Define the Coding Agent

```python
def create_coding_agent() -> AgentTool:
    """Create a coding sub-agent for file operations."""
    env = Environment("coding")
    env.connect_hub("codex_environment_sandbox")

    @env.scenario()
    async def create_markdown(
        filename: str,
        content: str,
        expected_result: str | None = None,  # Eval-only
    ):
        """Create a markdown file with the given content."""
        prompt = f"""You are a file creation assistant.

Task: Create a markdown file named '{filename}' with the following content:

{content}

IMPORTANT: Use the `apply_patch` tool to create the file.

Steps:
1. Use apply_patch to create '{filename}' with the content above
2. Confirm it was created successfully

Return a confirmation message."""

        yield prompt
        yield 1.0

    return AgentTool(
        env("create_markdown"),
        model="gpt-5.1",  # Codex-capable for native shell/apply_patch
        name="create_markdown",
        description="Create a markdown file with specified content. Use for "
        "saving research findings, creating reports, and documenting results.",
    )
```

### Step 3: Create the Coordinator

Create an `Environment` with sub-agents as tools, then run a conductor agent:

```python
import hud
from hud import Environment
from hud.agents import create_agent


async def run_research(task: str):
    # Create sub-agents as tools
    browser_agent = create_browser_agent()
    coding_agent = create_coding_agent()

    # Create coordinator environment with sub-agents as tools
    coordinator = Environment("coordinator")
    coordinator.add_tool(browser_agent)
    coordinator.add_tool(coding_agent)

    # Define the coordination scenario
    @coordinator.scenario()
    async def coordinate(prompt: str):
        yield prompt
        yield 1.0

    # System prompt for the conductor
    system_prompt = """You are a research assistant coordinating specialized agents.

Available sub-agents (call as tools):
- web_research: Find information on the web
- create_markdown: Create markdown files

CRITICAL: Sub-agents don't share context. When calling create_markdown,
you MUST pass the content you want to save.

Workflow:
1. web_research: Gather data
2. Format the data into markdown content
3. create_markdown: Save the formatted content
"""

    # Run with eval context
    async with hud.eval(
        coordinator("coordinate", prompt=task),
        name="multi-agent-research",
    ) as ctx:
        conductor = create_agent("gpt-4o", system_prompt=system_prompt)
        result = await conductor.run(ctx, max_steps=10)

    print(f"Reward: {ctx.reward}")
    print(f"Result: {result.content}")
```

## AgentTool API

`AgentTool` wraps an environment's scenario as a callable tool:

```python
from hud.tools.agent import AgentTool

tool = AgentTool(
    env("scenario_name"),       # Task from environment
    model="claude-sonnet-4-5",  # Model for this sub-agent
    name="tool_name",           # Name shown to conductor
    description="...",          # Description for conductor
    agent=None,                 # Or provide custom agent class
    agent_params={},            # Params passed to agent
    trace=False,                # Enable separate tracing
)
```

### Eval-Only Parameters

Parameters typed as optional with a `None` default (`| None = None`) are automatically hidden from the conductor's tool schema:

```python
@env.scenario()
async def investigate(
    query: str,                           # Visible to conductor
    expected_finding: str | None = None,  # Hidden (eval-only)
):
    response = yield f"Investigate: {query}"

    # Use expected_finding for scoring
    if expected_finding and response:
        yield 1.0 if expected_finding.lower() in response.lower() else 0.0
    else:
        yield 1.0
```

This lets you include ground truth for evaluations without exposing it to the conductor.

## Context Isolation

<Warning>
**Sub-agents don't share context.** Each sub-agent runs in its own isolated environment. The conductor must explicitly pass all necessary data when calling a sub-agent.
</Warning>

```python
# ❌ Wrong: Assuming sub-agent knows about previous results
result = await ctx.call_tool(name="web_research", arguments={"task": "Find stock prices"})
# The create_markdown agent won't know what web_research found!
await ctx.call_tool(name="create_markdown", arguments={"filename": "report.md"})

# ✅ Correct: Pass data explicitly
result = await ctx.call_tool(name="web_research", arguments={"task": "Find stock prices"})
await ctx.call_tool(name="create_markdown", arguments={
    "filename": "report.md",
    "content": result.content  # Pass the data!
})
```

Your system prompt should remind the conductor about this:

```python
system_prompt="""...
CRITICAL: Sub-agents don't share context. When calling create_markdown,
you MUST pass the content you want to save.
..."""
```
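
The isolation rule can be pictured with a toy model in which each sub-agent call sees only the arguments it receives. The sketch below uses no HUD code; `SubAgent` is a hypothetical stand-in, and the sample finding is borrowed from the trace example in this guide.

```python
import asyncio


class SubAgent:
    """Toy sub-agent: each call builds a fresh history from its arguments only."""

    def __init__(self, name: str):
        self.name = name

    async def run(self, **arguments: str) -> list[str]:
        # Nothing from earlier calls or from other agents is visible here.
        return [f"{key}={value}" for key, value in arguments.items()]


async def conductor() -> tuple[list[str], list[str]]:
    researcher = SubAgent("web_research")
    writer = SubAgent("create_markdown")

    await researcher.run(task="Find GOOGL price")
    findings = "GOOGL: $185.42"  # pretend this came back from the researcher

    without = await writer.run(filename="googl.md")  # findings never arrive
    with_content = await writer.run(filename="googl.md", content=findings)
    return without, with_content


without, with_content = asyncio.run(conductor())
print(without)
print(with_content)
```

The writer's first call has no way to learn what the researcher found; only the second call, where the conductor forwards the findings, carries them.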

## Trace Continuity

All sub-agent activity appears in a single trace on the HUD platform. When the conductor calls a sub-agent tool, the inference and tool calls are recorded under the parent trace, so there are no separate URLs to track.

```
🎭 Coordinator Trace
├── 🤖 Conductor: "I'll research GOOGL prices first..."
│   └── 🔧 web_research(task="Find GOOGL price")
│       ├── 🤖 Browser Agent: "Navigating to finance site..."
│       │   ├── 🔧 navigate(url="https://finance.google.com")
│       │   └── 🔧 extract_text(selector=".price")
│       └── ✅ "GOOGL: $185.42"
├── 🤖 Conductor: "Now I'll save to markdown..."
│   └── 🔧 create_markdown(filename="googl.md", content="# GOOGL Price\n...")
│       ├── 🤖 Coding Agent: "Creating file..."
│       │   └── 🔧 apply_patch(type="create_file", path="googl.md", ...)
│       └── ✅ "Created googl.md"
└── ✅ "Research complete!"
```

## Advanced Patterns

### Custom Conductor Agent

Use a custom agent class for the conductor:

```python
from hud.agents.claude import ClaudeAgent

# Create and run with a custom agent
async with hud.eval(coordinator("coordinate", prompt=task)) as ctx:
    conductor = ClaudeAgent.create(
        checkpoint_name="claude-sonnet-4-5",
        system_prompt=system_prompt,
        max_tokens=8192,
    )
    result = await conductor.run(ctx, max_steps=10)
```

### Multiple Scenarios

Define multiple scenarios on the coordinator:

```python
@coordinator.scenario()
async def research(prompt: str):
    yield prompt
    yield 1.0

@coordinator.scenario()
async def summarize(topic: str, length: str = "short"):
    yield f"Summarize {topic} in a {length} format"
    yield 1.0

# Use different scenarios
async with hud.eval(coordinator("research", prompt="Find Python frameworks")) as ctx:
    ...

async with hud.eval(coordinator("summarize", topic="ML", length="detailed")) as ctx:
    ...
```

### Mixing AgentTools with Regular Tools

You can add both AgentTools (sub-agents) and regular tools:

```python
from hud.tools.base import BaseTool

class CalculatorTool(BaseTool):
    def __init__(self):
        super().__init__(name="calculator", description="Add two numbers")

    async def __call__(self, a: float, b: float) -> str:
        return str(a + b)

coordinator = Environment("hybrid")
coordinator.add_tool(browser_agent)     # AgentTool (spawns sub-agent)
coordinator.add_tool(CalculatorTool())  # Regular tool (runs directly)
```

## CLI Options

| Flag | Default | Description |
|------|---------|-------------|
| `--task` | Stock research | The task for the coordinator |
| `--conductor` | `gpt-4o` | Model for the conductor agent |
| `--max-steps` | `10` | Maximum conductor steps |
| `--verbose` | Off | Enable verbose output |
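
For readers adapting the script, the flags in the table map naturally onto `argparse`. The parser below is a sketch built from the table alone; the example script's real parser may differ, and `DEFAULT_TASK` is a placeholder rather than the script's actual default text.

```python
import argparse

# Placeholder only: the table describes the default task as "Stock research".
DEFAULT_TASK = "Research a stock price and save it to a markdown file"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags and defaults listed in the table."""
    parser = argparse.ArgumentParser(description="Run the multi-agent example")
    parser.add_argument("--task", default=DEFAULT_TASK, help="The task for the coordinator")
    parser.add_argument("--conductor", default="gpt-4o", help="Model for the conductor agent")
    parser.add_argument("--max-steps", type=int, default=10, help="Maximum conductor steps")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose output")
    return parser


args = build_parser().parse_args(["--task", "Find GOOGL price", "--max-steps", "5"])
print(args.task, args.conductor, args.max_steps, args.verbose)
```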

## See Also

- [Ops Diagnostics](/cookbooks/ops-diagnostics) - A more complex multi-agent example
- [AgentTool Reference](/reference/tools#agenttool) - Detailed AgentTool API
- [Building Environments](/build-environments) - Creating custom environments
- [Scenarios](/reference/environments#scenarios) - Scenario patterns and best practices
