Commit afeb3a5
Merge pull request #261 from hud-evals/feat/scenario-improvements
Feat/scenario improvements
1 parent 88f5732 commit afeb3a5

File tree

24 files changed: +1855 -74 lines changed

docs/cookbooks/ops-diagnostics.mdx

Lines changed: 478 additions & 0 deletions
Large diffs are not rendered by default.

docs/docs.json

Lines changed: 3 additions & 2 deletions
@@ -33,7 +33,7 @@
       "icon": "code",
       "versions": [
         {
-          "version": "0.5.2",
+          "version": "0.5.3",
           "groups": [
             {
               "group": "Get Started",
@@ -63,7 +63,8 @@
         {
           "group": "Cookbooks",
           "pages": [
-            "cookbooks/codex-coding"
+            "cookbooks/codex-coding",
+            "cookbooks/ops-diagnostics"
           ]
         },
         {

docs/reference/environments.mdx

Lines changed: 75 additions & 0 deletions
@@ -266,6 +266,81 @@ env.unmock() # Disable mock mode
| `mock_tool(name, output)` | Set specific mock output |
| `is_mock` | Check if mock mode is enabled |

## Serving as MCP Server

An `Environment` can serve its tools over the MCP protocol, either as a standalone server or mounted on an existing server.

### serve()

Start a standalone MCP server:

```python
from hud import Environment

env = Environment("my-env")

@env.tool()
def greet(name: str) -> str:
    return f"Hello, {name}!"

# Run as MCP server (blocking)
env.serve()
```

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `transport` | `Literal["stdio", "sse", "streamable-http"]` | Transport protocol | `"streamable-http"` |
| `host` | `str` | Host address to bind | `"0.0.0.0"` |
| `port` | `int` | Port to bind | `8000` |

```python
# Serve over stdio (for CLI tools)
env.serve(transport="stdio")

# Serve over HTTP on a custom port
env.serve(transport="streamable-http", host="0.0.0.0", port=8765)
```

### http_app()

Get a Starlette/ASGI app to mount on an existing FastAPI server:

```python
from fastapi import FastAPI

from hud import Environment

app = FastAPI()
env = Environment("my-env")

@env.tool()
def my_tool(arg: str) -> str:
    return f"Got: {arg}"

# Mount the HUD environment's MCP endpoint at /mcp
app.mount("/mcp", env.http_app())

# Your other FastAPI routes work normally
@app.get("/health")
def health():
    return {"status": "ok"}
```

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `path` | `str \| None` | Internal path for the MCP endpoint | `"/"` |
| `transport` | `Literal["http", "streamable-http", "sse"]` | Transport protocol | `"http"` |
| `middleware` | `list[ASGIMiddleware] \| None` | Starlette middleware | `None` |
| `json_response` | `bool \| None` | Use JSON response format | `None` |
| `stateless_http` | `bool \| None` | Use stateless HTTP mode | `None` |

MCP clients can then connect at `http://your-server/mcp`:

```python
# Client connecting to the mounted environment
env.connect_url("http://localhost:8000/mcp")
```

## Properties

| Property | Type | Description |

docs/reference/tools.mdx

Lines changed: 87 additions & 0 deletions
@@ -69,6 +69,93 @@ async def url_match(url: str) -> EvaluationResult:
# Agents call: evaluators(name="url_match", arguments={"url": "..."})
```

## Agent Tools

### AgentTool

```python
from hud.tools import AgentTool
```

Wraps a scenario as a tool that can be called by another agent. Essential for building **hierarchical agent systems** where an orchestrator delegates to specialized subagents.

**Constructor Parameters:**

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `task` | `Task` | Task template from `env("scenario_name")` | Required |
| `model` | `str` | Model for the subagent (via gateway) | `None` |
| `agent` | `type[MCPAgent]` | Custom agent class | `None` |
| `agent_params` | `dict` | Additional agent parameters | `{}` |
| `name` | `str` | Tool name for the orchestrator | From scenario |
| `description` | `str` | Tool description | Auto-generated |
| `trace` | `bool` | Enable tracing for standalone runs | `False` |

<Note>Provide either `model` or `agent`, not both.</Note>

**Eval-Only Parameters:**

Parameters annotated `| None = None` are hidden from the orchestrator but available for evaluation:

```python
@env.scenario("investigate")
async def investigate(
    query: str,  # Visible - orchestrator passes this
    expected_finding: str | None = None,  # Hidden - only used in eval scoring
):
    response = yield f"Investigate: {query}"

    # Scoring uses expected_finding, but the orchestrator never sees it
    if expected_finding and response:
        yield 1.0 if expected_finding in response else 0.5
    else:
        yield 1.0 if response else 0.0
```

**Usage:**

```python
from hud import Environment
from hud.tools import AgentTool

# Subagent environment with a scenario
sentry_env = Environment(name="sentry-agent")

@sentry_env.scenario("investigate")
async def investigate_sentry(query: str):
    yield f"Investigate Sentry: {query}"

# Create the orchestrator
orchestrator = Environment(name="orchestrator")

# Wrap the subagent scenario as a tool
tool = AgentTool(
    sentry_env("investigate"),  # Task template
    model="gpt-4o-mini",
    name="investigate_sentry",
    description="Investigate errors in Sentry",
)
orchestrator.add_tool(tool.mcp)

# Now the orchestrator agent can call investigate_sentry(query="...")
```

**Trace Continuity:**

When called from within an eval context, AgentTool automatically:

1. Inherits the parent's trace_id
2. Skips duplicate trace registration
3. Routes all inference/tool calls to the parent trace

```python
async with hud.eval(task) as ctx:
    agent = create_agent("gpt-4o")
    result = await agent.run(ctx)
    # All subagent activity appears in this single trace
```

**See Also:** [Ops Diagnostics Cookbook](/cookbooks/ops-diagnostics) for a complete hierarchical agent example.

---

## Core Tools

### BashTool

hud/agents/__init__.py

Lines changed: 69 additions & 6 deletions
@@ -1,19 +1,82 @@
 from __future__ import annotations
 
+from typing import Any
+
 from .base import MCPAgent
 from .openai import OpenAIAgent
 from .openai_chat import OpenAIChatAgent
 from .operator import OperatorAgent
 
-# Note: These agents are not exported here to avoid requiring optional dependencies.
-# Import directly if needed:
-# from hud.agents.claude import ClaudeAgent  # requires anthropic
-# from hud.agents.gemini import GeminiAgent  # requires google-genai
-# from hud.agents.gemini_cua import GeminiCUAAgent  # requires google-genai
-
 __all__ = [
     "MCPAgent",
     "OpenAIAgent",
     "OpenAIChatAgent",
     "OperatorAgent",
+    "create_agent",
 ]
+
+
+def create_agent(model: str, **kwargs: Any) -> MCPAgent:
+    """Create an agent for a gateway model.
+
+    This routes ALL requests through the HUD gateway. For direct API access
+    (using your own API keys), use the agent classes directly.
+
+    Args:
+        model: Model name (e.g., "gpt-4o", "claude-sonnet-4-5").
+        **kwargs: Additional params passed to agent.create().
+
+    Returns:
+        Configured MCPAgent instance with gateway routing.
+
+    Example:
+        ```python
+        # Gateway routing (recommended)
+        agent = create_agent("gpt-4o")
+        agent = create_agent("claude-sonnet-4-5", temperature=0.7)
+
+        # Direct API access (use agent classes)
+        from hud.agents.claude import ClaudeAgent
+
+        agent = ClaudeAgent.create(model="claude-sonnet-4-5")
+        ```
+    """
+    from hud.agents.gateway import build_gateway_client
+    from hud.agents.resolver import resolve_cls
+
+    # Resolve class and gateway info
+    agent_cls, gateway_info = resolve_cls(model)
+
+    # Get model ID from gateway info or use the input
+    model_id = model
+    if gateway_info:
+        model_id = gateway_info.get("model") or gateway_info.get("id") or model
+
+    # Determine provider: from gateway info, or infer from the agent class
+    if gateway_info:
+        provider = gateway_info.get("provider") or "openai"
+    else:
+        # Map agent class to provider for known types
+        from hud.agents.claude import ClaudeAgent
+        from hud.agents.gemini import GeminiAgent
+
+        _AGENT_TO_PROVIDER = {
+            ClaudeAgent: "anthropic",
+            GeminiAgent: "google",
+        }
+        provider = _AGENT_TO_PROVIDER.get(agent_cls, "openai")
+
+    client = build_gateway_client(provider)
+
+    # Set up kwargs
+    kwargs.setdefault("model", model_id)
+
+    # Use the correct client key based on agent type
+    if agent_cls == OpenAIChatAgent:
+        kwargs.setdefault("openai_client", client)
+    else:
+        # Claude and other agents use model_client and validate_api_key
+        kwargs.setdefault("model_client", client)
+        kwargs.setdefault("validate_api_key", False)
+
+    return agent_cls.create(**kwargs)
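The client-key branching at the end of `create_agent` can be summarized as a small pure function. A simplified stand-in for that wiring (not the shipped code; `client_kwargs` and the `agent_kind` strings are illustrative):

```python
def client_kwargs(agent_kind: str, client: object, model_id: str) -> dict:
    """Mirror create_agent()'s wiring: chat-completions agents take an
    `openai_client`; other agents take `model_client` and, because the
    gateway key replaces provider keys, skip API-key validation."""
    kwargs = {"model": model_id}
    if agent_kind == "openai_chat":
        kwargs["openai_client"] = client
    else:
        kwargs["model_client"] = client
        kwargs["validate_api_key"] = False
    return kwargs

print(client_kwargs("claude", object(), "claude-sonnet-4-5")["validate_api_key"])  # False
```

Because the real code uses `setdefault`, callers can still override any of these keys explicitly.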

hud/agents/gateway.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
"""Gateway client utilities for HUD inference gateway."""

from __future__ import annotations

from typing import Any


def build_gateway_client(provider: str) -> Any:
    """Build a client configured for HUD gateway routing.

    Args:
        provider: Provider name ("anthropic", "openai", "gemini", etc.)

    Returns:
        Configured async client for the provider.
    """
    from hud.settings import settings

    provider = provider.lower()

    if provider == "anthropic":
        from anthropic import AsyncAnthropic

        return AsyncAnthropic(api_key=settings.api_key, base_url=settings.hud_gateway_url)

    if provider == "gemini":
        from google import genai
        from google.genai.types import HttpOptions

        return genai.Client(
            api_key="PLACEHOLDER",
            http_options=HttpOptions(
                api_version="v1beta",
                base_url=settings.hud_gateway_url,
                headers={"Authorization": f"Bearer {settings.api_key}"},
            ),
        )

    # OpenAI-compatible (openai, azure, together, groq, fireworks, etc.)
    from openai import AsyncOpenAI

    return AsyncOpenAI(api_key=settings.api_key, base_url=settings.hud_gateway_url)
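The dispatch above reduces to a small config table: native SDK clients for anthropic and gemini (with gemini routing auth through a header rather than its api_key field), and an OpenAI-compatible client for everything else. A sketch of that shape (`gateway_config` is a hypothetical summary helper, not part of the module):

```python
def gateway_config(provider: str, api_key: str, gateway_url: str) -> dict:
    """Summarize what each branch of build_gateway_client() configures:
    which SDK is used, where the key goes, and the shared base URL."""
    provider = provider.lower()
    if provider == "anthropic":
        return {"sdk": "anthropic", "api_key": api_key, "base_url": gateway_url}
    if provider == "gemini":
        # Gemini auth travels in an Authorization header, not api_key
        return {
            "sdk": "google-genai",
            "api_key": "PLACEHOLDER",
            "base_url": gateway_url,
            "headers": {"Authorization": f"Bearer {api_key}"},
        }
    # Everything else is OpenAI-compatible
    return {"sdk": "openai", "api_key": api_key, "base_url": gateway_url}
```

The common thread is that every client points its `base_url` at the gateway while the HUD API key carries authentication.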

hud/agents/resolver.py

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
"""Model resolution - maps model strings to agent classes."""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from hud.agents.base import MCPAgent

__all__ = ["resolve_cls"]

_models_cache: list[dict[str, Any]] | None = None

# Provider name → AgentType value (only anthropic differs)
_PROVIDER_TO_AGENT = {"anthropic": "claude"}


def _fetch_gateway_models() -> list[dict[str, Any]]:
    """Fetch available models from HUD gateway (cached)."""
    global _models_cache
    if _models_cache is not None:
        return _models_cache

    import httpx

    from hud.settings import settings

    if not settings.api_key:
        return []

    try:
        resp = httpx.get(
            f"{settings.hud_gateway_url}/models",
            headers={"Authorization": f"Bearer {settings.api_key}"},
            timeout=10.0,
        )
        resp.raise_for_status()
        data = resp.json()
        _models_cache = data.get("data", data) if isinstance(data, dict) else data
        return _models_cache or []
    except Exception:
        return []


def resolve_cls(model: str) -> tuple[type[MCPAgent], dict[str, Any] | None]:
    """Resolve a model string to (agent_class, gateway_info).

    Returns:
        (agent_class, None) for known AgentTypes
        (agent_class, gateway_model_info) for gateway models
    """
    from hud.types import AgentType

    # Known AgentType → no gateway info
    try:
        return AgentType(model).cls, None
    except ValueError:
        pass

    # Gateway lookup
    for m in _fetch_gateway_models():
        if model in (m.get("id"), m.get("name"), m.get("model")):
            provider = (m.get("provider") or "openai_compatible").lower()
            agent_str = _PROVIDER_TO_AGENT.get(provider, provider)
            try:
                return AgentType(agent_str).cls, m
            except ValueError:
                return AgentType.OPENAI_COMPATIBLE.cls, m

    raise ValueError(f"Model '{model}' not found")
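The resolution order — known agent type first, then gateway catalog lookup, then error — can be exercised against a static catalog. A self-contained sketch of the same logic (the `KNOWN_TYPES` set and `CATALOG` entries are made up for illustration; the real code consults `AgentType` and the live `/models` endpoint):

```python
# Hypothetical stand-ins for AgentType values and the gateway /models payload
KNOWN_TYPES = {"openai", "claude", "operator", "openai_compatible"}
CATALOG = [
    {"id": "gpt-4o", "provider": "openai"},
    {"id": "claude-sonnet-4-5", "provider": "anthropic"},
    {"id": "llama-3.1-70b", "provider": "together"},
]
PROVIDER_TO_AGENT = {"anthropic": "claude"}  # only anthropic differs

def resolve(model: str):
    """Mimic resolve_cls(): exact agent-type match first, then catalog
    lookup by id/name/model, with unknown providers falling back to the
    OpenAI-compatible agent."""
    if model in KNOWN_TYPES:
        return model, None
    for m in CATALOG:
        if model in (m.get("id"), m.get("name"), m.get("model")):
            provider = (m.get("provider") or "openai_compatible").lower()
            agent = PROVIDER_TO_AGENT.get(provider, provider)
            if agent not in KNOWN_TYPES:
                agent = "openai_compatible"
            return agent, m
    raise ValueError(f"Model '{model}' not found")

print(resolve("claude-sonnet-4-5")[0])  # claude
print(resolve("llama-3.1-70b")[0])  # openai_compatible
```

Note that because `_fetch_gateway_models()` swallows network errors and returns `[]`, an unreachable gateway surfaces as the final "not found" error rather than an HTTP exception.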
