v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the `Environment` class—tools, setup, and scoring live together.
<Warning>
**Deprecation Notice**: `LegacyTask`, `setup_tool`, and `evaluate_tool` are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Migrate to `@env.scenario()` for new code.
</Warning>
## MCPServer → Environment
`Environment` inherits from `MCPServer`. Same API, same behavior. Just change the import:
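A minimal sketch of the swap (the exact module paths are assumptions; check the imports in your own v4 code):

```python
# v4 (assumed import path)
# from hud.server import MCPServer
# env = MCPServer("my-env")

# v5 (assumed import path)
from hud import Environment

env = Environment("my-env")
env.run()
```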
That's it. Your Dockerfile, your tools, your `run()` call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
## Migrating Tasks: Prompt Passthrough Pattern
The recommended migration uses the **prompt passthrough pattern**—scenario arguments become the prompt content:
```python
@env.scenario("web-task")
async def web_task(instruction: str):
    # Prompt passthrough: the scenario argument is yielded as the agent's prompt
    answer = yield instruction
    # After the agent responds, grade the run
    result = await env.call_tool("check_completion")
    yield 1.0 if result else 0.0
```

## JSON Task Format (Platform Ready)
For JSON-based task definitions that can be uploaded to the HUD platform, use this format:
```json
{
  "env": {
    "name": "hud-evals/browser"
  },
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}
```
This maps directly to the scenario call: `env("web-task", instruction="...", start_url="...")`.
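To make the mapping concrete, here is a small helper (hypothetical, not part of the hud SDK) that renders a task dict as the equivalent scenario call:

```python
import json

def to_scenario_call(task: dict) -> str:
    # Render "scenario" + "args" as the equivalent env(...) invocation.
    # Illustration only: real code passes the args to the Environment itself.
    args = ", ".join(f"{key}={value!r}" for key, value in task["args"].items())
    return f'env({task["scenario"]!r}, {args})'

task = json.loads('''
{
  "env": {"name": "hud-evals/browser"},
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}''')

print(to_scenario_call(task))
```

Running this prints `env('web-task', instruction='Find the contact page and extract the support email', start_url='https://example.com')`, the same call shown above.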
**Example: Task set for platform upload**
```json
[
  {
    "env": { "name": "hud-ops-diagnostics-sentry" },
    "scenario": "sentry-agent:investigate",
    "args": {
      "issue_id": "PROJ-1234",
      "max_depth": 3
    }
  },
  {
    "env": { "name": "hud-evals/browser" },
    "scenario": "web-task",
    "args": {
      "instruction": "Add a MacBook Pro to cart and proceed to checkout"
    }
  }
]
```
The `args` field uses prompt passthrough—the values flow directly into the scenario's yield statement.
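The passthrough mechanics can be modeled with a plain Python generator (a toy model with no hud imports): the first `yield` hands the instruction to the agent as its prompt, and the second `yield` returns the reward.

```python
def web_task(instruction):
    # First yield: pass the instruction through as the agent's prompt;
    # the agent's answer comes back via .send().
    answer = yield instruction
    # Second yield: score the answer (toy check standing in for env.call_tool).
    yield 1.0 if "support email" in answer else 0.0

scenario = web_task("Find the contact page and extract the support email")
prompt = next(scenario)          # the instruction, passed straight through
reward = scenario.send("Found the support email on the contact page")
```

Here `prompt` is exactly the `instruction` argument, and `reward` is `1.0` because the toy check passes.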
## Using with Built-in Agents
Built-in agents (`ClaudeAgent`, `OpenAIAgent`, etc.) work with scenarios:
```python
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(env("web-task", instruction="Find the pricing page"))
```
## Bring Your Own Agent
v5 gives you the `hud.eval()` context manager for maximum flexibility:
```python
async with hud.eval(env("shopping", task="Add item to cart", shop_url="https://shop.example.com")) as ctx:
    # Use OpenAI, Anthropic, your own agent—whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        # ...
    )

print(ctx.reward)
```
## Quick Reference
| v4 (deprecated in v0.6.0) | v5 |
|---------------------------|-----|
| `LegacyTask(...)` | `env("scenario", ...)` via `@env.scenario()` |
| `setup_tool` | code before the first `yield` in a scenario |
| `evaluate_tool` | code after the first `yield` in a scenario |