Commit e73f164

update GenericOpenAIChatAgent and MCPAgent for improved parameter handling and documentation
1 parent ffbe773 commit e73f164

File tree

8 files changed: +278 −331 lines changed

docs/reference/agents.mdx

Lines changed: 15 additions & 14 deletions
````diff
@@ -139,29 +139,31 @@ OpenAI's Operator agent implementation.
 from hud.agents import GenericOpenAIChatAgent
 ```
 
-Generic OpenAI chat completion agent for any OpenAI-compatible API.
+OpenAI-compatible chat.completions agent that works with any API implementing the OpenAI schema (e.g., OpenAI, vLLM, Ollama, Together, etc.).
 
 **Constructor Parameters:**
 
 | Parameter | Type | Description | Default |
 |-----------|------|-------------|---------|
-| `model_client` | `AsyncOpenAI` | OpenAI-compatible client | Required |
-| `model` | `str` | Model name | Required |
-| `max_tokens` | `int` | Maximum response tokens | `4096` |
+| `openai_client` | `AsyncOpenAI` | OpenAI-compatible client instance | Required |
+| `model_name` | `str` | Chat model name | `"gpt-4o-mini"` |
+| `parallel_tool_calls` | `bool` | Allow multiple tool calls per turn | `False` |
+| `completion_kwargs` | `dict[str, Any]` | Extra args for `chat.completions.create` (e.g., temperature) | `{}` |
 
-**Example:**
+**Example (local or custom endpoint):**
 ```python
-# Use with local LLM
 from openai import AsyncOpenAI
 
-client = AsyncOpenAI(
-    base_url="http://localhost:11434/v1",  # Ollama
-    api_key="not-needed"
+openai_client = AsyncOpenAI(
+    base_url="http://localhost:11434/v1",  # e.g., Ollama
+    api_key="not-needed",
 )
 
-agent = GenericOpenAIChatAgent(
-    model_client=client,
-    model="llama3.1"
-)
+agent = GenericOpenAIChatAgent(
+    openai_client=openai_client,
+    model_name="llama3.1",
+    parallel_tool_calls=False,
+    completion_kwargs={"temperature": 0.2},  # forwarded to OpenAI
+)
 ```
 
 ### LangChainAgent
@@ -385,4 +387,3 @@ class MyCustomAgent(MCPAgent):
 - [Create Agents](/evaluate-agents/create-agents) - Tutorial on building agents
 - [Tasks](/reference/tasks) - Task configuration reference
 - [Architecture](/core-concepts/architecture) - How agents fit in HUD
-
````

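The diff above introduces a `parallel_tool_calls` flag that gates whether the agent acts on more than one tool call per turn. As a rough, hypothetical sketch of that gating logic (the helper name and shapes are illustrative assumptions, not hud's actual implementation):

```python
# Hypothetical sketch of per-turn tool-call gating; the function name and
# list-of-calls shape are assumptions for illustration, not hud internals.
def select_tool_calls(tool_calls: list, parallel_tool_calls: bool = False) -> list:
    """Keep every tool call when parallel calls are allowed, else only the first."""
    calls = list(tool_calls)
    return calls if parallel_tool_calls else calls[:1]
```

With the documented default of `False`, a model response carrying several tool calls would be reduced to a single call per step under this sketch.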
docs/reference/api/agents.mdx

Lines changed: 31 additions & 12 deletions
````diff
@@ -190,7 +190,7 @@ agent = OperatorAgent(
 
 ## GenericOpenAIChatAgent
 
-Generic OpenAI-compatible chat agent for custom models.
+OpenAI-compatible chat.completions agent for custom models and endpoints.
 
 ```python
 from hud.agents import GenericOpenAIChatAgent
@@ -200,27 +200,48 @@ from hud.agents import GenericOpenAIChatAgent
 
 ```python
 GenericOpenAIChatAgent(
-    model: str = "gpt-4o",
-    base_url: str | None = None,
+    *,
+    openai_client: AsyncOpenAI,
+    model_name: str = "gpt-4o-mini",
+    parallel_tool_calls: bool = False,
+    completion_kwargs: dict[str, Any] | None = None,
     **kwargs
 )
 ```
 
-<ParamField body="model" type="str" default="gpt-4o">
-  Model name to use
+<!-- MCP client is optional via kwargs; typically auto-created from Task.mcp_config -->
+
+<ParamField body="openai_client" type="AsyncOpenAI" required>
+  OpenAI-compatible client (can target OpenAI, vLLM, Ollama, etc.)
 </ParamField>
 
-<ParamField body="base_url" type="str" optional>
-  Custom API endpoint for OpenAI-compatible servers
+<ParamField body="model_name" type="str" default="gpt-4o-mini">
+  Name of the chat model to use
+</ParamField>
+
+<ParamField body="parallel_tool_calls" type="bool" default="false">
+  Whether to execute multiple tool calls in a single step
+</ParamField>
+
+<ParamField body="completion_kwargs" type="dict[str, Any]" optional>
+  Extra keyword arguments forwarded to `openai.chat.completions.create` (e.g., `temperature`, `top_p`).
+  Core fields (`model`, `messages`, `tools`, `parallel_tool_calls`) are protected and cannot be overridden. Use this field to set `logprobs` if needed.
 </ParamField>
 
 ### Example
 
 ```python
-# Custom OpenAI-compatible model
+from openai import AsyncOpenAI
+
+openai_client = AsyncOpenAI(
+    base_url="http://localhost:8000/v1",  # Custom server
+    api_key="local-key",
+)
+
 agent = GenericOpenAIChatAgent(
-    model="custom-model",
-    base_url="http://localhost:8000/v1"
+    openai_client=openai_client,
+    model_name="custom-model",
+    completion_kwargs={"temperature": 0.1, "seed": 7},
 )
 ```
 
@@ -344,5 +365,3 @@ except Exception as e:
 <Card title="Environments API" icon="cube" href="/reference/api/environments">
   Environment and server API reference
 </Card>
-
-
````

examples/README.md

Lines changed: 20 additions & 30 deletions
````diff
@@ -20,6 +20,19 @@ python examples/01_hello_2048.py
 
 > Requires Docker and `ANTHROPIC_API_KEY` environment variable.
 
+### 03_browser_agent_loop.py
+Quick start for the browser environment (Claude). Supports multiple demo apps.
+
+```bash
+# 2048 (default)
+python examples/03_browser_agent_loop.py
+
+# Todo app
+python examples/03_browser_agent_loop.py --app todo
+```
+
+> Requires Docker (exposes port 8080) and `ANTHROPIC_API_KEY`.
+
 ## Core Patterns
 
 ### 02_agent_lifecycle.py
@@ -50,35 +63,12 @@ Using the legacy `mcp_use` client for multi-server setups.
 ### integration_otel.py
 Custom OpenTelemetry backend integration (e.g., Jaeger).
 
-## Prerequisites
-
-| Requirement | Used For |
-|-------------|----------|
-| Docker | Running environment containers |
-| `HUD_API_KEY` | Cloud deployments and telemetry |
-| `ANTHROPIC_API_KEY` | Claude agent examples |
+### openai_compatible_agent.py
+OpenAI-compatible chat.completions agent with both text and browser 2048 environments.
 
-## Common Pattern
-
-All examples follow this structure:
-
-```python
-import asyncio, hud
-from hud.datasets import Task
-from hud.agents import ClaudeAgent
-
-async def main():
-    with hud.trace("example-name"):
-        task = Task(
-            prompt="Your task here",
-            mcp_config={...}
-        )
-
-        agent = ClaudeAgent()
-        result = await agent.run(task)
-        print(f"Reward: {result.reward}")
-
-asyncio.run(main())
+```bash
+export OPENAI_API_KEY=your-key  # or dummy value for local servers
+# export OPENAI_BASE_URL=http://localhost:8000/v1  # e.g., vllm
+python examples/openai_compatible_agent.py --mode text     # text environment
+python examples/openai_compatible_agent.py --mode browser  # browser environment
 ```
-
-> The agent automatically creates an MCP client from `task.mcp_config` if none is provided.
````
examples/openai_compatible_2048.py

Lines changed: 0 additions & 120 deletions
This file was deleted.
