
Commit b03dc85

Merge pull request #79 from OpenManus/murphy/dev-0813
GAIA OpenManus Rollout
2 parents e75e0d7 + f3d77c9

45 files changed: +6148 −115 lines

.gitignore

Lines changed: 4 additions & 1 deletion

```diff
@@ -35,4 +35,7 @@ verl.egg-info/
 
 test_memory.md
 
-trajectories/traj_*.json
+trajectories/
+
+AGENTS.md
+CLAUDE.md
```

data/gaia/val.json

Lines changed: 1907 additions & 0 deletions
Large diffs are not rendered by default.

data/gaia/val.parquet

96.2 KB
Binary file not shown.

docs/DOCKER_SETUP.md

Lines changed: 141 additions & 0 deletions
# OpenManus-RL Docker Setup for AMD GPUs

This setup lets you run OpenManus-RL AlfWorld rollouts in a Docker container without affecting your existing verl-agent environment.

## Prerequisites

- Docker installed and running
- AMD GPU with ROCm support
- The `verl-agent:rocm-snap1` Docker image (from your previous verl-agent setup)
- Models stored in `/root/models/`

## Setup Instructions

### 1. Initial Setup

First, run the setup script to create and configure the Docker container:

```bash
cd /root/OpenManus-RL
./scripts/docker_setup.sh
```

This will:
- Create a new Docker container named `openmanus-rl`
- Install all required dependencies
- Set up a virtual environment at `/opt/openmanus-venv`
- Map host port 8001 to container port 8000 (to avoid a conflict with verl-agent)
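
For reference, the container creation boils down to a `docker run` of roughly this shape. This is a sketch assuming the usual ROCm device-passthrough flags; the authoritative command lives in `docker_setup.sh` and may differ:

```bash
# Hypothetical equivalent of docker_setup.sh's container creation; exact flags may differ.
# /dev/kfd and /dev/dri expose the AMD GPU; -p maps host 8001 -> container 8000.
docker run -d --name openmanus-rl \
  --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host \
  -p 8001:8000 \
  -v /root/OpenManus-RL:/workspace \
  -v /root/models:/root/models \
  verl-agent:rocm-snap1 sleep infinity
```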

### 2. Start/Access the Container

If you need to enter the container manually:

```bash
docker exec -it openmanus-rl bash
source /opt/openmanus-venv/bin/activate
cd /workspace
```

Then you can run commands directly.

### 3. Run Rollouts (Unified Script)

See ROLLOUT_GUIDE.md for detailed examples. A few quick starters:

- GAIA dry‑run:
  - `python scripts/rollout/unified_rollout.py --env gaia --batch_size 2 --total_envs 4 --dry_run`

- AlfWorld small run (OpenAI):
  - `python scripts/rollout/unified_rollout.py --env alfworld --model gpt-4o-mini --batch_size 1 --total_envs 2 --max_steps 20 --dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- GAIA small run (local vLLM):
  - `./scripts/serve_model.sh` (in another shell)
  - `python scripts/rollout/unified_rollout.py --env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 --gaia_tools python_code_generator --batch_size 1 --total_envs 2 --max_steps 30 --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

### 4. Running GAIA (Tool-Use) Rollouts

GAIA uses the tool-use environment and the dataset in `data/gaia/val.json`. Some tools need extra API keys.

Required packages for common tools are already listed in `requirements_docker.txt` (requests, python-dotenv, wikipedia). For Google search, set:

```bash
export GOOGLE_API_KEY=your-google-api-key
export GOOGLE_CX=your-custom-search-engine-id
```

There are two ways to run GAIA, both through the unified script:

1) OpenAI API
```bash
export OPENAI_API_KEY="your-openai-api-key"
python scripts/rollout/unified_rollout.py \
  --env gaia --model gpt-4o-mini \
  --gaia_tools python_code_generator \
  --total_envs 50 --batch_size 10 --max_steps 30 --concurrency 8 \
  --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl \
  --chat_root /workspace
```

2) Local model via vLLM (OpenAI-compatible)

First start the vLLM server (see above), then:
```bash
python scripts/rollout/unified_rollout.py \
  --env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
  --gaia_tools python_code_generator \
  --total_envs 50 --batch_size 10 --max_steps 30 --concurrency 8 \
  --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl \
  --chat_root /workspace
```

Notes:
- The examples default to the `python_code_generator` GAIA tool (to avoid external API dependencies).
- If a tool needs external access (web APIs), ensure the container has outbound network connectivity and the env vars are set.
- Chat histories and logs are saved under `logs/gaia` and `trajectories/<timestamp>/gaia/<model>/` when `--chat_root` is provided.
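
To spot-check a finished run, the dumped trajectory file can be read record by record. A minimal sketch: the path is illustrative, and the record fields vary by environment, so it lists whatever keys are present rather than assuming a schema:

```python
import json

# Illustrative path; substitute the --dump_path used for your run.
path = "logs/gaia/trajectory_20250813_120000.jsonl"

with open(path) as f:
    for i, line in enumerate(f):
        record = json.loads(line)  # JSONL: one JSON record per line
        # Field names depend on the rollout script; print keys instead of assuming them.
        print(f"record {i}: keys={sorted(record.keys())}")
```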

## Container Management

### Stop the container
```bash
docker stop openmanus-rl
```

### Start the container again
```bash
docker start openmanus-rl
```

### Remove the container
```bash
docker stop openmanus-rl
docker rm openmanus-rl
```

### Check container logs
```bash
docker logs openmanus-rl
```

## Troubleshooting

### If vLLM fails to start
1. Check GPU memory usage: `rocm-smi`
2. Adjust `--gpu-memory-utilization` in `serve_model.sh`
3. Make sure no other process is using port 8000 in the container
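
For context, `serve_model.sh` presumably wraps vLLM's OpenAI-compatible server. A sketch of that style of launch; the model path, served name, and utilization value are assumptions (only the `--gpu-memory-utilization` flag is confirmed by the script):

```bash
# Hypothetical serve_model.sh-style launch; adjust the path and values to your setup.
python -m vllm.entrypoints.openai.api_server \
  --model /root/models/qwen2.5-7b-alfworld \
  --served-model-name qwen2.5-7b-alfworld \
  --port 8000 \
  --gpu-memory-utilization 0.85
```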

### If rollout fails
1. Check that all dependencies are installed: `pip list`
2. Verify AlfWorld data is downloaded: `ls ~/.cache/alfworld`, or re‑run `alfworld-download -f`
3. Check logs under `/workspace/logs/<env>/`

### Port conflicts
- Default: container 8000 → host 8001 (configured by `docker_setup.sh`)
- Adjust the mapping via the `-p` flag if needed.

## Output Files

- Trajectory files: `/root/OpenManus-RL/logs/alfworld/trajectory_*.jsonl`
- Chat histories: `/root/OpenManus-RL/trajectories/<timestamp>/`
- Log files: `/root/OpenManus-RL/logs/alfworld/run_log_*.log`

docs/ROLLOUT_GUIDE.md

Lines changed: 90 additions & 0 deletions
# Rollout Guide (AlfWorld, GAIA, WebShop)

This guide shows how to run rollouts for the three environments using a single unified script. The script supports both the OpenAI API and local OpenAI‑compatible endpoints (e.g., vLLM).

## Prerequisites

- Python venv prepared via the Docker setup (see DOCKER_SETUP.md)
- `.env` at the repo root (auto‑loaded) for API keys; see the sketch after this list:
  - `OPENAI_API_KEY` for OpenAI
  - Optional tool keys (e.g., GAIA Google tools): `GOOGLE_API_KEY`, `GOOGLE_CX`
- For local inference (vLLM), start the server first (see DOCKER_SETUP.md or `serve_model.sh`).
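
A minimal `.env` sketch with placeholder values:

```bash
# .env at the repo root (auto-loaded); values below are placeholders.
OPENAI_API_KEY=sk-...
# Optional, only needed for GAIA Google tools:
GOOGLE_API_KEY=your-google-api-key
GOOGLE_CX=your-custom-search-engine-id
```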

## Unified Script

- Entry: `python scripts/rollout/unified_rollout.py`
- Core flags:
  - `--env {alfworld,gaia,webshop}` choose the environment
  - `--model <name>` model name (OpenAI or local)
  - `--base_url <url>` set when using a local server (e.g., `http://127.0.0.1:8000/v1`)
  - `--batch_size`, `--total_envs`, `--max_steps`, `--concurrency`
  - `--dump_path <jsonl>` save trajectories
  - `--chat_root <dir>` save chat histories under `trajectories/<ts>/<env>/<model>/`
  - `--dry_run` plan batches without creating envs or calling models
  - `--unique_envs` ensure unique task/game sampling where supported

## GAIA

Data path default: `data/gaia/val.json`

- Dry‑run (no model calls):
  - `python scripts/rollout/unified_rollout.py --env gaia --batch_size 2 --total_envs 4 --dry_run`

- OpenAI small run (minimal tools):
  - `python scripts/rollout/unified_rollout.py \
    --env gaia --model gpt-4o \
    --gaia_tools python_code_generator \
    --batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
    --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM small run:
  - `python scripts/rollout/unified_rollout.py \
    --env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
    --gaia_tools python_code_generator \
    --batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
    --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## AlfWorld

Make sure AlfWorld is installed and the game data downloaded (`alfworld-download -f`).

- Dry‑run (unique game-file sampling):
  - `python scripts/rollout/unified_rollout.py --env alfworld --unique_envs --batch_size 2 --total_envs 4 --dry_run`

- OpenAI small run:
  - `python scripts/rollout/unified_rollout.py \
    --env alfworld --model gpt-4o \
    --batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
    --dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM small run:
  - `python scripts/rollout/unified_rollout.py \
    --env alfworld --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
    --batch_size 1 --total_envs 2 --max_steps 20 --concurrency 2 \
    --dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## WebShop (optional)

To run WebShop, follow the data/index setup in DOCKER_SETUP.md, then use:

- Dry‑run:
  - `python scripts/rollout/unified_rollout.py --env webshop --batch_size 2 --total_envs 4 --dry_run`

- OpenAI:
  - `python scripts/rollout/unified_rollout.py \
    --env webshop --model gpt-4o \
    --batch_size 2 --total_envs 4 --max_steps 30 --concurrency 2 \
    --dump_path logs/webshop/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM:
  - `python scripts/rollout/unified_rollout.py \
    --env webshop --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
    --batch_size 2 --total_envs 4 --max_steps 30 --concurrency 2 \
    --dump_path logs/webshop/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## Outputs

- Logs: `logs/<env>/unified_run_*.log`
- Trajectory: the `--dump_path` JSONL
- Chats: `trajectories/<timestamp>/<env>/<model>/` when `--chat_root` is set
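
For a quick sanity check after a run, counting lines in the dump gives the number of records written (assuming the JSONL convention of one record per line):

```bash
wc -l logs/gaia/trajectory_*.jsonl
```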

openmanus_rl/engines/__init__.py

Lines changed: 6 additions & 0 deletions
```python
"""LLM engine interfaces and factories.

This package provides lightweight wrappers around OpenAI-compatible
chat completion APIs and a simple factory used by tool modules.
"""
```

openmanus_rl/engines/factory.py

Lines changed: 21 additions & 0 deletions
```python
"""Engine factory helpers.

Exposes `create_llm_engine` returning a callable that maps prompt -> text using
the minimal `ChatOpenAI` wrapper. Keep the surface small and stable so tools
can depend on it without heavy coupling.
"""

from typing import Callable, Optional

from .openai import ChatOpenAI


def create_llm_engine(
    model_string: str = "gpt-4o-mini",
    is_multimodal: bool = False,
    base_url: Optional[str] = None,
) -> Callable[[str], str]:
    chat = ChatOpenAI(model=model_string, base_url=base_url)

    def _engine(prompt: str) -> str:
        # Tools currently call engine(prompt) for text-only flows.
        # If multimodal is needed later, extend by adding optional image args.
        return chat(prompt)

    return _engine
```
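
For illustration, a tool module might obtain and call an engine like this; the model name and endpoint are placeholders:

```python
from openmanus_rl.engines.factory import create_llm_engine

# Point base_url at a local vLLM server, or leave it None to use the OpenAI API.
engine = create_llm_engine("qwen2.5-7b-alfworld", base_url="http://127.0.0.1:8000/v1")
print(engine("Reply with the single word: ready"))
```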

openmanus_rl/engines/openai.py

Lines changed: 124 additions & 0 deletions
```python
"""Minimal OpenAI chat wrapper.

Provides a small surface compatible with internal code paths that expect
`ChatOpenAI` with a callable interface. Supports OpenAI-compatible backends
such as vLLM by honoring `OPENAI_BASE_URL`.
"""

from typing import Optional, List, Dict, Any, Type
import json
import re

try:
    from pydantic import BaseModel  # type: ignore
except Exception:  # pragma: no cover
    BaseModel = object  # type: ignore

import os

try:
    from openai import OpenAI  # type: ignore
except Exception:  # pragma: no cover
    OpenAI = None  # type: ignore


class ChatOpenAI:
    """Thin wrapper around OpenAI's Chat Completions API.

    The instance is callable and returns plain text. Images are not sent as
    binary by design to remain compatible with OpenAI-compatible servers that
    do not support multimodal content; image paths are appended as text hints.
    """

    def __init__(
        self,
        model: str = "gpt-4o-mini",
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        temperature: float = 0.0,
    ) -> None:
        if OpenAI is None:
            raise RuntimeError("openai package is not installed")

        self.model = model
        self.temperature = temperature
        self.base_url = base_url or os.getenv("OPENAI_BASE_URL")
        self.api_key = api_key or os.getenv("OPENAI_API_KEY", "EMPTY")
        self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)

    def __call__(
        self,
        prompt: str,
        images: Optional[List[str]] = None,
        system: Optional[str] = None,
        response_format: Optional[Type] = None,
        **_: Any,
    ) -> Any:
        messages: List[Dict[str, Any]] = []
        if system:
            messages.append({"role": "system", "content": system})

        if not images:
            messages.append({"role": "user", "content": prompt})
        else:
            # Safe multimodal fallback: append image paths as text hints.
            content = prompt
            for p in images:
                content += f"\n[Image: {p}]"
            messages.append({"role": "user", "content": content})

        resp = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=self.temperature,
            n=1,
        )
        text = (resp.choices[0].message.content or "").strip()

        # Best-effort structured parsing when a pydantic model is requested
        try:
            if response_format and isinstance(response_format, type) and issubclass(response_format, BaseModel):
                # Try JSON first
                try:
                    data = json.loads(text)
                    if isinstance(data, dict):
                        return response_format(**data)
                    if isinstance(data, list):
                        # Common pattern: patch list
                        payload: Dict[str, Any] = {}
                        if hasattr(response_format, "model_fields") and "patch" in response_format.model_fields:  # pydantic v2
                            payload["patch"] = data
                        elif hasattr(response_format, "__fields__") and "patch" in getattr(response_format, "__fields__"):
                            payload["patch"] = data
                        if payload:
                            return response_format(**payload)
                except Exception:
                    pass

                # Special-case: AnswerVerification(analysis: str, true_false: bool)
                if getattr(response_format, "__name__", "") == "AnswerVerification":
                    analysis = ""
                    tf = False
                    m = re.search(r"<analysis>\s*(.*?)\s*</analysis>", text, re.DOTALL)
                    if m:
                        analysis = m.group(1).strip()
                    m2 = re.search(r"<true_false>\s*(.*?)\s*</true_false>", text, re.DOTALL)
                    if m2:
                        val = m2.group(1).strip().lower()
                        tf = val in ("true", "1", "yes")
                    if not analysis:
                        analysis = text
                    return response_format(analysis=analysis, true_false=tf)

                # Fallback: try to populate known common fields
                payload: Dict[str, Any] = {}
                for field in ("analysis", "text"):
                    if (hasattr(response_format, "model_fields") and field in response_format.model_fields) or (
                        hasattr(response_format, "__fields__") and field in getattr(response_format, "__fields__")
                    ):
                        payload[field] = text
                if payload:
                    return response_format(**payload)
        except Exception:
            # Swallow parsing errors and return raw text
            pass

        return text
```
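
For illustration, a caller can request the structured `AnswerVerification` shape that the wrapper special-cases; the pydantic model below is a sketch whose fields match what the parser populates:

```python
from pydantic import BaseModel

from openmanus_rl.engines.openai import ChatOpenAI


class AnswerVerification(BaseModel):
    # Fields match what ChatOpenAI's special-case parser fills in.
    analysis: str
    true_false: bool


chat = ChatOpenAI(model="gpt-4o-mini")
result = chat(
    "Is 17 prime? Answer inside <analysis>...</analysis> and <true_false>...</true_false> tags.",
    response_format=AnswerVerification,
)
# An AnswerVerification instance when parsing succeeds, otherwise the raw text.
print(result)
```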
