
Commit fe45908

feat: vllm speedup evaluation and evaluation guide
1 parent cc6e9d4 commit fe45908

File tree

5 files changed: +574, -147 lines

docs/EVALUATION_GUIDE_EN.md

Lines changed: 107 additions & 0 deletions

@@ -0,0 +1,107 @@
# Evaluation Guide

This guide explains how to quickly set up and run WebShop environment evaluations using vLLM for accelerated throughput, as well as fallback options for the classic (slower) evaluation scripts.

---

## Prerequisites

1. **Python 3.8+** installed on your system.
2. **pip** package manager.
3. **vLLM server** (OpenAI-compatible) or a local vLLM endpoint.
4. **AgentENV–WebShop** service.

---

## 1. Install and Prepare the Package

From the root of the repository:

```bash
# Install the agentenv package in editable mode
cd openmanus_rl/agentgym/agentenv/
pip install -e .
```

This ensures `agentenv` is available on your `PYTHONPATH` without extra hacks.
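To confirm the editable install is actually visible to Python, a quick check can be run from any directory (this snippet is illustrative, not part of the repo):

```python
import importlib.util

# find_spec returns a ModuleSpec if `agentenv` is importable, or None if the
# editable install (`pip install -e .`) did not land on sys.path.
spec = importlib.util.find_spec("agentenv")
print("agentenv importable:", spec is not None)
```

If this prints `False`, re-run the `pip install -e .` step above before continuing.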
---

## 2. Start the WebShop Environment Server

1. Navigate to the WebShop service directory:

   ```bash
   cd agentenv/agentenv-webshop
   ```

2. Launch the server on port **36001** (or your preferred port):

   ```bash
   webshop --host 0.0.0.0 --port 36001
   ```

Leave this process running in the background or in a separate terminal.
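Before pointing an evaluation at the server, it can help to wait until the port actually accepts connections. A minimal sketch (the `wait_for_port` helper is illustrative, not part of the repo):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

# e.g. wait_for_port("127.0.0.1", 36001) before launching the evaluation
```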
---

## 3. Run vLLM‑Accelerated Evaluation

1. Execute the helper script to start or configure your vLLM server:

   ```bash
   bash openmanus_rl/evaluation/run_vllm.sh
   ```

2. Ensure your `PYTHONPATH` includes the `agentenv` package:

   ```bash
   export PYTHONPATH=./openmanus_rl/agentgym/agentenv:$PYTHONPATH
   ```

3. Launch the evaluation driver:

   ```bash
   python openmanus_rl/evaluation/vllm_eval_webshop.py
   ```

This will run tasks against the WebShop environment via your vLLM endpoint for maximum speed.
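Under the hood, a vLLM server speaks the OpenAI-compatible chat-completions API, so the requests the driver sends look roughly like this sketch (the model name, port, and prompt below are placeholders — the real values come from `run_vllm.sh` and the evaluation script):

```python
import json

# Assumed vLLM default endpoint; the actual base_url comes from run_vllm.sh.
base_url = "http://localhost:8000/v1"
payload = {
    "model": "your-model-name",  # placeholder
    "messages": [
        {"role": "user", "content": "You are browsing WebShop. Find a red mug under $10."}
    ],
    "temperature": 0.0,
    "max_tokens": 256,
}
body = json.dumps(payload)  # POST this to f"{base_url}/chat/completions"
```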
---

## 4. Legacy Evaluation Scripts (Slower)

If you prefer the classic evaluation (single‑model, non‑vLLM), use one of these scripts:

* **Basic (single‑threaded):**

  ```bash
  bash agentgym/agentenv/examples/basic/base_eval_webshop.sh
  ```

* **Distributed (multi‑worker):**

  ```bash
  bash agentgym/agentenv/examples/distributed_eval_scripts/distributed_eval_webshop.sh
  ```

Expect these to run significantly slower than the vLLM‑driven workflow.
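To quantify the speedup on your own hardware, a simple wall-clock wrapper (illustrative only, not part of the repo) can time either driver:

```python
import subprocess
import time

def timed_run(cmd: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# e.g. timed_run(["bash", "agentgym/agentenv/examples/basic/base_eval_webshop.sh"])
```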
---

## 5. Troubleshooting

* **ModuleNotFoundError:**
  Make sure you ran `pip install -e .` in `openmanus_rl/agentgym/agentenv/` and removed any stale `PYTHONPATH` overrides.

* **Port conflicts on 36001:**
  Either kill the process using that port or choose a different port and update both the WebShop server and evaluation script arguments.
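If 36001 is taken, the OS can hand you an unused port to pass to `webshop --port` (sketch; `find_free_port` is an illustrative name, not part of the repo):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0 and reading back the choice."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]
```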
* **vLLM connection errors:**
  Verify that your vLLM server is up (`run_vllm.sh` logs) and that the `base_url` in your script matches its endpoint.
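A quick reachability probe against the endpoint's `/models` route (which OpenAI-compatible servers expose) can rule out connection problems before a full run; `vllm_reachable` is an illustrative helper name:

```python
import urllib.error
import urllib.request

def vllm_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if an OpenAI-compatible server answers GET <base_url>/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False

# e.g. vllm_reachable("http://localhost:8000/v1")
```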
---

## License

This project is licensed under the MIT License. See [LICENSE](../../LICENSE) for details.
Lines changed: 2 additions & 1 deletion

@@ -1,4 +1,5 @@
 from .agent import Agent
 from .env import BaseEnvClient, StepOutput
-from .task import BaseTask, ConversationMessage, TokenizedConversationOutput
+# from .task import BaseTask, ConversationMessage, TokenizedConversationOutput
+from .task import BaseTask, ConversationMessage
 from .utils import Evaluator