### Integration with RL Tuning Frameworks
We integrate insights and methodologies from leading RL tuning frameworks, including:

- **Verl** - **Integrated as Git Submodule** - Our primary RL framework, providing advanced training capabilities for agent optimization
- **TinyZero**
- **OpenR1**
- **Trlx**

### Verl Integration
The `verl` submodule is fully integrated into OpenManus-RL, providing:

- **Advanced RL Algorithms** - PPO, DPO, and custom reward modeling
- **Efficient Training** - Optimized for large language model fine-tuning
- **Flexible Configuration** - Easy customization of training parameters
- **Production Ready** - Battle-tested framework from Bytedance

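As one concrete sketch: verl PPO runs are typically launched through the `verl.trainer.main_ppo` entry point with Hydra-style config overrides. The data paths, model name, and hardware settings below are placeholders for illustration, not values shipped with this repo:

```bash
# Hedged sketch of a verl PPO launch; all paths and values are placeholders
python3 -m verl.trainer.main_ppo \
    data.train_files=data/train.parquet \
    data.val_files=data/val.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct \
    trainer.n_gpus_per_node=8 \
    trainer.total_epochs=1
```

In this repository such launches are wrapped by the training scripts under `scripts/ppo_train/`.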
Through these frameworks, agents can effectively balance exploration and exploitation, optimize reasoning processes, and adapt dynamically to novel environments.
In summary, our method systematically integrates advanced reasoning paradigms, diverse rollout strategies, sophisticated reward modeling, and robust RL frameworks, significantly advancing the capability and adaptability of reasoning-enhanced LLM agents.

We are still actively developing this part; feedback is welcome.

## Installation
### Prerequisites
This project uses git submodules. After cloning the repository, make sure to initialize and update the submodules:
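The standard git commands for this are:

```bash
# From the repository root: fetch and check out all submodules (e.g. verl)
git submodule update --init --recursive

# Confirm the submodule commits are checked out
git submodule status
```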
Use `--extra` to download pre-trained checkpoints and seq2seq data.
## Quick Start
### 1. Environment Setup
Make sure you have the required environments set up (see Environment Setup section above).
### 2. Data Preparation
Download the OpenManus-RL dataset from [Hugging Face](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL).
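For example, with the Hugging Face CLI (the `--local-dir` destination is an arbitrary choice; `huggingface_hub` must be installed):

```bash
# Requires: pip install -U "huggingface_hub[cli]"
huggingface-cli download CharlieDreemur/OpenManus-RL \
    --repo-type dataset \
    --local-dir data/OpenManus-RL
```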
### 3. Training Examples
#### ALFWorld RL Training (PPO)
```bash
conda activate openmanus-rl
bash scripts/ppo_train/train_alfworld.sh
```