@@ -146,11 +146,18 @@ Agents are equipped with action-space awareness, employing systematic exploratio
 ### Integration with RL Tuning Frameworks
 We integrate insights and methodologies from leading RL tuning frameworks, including:

-- **Verl**
+- **Verl** - **Integrated as Git Submodule** - Our primary RL framework, providing advanced training capabilities for agent optimization
 - **TinyZero**
 - **OpenR1**
 - **Trlx**

+### Verl Integration
+The `verl` submodule is fully integrated into OpenManus-RL, providing:
+- **Advanced RL Algorithms** - PPO, DPO, and custom reward modeling
+- **Efficient Training** - Optimized for large language model fine-tuning
+- **Flexible Configuration** - Easy customization of training parameters
+- **Production Ready** - Battle-tested framework from Bytedance
+
 Through these frameworks, agents can effectively balance exploration and exploitation, optimize reasoning processes, and adapt dynamically to novel environments.

 In summary, our method systematically integrates advanced reasoning paradigms, diverse rollout strategies, sophisticated reward modeling, and robust RL frameworks, significantly advancing the capability and adaptability of reasoning-enhanced LLM agents.
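
Review note on the hunk above: since it describes `verl` as a git submodule, one quick way to spot a checkout where the submodule was never initialized is to filter the output of `git submodule status`, which prefixes uninitialized entries with `-`. The helper name `uninitialized_submodules` is hypothetical and not part of OpenManus-RL; this is a minimal sketch under that assumption, not the project's own tooling.

```bash
#!/bin/sh
# Hypothetical helper (not part of OpenManus-RL): reads `git submodule status`
# output on stdin and prints the path of every submodule that is not initialized.
# git marks such entries with a leading "-" before the commit hash.
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Typical use from the repository root:
#   git submodule status --recursive | uninitialized_submodules
```

Any path it prints can then be fixed with `git submodule update --init --recursive`, the command this PR adds to the Prerequisites section.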
@@ -208,6 +215,18 @@ We are still laboriously developing this part, welcome feedback.

 ## Installation

+### Prerequisites
+This project uses git submodules. After cloning the repository, make sure to initialize and update the submodules:
+# Or if already cloned, initialize and update submodules
+git submodule update --init --recursive
+```
+
+### Environment Setup
 First, create a conda environment and activate it:

 ```bash
@@ -277,17 +296,17 @@ webshop --port 36001

 Note: The WebShop environment requires specific versions of Python, PyTorch, Faiss, and Java. The setup script will handle these dependencies automatically.

-## Quick start
+## Quick Start

-Train a reasoning + search LLM on NQ dataset with e5 as the retriever and wikipedia as the corpus.
+### 1. Environment Setup
+Make sure you have the required environments set up (see Environment Setup section above).

-(1) Download the indexing and corpus.
+### 2. Data Preparation
+Download the OpenManus-RL dataset from [Hugging Face](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL).

-From https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL
+### 3. Training Examples

-(3) Launch a local AgentGym server.
-
-(4) Run RL training (PPO).
+#### ALFWorld RL Training (PPO)
 ```bash
 conda activate openmanus-rl
 bash scripts/ppo_train/train_alfworld.sh
@@ -379,6 +398,19 @@ Please cite the following paper if you find OpenManus helpful!