Commit 7f6f7f2

Merge branch 'main' into zijia/dev
2 parents 1c44610 + dd42ddb commit 7f6f7f2

3 files changed: +114 -49 lines changed

README.md

Lines changed: 43 additions & 24 deletions
@@ -9,7 +9,7 @@ We are committed to regularly updating our exploration directions and results in

 We warmly welcome contributions from the broader community—join us in pushing the boundaries of agent reasoning and tool integration!

-Code and dataset coming soon! Stay tuned!
+Code and dataset are now available! The `verl` submodule has been integrated for enhanced RL training capabilities.

 <div style="display: flex; justify-content: center;">
 <div style="width: 100; transform: scale(1.0);">
@@ -59,7 +59,7 @@ Code and dataset coming soon! Stay tuned!


 ## Current Team Members
-[@Kunlun Zhu](https://github.com/Kunlun-Zhu)(Ulab-UIUC), [@Jiayi Zhang](https://github.com/didiforgithub)(MetaGPT), [@Xinbing Liang](https://github.com/mannaandpoem),[@Xiangxin Zhou](https://github.com/zhouxiangxin1998), [@Yanfei Zhang](https://github.com/yanfei-zhang-95), [@Yingxuan Yang](https://github.com/zoe-yyx), [@Zeping Chen](https://github.com/rxdaozhang),[@Weijia Zhang](https://github.com/CharlieDreemur), [@Muxin Tian](https://github.com/realtmxi), [@Haofei Yu](https://github.com/lwaekfjlk)(Ulab-UIUC), [@Jinyu Xiang](https://github.com/XiangJinyu), [@Yifan Wu](https://github.com/Evanwu50020), [@Bowen Jin](https://github.com/PeterGriffinJin), [@Blair Yang](https://github.com/blairyeung), [@Zijia Liu](https://m-serious.github.io/)
+[@Kunlun Zhu](https://github.com/Kunlun-Zhu)(Ulab-UIUC), [@Muxin Tian](https://github.com/realtmxi), [@Zijia Liu](https://m-serious.github.io/)(Ulab-UIUC), [@Yingxuan Yang](https://github.com/zoe-yyx),[@Jiayi Zhang](https://github.com/didiforgithub)(MetaGPT), [@Xinbing Liang](https://github.com/mannaandpoem), [@Weijia Zhang](https://github.com/CharlieDreemur), [@Haofei Yu](https://github.com/lwaekfjlk)(Ulab-UIUC), [@Cheng Qian](https://qiancheng0.github.io/),[@Bowen Jin](https://github.com/PeterGriffinJin),

 ---

@@ -146,11 +146,18 @@ Agents are equipped with action-space awareness, employing systematic exploratio
 ### Integration with RL Tuning Frameworks
 We integrate insights and methodologies from leading RL tuning frameworks, including:

-- **Verl**
+- **Verl** - **Integrated as Git Submodule** - Our primary RL framework, providing advanced training capabilities for agent optimization
 - **TinyZero**
 - **OpenR1**
 - **Trlx**

+### Verl Integration
+The `verl` submodule is fully integrated into OpenManus-RL, providing:
+- **Advanced RL Algorithms** - PPO, DPO, and custom reward modeling
+- **Efficient Training** - Optimized for large language model fine-tuning
+- **Flexible Configuration** - Easy customization of training parameters
+- **Production Ready** - Battle-tested framework from Bytedance
+
 Through these frameworks, agents can effectively balance exploration and exploitation, optimize reasoning processes, and adapt dynamically to novel environments.

 In summary, our method systematically integrates advanced reasoning paradigms, diverse rollout strategies, sophisticated reward modeling, and robust RL frameworks, significantly advancing the capability and adaptability of reasoning-enhanced LLM agents.
@@ -208,6 +215,18 @@ We are still laboriously developing this part, welcome feedback.

 ## Installation

+### Prerequisites
+This project uses git submodules. After cloning the repository, make sure to initialize and update the submodules:
+
+```bash
+# Clone the repository with submodules
+git clone --recursive https://github.com/OpenManus/OpenManus-RL.git
+
+# Or if already cloned, initialize and update submodules
+git submodule update --init --recursive
+```
+
+### Environment Setup
 First, create a conda environment and activate it:

 ```bash
@@ -248,6 +267,7 @@ conda activate agentenv_webshop
 # Setup the environment
 bash ./setup.sh -d all
 ```
+
 ### 2. ALFWorld

 ```bash
@@ -263,31 +283,17 @@ alfworld-download -f
 ```
 Use `--extra` to download pre-trained checkpoints and seq2seq data.

-### Launching the WebShop Server
+## Quick Start

-After setting up the environment, you can launch the WebShop server:
+### 1. Environment Setup
+Make sure you have the required environments set up (see Environment Setup section above).

-```bash
-# Make sure the webshop conda environment is activated
-conda activate webshop
-
-# Launch the server (default port: 36001)
-webshop --port 36001
-```
+### 2. Data Preparation
+Download the OpenManus-RL dataset from [Hugging Face](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL).

-Note: The WebShop environment requires specific versions of Python, PyTorch, Faiss, and Java. The setup script will handle these dependencies automatically.
+### 3. Training Examples

-## Quick start
-
-Train a reasoning + search LLM on NQ dataset with e5 as the retriever and wikipedia as the corpus.
-
-(1) Download the indexing and corpus.
-
-From https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL
-
-(3) Launch a local AgentGym server.
-
-(4) Run RL training (PPO).
+#### ALFWorld RL Training (PPO)
 ```bash
 conda activate openmanus-rl
 bash scripts/ppo_train/train_alfworld.sh
@@ -379,6 +385,19 @@ Please cite the following paper if you find OpenManus helpful!
 </a>
 </p>

+## Project Structure
+
+```
+OpenManus-RL/
+├── verl/ # Verl RL framework submodule
+├── openmanus_rl/ # Main OpenManus-RL library
+├── scripts/ # Training and evaluation scripts
+├── configs/ # Configuration files
+├── environments/ # Agent environment implementations
+├── docs/ # Documentation
+└── examples/ # Usage examples
+```
+
 ## Documentation
 - [Development Guide (English)](docs/DEVELOPMENT_GUIDE_EN.md)
 - [Development Guide (Chinese)](docs/DEVELOPMENT_GUIDE_ZH.md)
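
Note on the README change: the new Prerequisites section depends on the `verl` submodule actually being checked out. Git leaves an uninitialized submodule as an empty directory, so a pre-flight check can be sketched as below. This is a minimal sketch, assuming the submodule path is `verl/` as listed under Project Structure; the helper name `submodule_ready` is hypothetical and is not part of this commit.

```python
from pathlib import Path


def submodule_ready(repo_root: str, name: str = "verl") -> bool:
    """Return True if the submodule directory exists and is non-empty.

    An uninitialized git submodule is left as an empty directory, so a
    non-empty directory is a cheap (not foolproof) signal that
    `git submodule update --init --recursive` has been run.
    """
    path = Path(repo_root) / name
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    if not submodule_ready("."):
        print("verl submodule looks empty; run: git submodule update --init --recursive")
```

The check only inspects the working tree, so it cannot tell whether the submodule is at the commit pinned by the superproject.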

requirements.txt

Lines changed: 14 additions & 7 deletions
@@ -4,14 +4,21 @@ datasets
 dill
 flash-attn
 hydra-core
+liger-kernel
 numpy
 pandas
+peft
+pyarrow>=19.0.0
 pybind11
-ray
-tensordict<0.6
-transformers<4.48
-vllm<=0.6.3
+pylatexenc
+pre-commit
+ray[default]
+tensordict<=0.6.2
+torchdata
+transformers==4.51.1
+# vllm==0.8.4
 wandb
-IPython
-matplotlib
-omegaconf
+packaging>=20.0
+uvicorn
+fastapi
+qwen-vl-utils[decord]

setup.py

Lines changed: 57 additions & 18 deletions
@@ -13,42 +13,81 @@
 # limitations under the License.

 # setup.py is the fallback installation script when pyproject.toml does not work
-from setuptools import setup, find_packages
 import os
+from pathlib import Path
+
+from setuptools import find_packages, setup

 version_folder = os.path.dirname(os.path.join(os.path.abspath(__file__)))

-with open(os.path.join(version_folder, 'verl/version/version')) as f:
+with open(os.path.join(version_folder, "verl/version/version")) as f:
     __version__ = f.read().strip()

+install_requires = [
+    "accelerate",
+    "codetiming",
+    "datasets",
+    "dill",
+    "hydra-core",
+    "numpy",
+    "pandas",
+    "peft",
+    "pyarrow>=19.0.0",
+    "pybind11",
+    "pylatexenc",
+    "ray[default]>=2.41.0",
+    "torchdata",
+    "tensordict<=0.6.2",
+    "transformers<=4.51.1",
+    "wandb",
+    "packaging>=20.0",
+    "qwen-vl-utils[decord]",
+]

-with open('requirements.txt') as f:
-    required = f.read().splitlines()
-    install_requires = [item.strip() for item in required if item.strip()[0] != '#']
+TEST_REQUIRES = ["pytest", "pre-commit", "py-spy"]
+PRIME_REQUIRES = ["pyext"]
+GEO_REQUIRES = ["mathruler"]
+GPU_REQUIRES = ["liger-kernel", "flash-attn"]
+MATH_REQUIRES = ["math-verify"]  # Add math-verify as an optional dependency
+VLLM_REQUIRES = ["tensordict<=0.6.2", "vllm<=0.8.5"]
+SGLANG_REQUIRES = [
+    "tensordict<=0.6.2",
+    "sglang[srt,openai]==0.4.6.post5",
+    "torch-memory-saver>=0.0.5",
+    "torch==2.6.0",
+]

 extras_require = {
-    'test': ['pytest', 'yapf']
+    "test": TEST_REQUIRES,
+    "prime": PRIME_REQUIRES,
+    "geo": GEO_REQUIRES,
+    "gpu": GPU_REQUIRES,
+    "math": MATH_REQUIRES,
+    "vllm": VLLM_REQUIRES,
+    "sglang": SGLANG_REQUIRES,
 }

-from pathlib import Path
+
 this_directory = Path(__file__).parent
 long_description = (this_directory / "README.md").read_text()

 setup(
-    name='verl',
+    name="verl",
     version=__version__,
-    package_dir={'': '.'},
-    packages=find_packages(where='.'),
-    url='https://github.com/volcengine/verl',
-    license='Apache 2.0',
-    author='Bytedance - Seed - MLSys',
-
-    description='veRL: Volcano Engine Reinforcement Learning for LLM',
+    package_dir={"": "."},
+    packages=find_packages(where="."),
+    url="https://github.com/volcengine/verl",
+    license="Apache 2.0",
+    author="Bytedance - Seed - MLSys",
+
+    description="verl: Volcano Engine Reinforcement Learning for LLM",
     install_requires=install_requires,
     extras_require=extras_require,
-    package_data={'': ['version/*'],
-                  'verl': ['trainer/config/*.yaml'],},
+    package_data={
+        "": ["version/*"],
+        "verl": ["trainer/config/*.yaml"],
+    },
     include_package_data=True,
     long_description=long_description,
-    long_description_content_type='text/markdown'
+    long_description_content_type="text/markdown",
 )
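
Note on the setup.py change: the removed block built `install_requires` by testing `item.strip()[0] != '#'`, which raises `IndexError` on any blank line in `requirements.txt`, because indexing an empty string fails. The commit sidesteps this by hard-coding the list. For comparison, a file-based reader that tolerates blank lines could look roughly like the sketch below. `read_requirements` is a hypothetical helper, not part of verl or of this commit.

```python
def read_requirements(text: str) -> list[str]:
    """Return requirement specifiers, skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        stripped = line.strip()
        # startswith() is safe on an empty string, unlike stripped[0].
        if stripped and not stripped.startswith("#"):
            requirements.append(stripped)
    return requirements


# Sample input mirrors entries from the requirements.txt diff above,
# with a blank line added to exercise the failure case.
sample = "ray[default]\n\n# vllm==0.8.4\nwandb\n"
print(read_requirements(sample))  # ['ray[default]', 'wandb']
```

A hard-coded list, as in the commit, has the separate advantage that `setup.py` and `requirements.txt` can carry different pins, for example `transformers<=4.51.1` versus `transformers==4.51.1`.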
