Merge branch 'main' into zijia/dev

realtmxi · web-flow · commit 7f6f7f20b88b · 2025-08-28T23:35:19.000+08:00
diff --git a/README.md b/README.md
@@ -9,7 +9,7 @@ We are committed to regularly updating our exploration directions and results in
 
 We warmly welcome contributions from the broader community—join us in pushing the boundaries of agent reasoning and tool integration!
 
-Code and dataset coming soon! Stay tuned!
+Code and dataset are now available! The `verl` submodule has been integrated for enhanced RL training capabilities.
 
 <div style="display: flex; justify-content: center;">
   <div style="width: 100; transform: scale(1.0);">
@@ -59,7 +59,7 @@ Code and dataset coming soon! Stay tuned!
 
 
 ## Current Team Members
-[@Kunlun Zhu](https://github.com/Kunlun-Zhu)(Ulab-UIUC), [@Jiayi Zhang](https://github.com/didiforgithub)(MetaGPT), [@Xinbing Liang](https://github.com/mannaandpoem),[@Xiangxin Zhou](https://github.com/zhouxiangxin1998), [@Yanfei Zhang](https://github.com/yanfei-zhang-95), [@Yingxuan Yang](https://github.com/zoe-yyx), [@Zeping Chen](https://github.com/rxdaozhang),[@Weijia Zhang](https://github.com/CharlieDreemur), [@Muxin Tian](https://github.com/realtmxi), [@Haofei Yu](https://github.com/lwaekfjlk)(Ulab-UIUC), [@Jinyu Xiang](https://github.com/XiangJinyu), [@Yifan Wu](https://github.com/Evanwu50020), [@Bowen Jin](https://github.com/PeterGriffinJin), [@Blair Yang](https://github.com/blairyeung), [@Zijia Liu](https://m-serious.github.io/)
+[@Kunlun Zhu](https://github.com/Kunlun-Zhu)(Ulab-UIUC), [@Muxin Tian](https://github.com/realtmxi), [@Zijia Liu](https://m-serious.github.io/)(Ulab-UIUC), [@Yingxuan Yang](https://github.com/zoe-yyx), [@Jiayi Zhang](https://github.com/didiforgithub)(MetaGPT), [@Xinbing Liang](https://github.com/mannaandpoem), [@Weijia Zhang](https://github.com/CharlieDreemur), [@Haofei Yu](https://github.com/lwaekfjlk)(Ulab-UIUC), [@Cheng Qian](https://qiancheng0.github.io/), [@Bowen Jin](https://github.com/PeterGriffinJin)
 
 ---
 
@@ -146,11 +146,18 @@ Agents are equipped with action-space awareness, employing systematic exploratio
 ### Integration with RL Tuning Frameworks
 We integrate insights and methodologies from leading RL tuning frameworks, including:
 
-- **Verl**
+- **Verl** (integrated as a Git submodule): our primary RL framework, providing the training capabilities used for agent optimization
 - **TinyZero**
 - **OpenR1**
 - **Trlx**
 
+### Verl Integration
+The `verl` submodule is fully integrated into OpenManus-RL, providing:
+- **Advanced RL Algorithms** - PPO, DPO, and custom reward modeling
+- **Efficient Training** - Optimized for large language model fine-tuning
+- **Flexible Configuration** - Easy customization of training parameters
+- **Production Ready** - Battle-tested framework from ByteDance
+
 Through these frameworks, agents can effectively balance exploration and exploitation, optimize reasoning processes, and adapt dynamically to novel environments.
 
 In summary, our method systematically integrates advanced reasoning paradigms, diverse rollout strategies, sophisticated reward modeling, and robust RL frameworks, significantly advancing the capability and adaptability of reasoning-enhanced LLM agents.
@@ -208,6 +215,18 @@ We are still laboriously developing this part, welcome feedback.
 
 ## Installation
 
+### Prerequisites
+This project uses git submodules. After cloning the repository, make sure to initialize and update the submodules:
+
+```bash
+# Clone the repository with submodules
+git clone --recursive https://github.com/OpenManus/OpenManus-RL.git
+
+# Or if already cloned, initialize and update submodules
+git submodule update --init --recursive
+```
+
+### Environment Setup
 First, create a conda environment and activate it:
 
 ```bash
@@ -248,6 +267,7 @@ conda activate agentenv_webshop
 # Setup the environment
 bash ./setup.sh -d all
 ```
+
 ### 2. ALFWorld
 
 ```bash
@@ -263,31 +283,17 @@ alfworld-download -f
 ```
 Use `--extra` to download pre-trained checkpoints and seq2seq data.
 
-### Launching the WebShop Server
+## Quick Start
 
-After setting up the environment, you can launch the WebShop server:
+### 1. Environment Setup
+Make sure you have the required environments set up (see the Environment Setup section above).
 
-```bash
-# Make sure the webshop conda environment is activated
-conda activate webshop
-
-# Launch the server (default port: 36001)
-webshop --port 36001
-```
+### 2. Data Preparation
+Download the OpenManus-RL dataset from [Hugging Face](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL).
 
-Note: The WebShop environment requires specific versions of Python, PyTorch, Faiss, and Java. The setup script will handle these dependencies automatically.
+### 3. Training Examples
 
-## Quick start
-
-Train a reasoning + search LLM on NQ dataset with e5 as the retriever and wikipedia as the corpus.
-
-(1) Download the indexing and corpus.
-
-From https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL
-
-(3) Launch a local AgentGym server.
-
-(4) Run RL training (PPO).
+#### ALFWorld RL Training (PPO)
 ```bash
 conda activate openmanus-rl
 bash scripts/ppo_train/train_alfworld.sh
@@ -379,6 +385,19 @@ Please cite the following paper if you find OpenManus helpful!
 </a>
 </p>
 
+## Project Structure
+
+```
+OpenManus-RL/
+├── verl/                    # Verl RL framework submodule
+├── openmanus_rl/           # Main OpenManus-RL library
+├── scripts/                # Training and evaluation scripts
+├── configs/                # Configuration files
+├── environments/           # Agent environment implementations
+├── docs/                   # Documentation
+└── examples/               # Usage examples
+```
+
 ## Documentation
 - [Development Guide (English)](docs/DEVELOPMENT_GUIDE_EN.md)
 - [Development Guide (Chinese)](docs/DEVELOPMENT_GUIDE_ZH.md)
diff --git a/requirements.txt b/requirements.txt
@@ -4,14 +4,21 @@ datasets
 dill
 flash-attn
 hydra-core
+liger-kernel
 numpy
 pandas
+peft
+pyarrow>=19.0.0
 pybind11
-ray
-tensordict<0.6
-transformers<4.48
-vllm<=0.6.3
+pylatexenc
+pre-commit
+ray[default]
+tensordict<=0.6.2
+torchdata
+transformers==4.51.1
+# vllm==0.8.4
 wandb
-IPython
-matplotlib
-omegaconf
+packaging>=20.0
+uvicorn
+fastapi
+qwen-vl-utils[decord]
diff --git a/setup.py b/setup.py
@@ -13,42 +13,81 @@
 # limitations under the License.
 
 # setup.py is the fallback installation script when pyproject.toml does not work
-from setuptools import setup, find_packages
 import os
+from pathlib import Path
+
+from setuptools import find_packages, setup
 
 version_folder = os.path.dirname(os.path.join(os.path.abspath(__file__)))
 
-with open(os.path.join(version_folder, 'verl/version/version')) as f:
+with open(os.path.join(version_folder, "verl/version/version")) as f:
     __version__ = f.read().strip()
 
+install_requires = [
+    "accelerate",
+    "codetiming",
+    "datasets",
+    "dill",
+    "hydra-core",
+    "numpy",
+    "pandas",
+    "peft",
+    "pyarrow>=19.0.0",
+    "pybind11",
+    "pylatexenc",
+    "ray[default]>=2.41.0",
+    "torchdata",
+    "tensordict<=0.6.2",
+    "transformers<=4.51.1",
+    "wandb",
+    "packaging>=20.0",
+    "qwen-vl-utils[decord]",
+]
 
-with open('requirements.txt') as f:
-    required = f.read().splitlines()
-    install_requires = [item.strip() for item in required if item.strip()[0] != '#']
+TEST_REQUIRES = ["pytest", "pre-commit", "py-spy"]
+PRIME_REQUIRES = ["pyext"]
+GEO_REQUIRES = ["mathruler"]
+GPU_REQUIRES = ["liger-kernel", "flash-attn"]
+MATH_REQUIRES = ["math-verify"]  # Add math-verify as an optional dependency
+VLLM_REQUIRES = ["tensordict<=0.6.2", "vllm<=0.8.5"]
+SGLANG_REQUIRES = [
+    "tensordict<=0.6.2",
+    "sglang[srt,openai]==0.4.6.post5",
+    "torch-memory-saver>=0.0.5",
+    "torch==2.6.0",
+]
 
 extras_require = {
-    'test': ['pytest', 'yapf']
+    "test": TEST_REQUIRES,
+    "prime": PRIME_REQUIRES,
+    "geo": GEO_REQUIRES,
+    "gpu": GPU_REQUIRES,
+    "math": MATH_REQUIRES,
+    "vllm": VLLM_REQUIRES,
+    "sglang": SGLANG_REQUIRES,
 }
 
-from pathlib import Path
+
 this_directory = Path(__file__).parent
 long_description = (this_directory / "README.md").read_text()
 
 setup(
-    name='verl',
+    name="verl",
     version=__version__,
-    package_dir={'': '.'},
-    packages=find_packages(where='.'),
-    url='https://github.com/volcengine/verl',
-    license='Apache 2.0',
-    author='Bytedance - Seed - MLSys',
-    author_email='zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk',
-    description='veRL: Volcano Engine Reinforcement Learning for LLM',
+    package_dir={"": "."},
+    packages=find_packages(where="."),
+    url="https://github.com/volcengine/verl",
+    license="Apache 2.0",
+    author="Bytedance - Seed - MLSys",
+    author_email="zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk",
+    description="verl: Volcano Engine Reinforcement Learning for LLM",
     install_requires=install_requires,
     extras_require=extras_require,
-    package_data={'': ['version/*'],
-                  'verl': ['trainer/config/*.yaml'],},
+    package_data={
+        "": ["version/*"],
+        "verl": ["trainer/config/*.yaml"],
+    },
     include_package_data=True,
     long_description=long_description,
-    long_description_content_type='text/markdown'
+    long_description_content_type="text/markdown",
 )
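
Review note on the `setup.py` change: the removed parsing code indexed `item.strip()[0]` for every line of `requirements.txt`, which raises `IndexError` as soon as the file contains a blank line. The new hard-coded `install_requires` list avoids that. If reading `requirements.txt` is ever reintroduced, a more tolerant parser might look like the sketch below. `parse_requirements` is a hypothetical helper, not part of verl or OpenManus-RL, and the sample text is illustrative rather than the real file.

```python
def parse_requirements(text: str) -> list[str]:
    """Return requirement specifiers, skipping blank lines and comment lines.

    Hypothetical helper for illustration; it does not handle inline
    comments or pip options such as ``-r other.txt``.
    """
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # An empty string has no index 0, so test for emptiness first.
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


# Illustrative input containing a blank line and a commented-out pin.
sample = """\
ray[default]
tensordict<=0.6.2

# vllm==0.8.4
wandb
"""

parsed = parse_requirements(sample)
print(parsed)
```

Checking for an empty string before looking at the first character is the only behavioural difference from the removed list comprehension; everything else is kept deliberately minimal.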