
Commit ad8ed2d

Update readme and necessary codebase

1 parent 8719f47 commit ad8ed2d

File tree

1,681 files changed (+747,476 / −1,156 lines)


Readme.md

Lines changed: 23 additions & 71 deletions
````diff
@@ -218,92 +218,41 @@ Then, install the required dependencies:
 
 
 ```bash
-pip install -r requirements.txt
-```
-
-Supervised Fine-Tuning (SFT)
-
-Basic Usage
-
-To fine-tune a model on a single GPU:
+# install torch [or you can skip this step and let vllm install the correct version for you]
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
+# install vllm
+pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 or 0.3.1
 
+# verl
+pip install -e .
 
-```bash
-python -m openmanus_rl.sft \
---model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
---dataset_name CharlieDreemur/OpenManus-RL \
---learning_rate 2.0e-5 \
---num_train_epochs 1 \
---packing \
---max_seq_length 4096 \
---per_device_train_batch_size 2 \
---gradient_accumulation_steps 8 \
---gradient_checkpointing \
---bf16 \
---logging_steps 5 \
---output_dir data/sft-output
+# flash attention 2
+pip3 install flash-attn --no-build-isolation
+pip install wandb
 ```
 
-Distributed Training with Accelerate
+## Quick start
 
-For multi-GPU training using Accelerate:
+Train a reasoning + search LLM on the NQ dataset with e5 as the retriever and Wikipedia as the corpus.
 
+(1) Download the indexing and corpus.
 
-```bash
-accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/sft.py \
---model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
---dataset_name CharlieDreemur/OpenManus-RL \
---learning_rate 2.0e-5 \
---num_train_epochs 1 \
---packing \
---max_seq_length 4096 \
---per_device_train_batch_size 2 \
---gradient_accumulation_steps 8 \
---gradient_checkpointing \
---bf16 \
---logging_steps 5 \
---output_dir data/sft-output
-```
-
-## Group Relative Policy Optimization (GRPO) for agent tuning
-Basic Usage
-To fine-tune a model using GRPO on a single GPU:
+From https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL
 
+(3) Launch a local AgentGym server.
 ```bash
-python -m openmanus_rl.grpo \
---model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
---dataset_name CharlieDreemur/OpenManus-RL-GRPO \
---learning_rate 2.0e-5 \
---num_train_epochs 1 \
---max_seq_length 4096 \
---per_device_train_batch_size 2 \
---gradient_accumulation_steps 8 \
---gradient_checkpointing \
---bf16 \
---reward_funcs accuracy format tag_count \
---logging_steps 5 \
---output_dir data/grpo-output
+todo here
 ```
-Distributed Training with Accelerate
-For multi-GPU training using Accelerate:
 
+(4) Run RL training (PPO) with Llama-3.2-3b-base.
 ```bash
-accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/grpo.py \
---model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
---dataset_name CharlieDreemur/OpenManus-RL-GRPO \
---learning_rate 2.0e-5 \
---num_train_epochs 1 \
---max_seq_length 4096 \
---per_device_train_batch_size 2 \
---gradient_accumulation_steps 8 \
---gradient_checkpointing \
---bf16 \
---reward_funcs accuracy format tag_count \
---logging_steps 5 \
---output_dir data/grpo-output
+conda activate openmanus-rl
+bash train_ppo.sh
 ```
 
 
+
+
 # Related Work
 
 ## Agent tuning
````
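The new install block pins exact versions (torch 2.4.0, vllm 0.6.3). A quick way to confirm an environment matches those pins without importing the heavy packages is to query installed distribution metadata (a minimal sketch; only the package names and pins are taken from the README):

```python
from importlib.metadata import version, PackageNotFoundError

# Version pins from the install block in the README.
PINS = {"torch": "2.4.0", "vllm": "0.6.3"}

def check_pins(pins):
    """Report 'ok', 'mismatch (<found>)', or 'missing' for each pinned package."""
    report = {}
    for pkg, want in pins.items():
        try:
            found = version(pkg)
            report[pkg] = "ok" if found == want else f"mismatch ({found})"
        except PackageNotFoundError:
            report[pkg] = "missing"
    return report

print(check_pins(PINS))
```

Because this reads metadata instead of importing, it runs in seconds even when torch is installed.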
````diff
@@ -347,6 +296,9 @@ accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_
 
 # Acknowledgement
 We extend our thanks to ulab-uiuc (https://ulab-uiuc.github.io/) and the OpenManus (https://github.com/mannaandpoem/OpenManus) team from MetaGPT for their support and shared knowledge. Their mission and community contributions help drive innovations like OpenManus forward.
+
+We also want to thank AgentGym (https://agentgym.github.io/) and Verl (https://github.com/volcengine/verl) for their open-source work.
+
 We welcome all developers interested in this project to reach out to ([email protected]).
 
 Stay tuned for updates and the official release of our repository. Together, let's build a thriving open-source agent ecosystem!
````
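The GRPO recipe removed by this commit passed `--reward_funcs accuracy format tag_count`. As a sense of what such a format reward does, here is a hedged sketch: the tag names and scoring are hypothetical illustrations, not the repo's actual implementation.

```python
import re

# Hypothetical format reward: 1.0 if the completion wraps reasoning in
# <think>...</think> followed by the answer in <answer>...</answer>, else 0.0.
# The tag names are assumptions; the real reward functions may differ.
PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    return 1.0 if PATTERN.match(completion.strip()) else 0.0

print(format_reward("<think>2 + 2 = 4</think><answer>4</answer>"))  # 1.0
print(format_reward("just an answer with no tags"))                 # 0.0
```

In a GRPO setup such per-completion scores are combined with the other reward functions (accuracy, tag_count) and normalized within each sampled group.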
