Commit f4e3063

TongLi3701, Tong Li, and duanjunwen authored
[Ascend] Update README (#6331)
* update readme
* [fix] add vllm & vllm-ascend installation

---------

Co-authored-by: Tong Li <[email protected]>
Co-authored-by: duanjunwen <[email protected]>
1 parent e1ca2d2 commit f4e3063

1 file changed (+29 -2 lines changed) in applications/ColossalChat/coati/distributed

applications/ColossalChat/coati/distributed/README.md

Lines changed: 29 additions & 2 deletions
@@ -2,6 +2,8 @@

 This repository implements a distributed Reinforcement Learning (RL) training framework designed to fine-tune large language models using algorithms such as **GRPO** and **DAPO**. It supports multi-node and multi-GPU setups, scalable rollout generation, and policy optimization using libraries like vLLM.

+**Please note that this project is still under intensive development. Stay tuned.**
+
 ---

 ## 🚀 Features
@@ -28,6 +30,15 @@ pip install -e .
 cd ./applications/ColossalChat
 pip install -e .
 ```
+
+Install vllm and vllm-ascend:
+```bash
+apt update -y
+apt install -y libnuma-dev
+pip install vllm==0.7.3
+pip install vllm-ascend==0.7.3 --extra-index-url https://download.pytorch.org/whl/cpu/
+```
+
 Install Fuyao Ray.
 Please update CANN before installing Fuyao Ray.
 ```bash
@@ -85,6 +96,23 @@ export HCCL_SOCKET_IFNAME=eno0
 export RAY_COLLECTIVE_MEET_TIMEOUT_SECONDS=7200
 ```

+
+## Architecture Design
+
+<div align="center">
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/producer-consumer-pattern.png" width=700/>
+</p>
+</div>
+Producer-Consumer Pattern: a classic software design pattern for coordinating resources, data, or tasks between two processes or threads.
+
+* Producer: the inference engine, which rolls out examples and saves them to a shared buffer.
+* Consumer: the training framework, which takes training examples from the shared buffer and trains the policy model.
+
+Key features of the Producer-Consumer Pattern:
+* Buffer: acts as a shared queue to which the producer adds data and from which the consumer removes data.
+* Concurrency: rollout and training can run concurrently.
+
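Inline note on the Architecture Design lines above: the producer-consumer pattern they describe can be illustrated with a minimal, single-machine sketch. This is not the Colossal-RL implementation (which the README says runs on Ray); it only shows the general pattern using Python's standard `queue` and `threading` modules. The names `producer`, `consumer`, `run`, `SENTINEL`, and the placeholder rollout dictionaries are all hypothetical and chosen for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of Colossal-RL


def producer(buffer, num_rollouts):
    """Stand-in for the inference engine: create rollouts and push them to the buffer."""
    for step in range(num_rollouts):
        rollout = {"step": step, "response": f"sample-{step}"}  # placeholder rollout
        buffer.put(rollout)  # blocks while the bounded buffer is full
    buffer.put(SENTINEL)  # tell the consumer that no more rollouts will arrive


def consumer(buffer, batch_size, trained_batches):
    """Stand-in for the trainer: pull rollouts from the buffer and group them into batches."""
    batch = []
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            trained_batches.append(batch)  # placeholder for one policy update
            batch = []
    if batch:
        trained_batches.append(batch)  # flush the final partial batch


def run(num_rollouts=8, batch_size=4, buffer_size=2):
    """Run one producer and one consumer concurrently over a bounded shared buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    trained_batches = []
    threads = [
        threading.Thread(target=producer, args=(buffer, num_rollouts)),
        threading.Thread(target=consumer, args=(buffer, batch_size, trained_batches)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return trained_batches
```

Calling `run()` returns the batches the consumer assembled. The bounded `queue.Queue` plays the role of the shared buffer: a full buffer pauses the producer and an empty one pauses the consumer, so rollout and training overlap without either side outrunning the other.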
 ## 🧠 Data Format

 Each data sample in the training or evaluation `.jsonl` file should follow this format:
@@ -287,5 +315,4 @@ python rl_example.py
 ```

 ## Acknowledgement
-
----
+Colossal-RL is a distributed version of ColossalChat and is inspired by several excellent open-source projects. We would like to express our gratitude to the Fuyao-ray team and the vllm-ascend team for their support throughout the development of this project. We also thank the following open-source projects and algorithms: GRPO, DAPO, TRL, Verl, OpenRLHF, StreamRL, Qwen, Logic-RL.
