`applications/ColossalChat/coati/distributed/README.md`
This repository implements a distributed Reinforcement Learning (RL) training framework designed to fine-tune large language models using algorithms such as **GRPO** and **DAPO**. It supports multi-node and multi-GPU setups, scalable rollout generation, and policy optimization using libraries like VLLM.
**Please note that we are still under intensive development; stay tuned.**
**Producer-Consumer Pattern**: a classic software design pattern for managing resources, data, or tasks between two processes or threads (a minimal sketch follows the feature list below).
* Producer: the inference engine, which rolls out examples and saves them into a shared buffer.
* Consumer: the training framework, which takes training examples from the shared buffer and trains the policy model.
Key features of the Producer-Consumer Pattern:
* Buffer: Acts as a shared queue where the producer adds data and the consumer removes data.
* Concurrency: Rollout and training can work concurrently.
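
To make the pattern concrete, here is a minimal, framework-agnostic sketch using Python threads and `queue.Queue`. In the real framework the two roles run as separate distributed processes across GPUs and nodes; `generate_rollout` and `train_step` below are hypothetical placeholders, not APIs of this repository.

```python
import queue
import threading

# Shared buffer: the producer enqueues rollouts, the consumer dequeues them.
# A bounded queue applies backpressure when training falls behind generation.
buffer = queue.Queue(maxsize=8)
SENTINEL = None  # signals that no more rollouts will arrive

def generate_rollout(step):
    """Hypothetical stand-in for the inference engine producing one rollout."""
    return {"step": step, "prompt": "...", "response": "..."}

def train_step(sample):
    """Hypothetical stand-in for one policy-optimization step."""
    print(f"training on rollout from step {sample['step']}")

def producer(num_steps):
    for step in range(num_steps):
        buffer.put(generate_rollout(step))  # blocks while the buffer is full
    buffer.put(SENTINEL)

def consumer():
    while True:
        sample = buffer.get()  # blocks until a rollout is available
        if sample is SENTINEL:
            break
        train_step(sample)

prod = threading.Thread(target=producer, args=(4,))
cons = threading.Thread(target=consumer)
prod.start(); cons.start()
prod.join(); cons.join()
```

Because `put` and `get` block on a bounded queue, rollout generation and training overlap naturally: neither side needs explicit locking, and the buffer size caps how far the producer can run ahead of the consumer.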
## 🧠 Data Format
Each data sample in the training or evaluation `.jsonl` file should follow this format:
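
The concrete schema itself is not shown in this excerpt. Purely as an illustration, a prompt-plus-ground-truth sample for RL fine-tuning might look like the line below; the `messages` and `gt_answer` field names are assumptions, not a confirmed schema.

```jsonl
{"messages": {"role": "user", "content": "Simplify (1 + 2) * 4."}, "gt_answer": "12"}
```

In a `.jsonl` file, each such JSON object occupies exactly one line.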
```
python rl_example.py
```
## Acknowledgement
Colossal-RL is a distributed version of ColossalChat, inspired by a few awesome open-source projects. We would like to express our gratitude to the Fuyao-ray team and the vllm-ascend team for their support throughout the development of this project. We also thank the following awesome open-source projects and algorithms: GRPO, DAPO, TRL, Verl, OpenRLHF, StreamRL, Qwen, and Logic-RL.