Commit 535eba8

update readme

1 parent 40b6a91 commit 535eba8

2 files changed: +74 -0 lines changed

applications/ColossalChat/coati/distributed/README.md

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@ This repository implements a distributed Reinforcement Learning (RL) training fr
* **Rollout and Policy Decoupling**: Efficient generation and consumption of data through a parallel inferencer-trainer architecture.
* **Evaluation Integration**: Easily plug in task-specific eval datasets.
* **Checkpoints and Logging**: Configurable intervals and directories.
* **[New]**: Zero Bubble training framework that supports GRPO and DAPO. [(read more)](./zero_bubble/README.md)

---

applications/ColossalChat/coati/distributed/zero_bubble/README.md
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
# Zero Bubble Distributed RL Framework for Language Model Fine-Tuning

This folder contains code for the Zero Bubble distributed RL framework. It currently supports **GRPO** and **DAPO**. See the [main README](../README.md) for general installation instructions and usage.

**Note:** This project is under active development; expect changes.

## 🛠 Installation

1. Follow the general installation guide in the [main README](../README.md).
2. Install [pygloo](https://github.com/ray-project/pygloo). Build pygloo for Ray from source, following the instructions in its repository README.

## Design idea

We aim to reduce the *“bubble”*: the idle time between rollouts and training steps (illustrated in Fig. 1).

<div align="center">
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/all_sync.png" width=700/>
</p>
</div>

**Fig. 1** - In an all-sync online RL framework, rollout workers wait for the trainer to finish training and synchronize weights, and the trainer waits for rollouts. This causes large GPU idle time.

<div align="center">
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/zero_bubble.png" width=700/>
</p>
</div>

**Fig. 2** - Our Zero Bubble pipeline follows a producer–consumer pattern:

* A global **data buffer** temporarily stores rollouts produced by inference workers.
* A **weights distributor** buffers updated model weights and distributes them to inference workers.
* When the data buffer has enough data, the trainer continuously consumes from it and pushes updated weights to the weights distributor.
* After finishing a mini-batch, each inference worker checks the weights distributor and synchronizes to a newer weight version if available.

Under ideal conditions (inference workers produce data at the same rate the trainer consumes it), the pipeline eliminates idle time. We call it *zero bubble* because, with an unlimited data buffer, inference and training can run indefinitely without waiting. In practice, to avoid wasted compute and stale/off-policy data, we set a bounded buffer size so inference workers will briefly wait when the buffer is full.
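
The producer-consumer pattern above can be sketched with a bounded queue. This is an illustrative toy using plain Python threads and hypothetical `WeightsDistributor`/`inference_worker`/`trainer` stand-ins, not the framework's actual Ray-based implementation:

```python
import queue
import threading

# Toy stand-in for the weights distributor: buffers the newest weight
# version so inference workers can pull it between mini-batches.
class WeightsDistributor:
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self._weights = None

    def push(self, weights):
        with self._lock:
            self.version += 1
            self._weights = weights

    def pull_if_newer(self, have_version):
        with self._lock:
            if self.version > have_version:
                return self.version, self._weights
            return have_version, None

def inference_worker(data_buffer, distributor, n_batches):
    version = 0
    for step in range(n_batches):
        rollout = {"step": step, "weights_version": version}
        # put() blocks when the bounded buffer is full, which caps how
        # stale (off-policy) the queued rollouts can become.
        data_buffer.put(rollout)
        # After each mini-batch, sync to a newer weight version if one exists.
        version, _ = distributor.pull_if_newer(version)

def trainer(data_buffer, distributor, n_batches):
    for _ in range(n_batches):
        batch = data_buffer.get()  # blocks only when the buffer is empty
        distributor.push({"trained_on": batch["step"]})  # "updated" weights

data_buffer = queue.Queue(maxsize=4)  # bounded, like data_actor_buffer_size_limit
distributor = WeightsDistributor()
producer = threading.Thread(target=inference_worker, args=(data_buffer, distributor, 8))
consumer = threading.Thread(target=trainer, args=(data_buffer, distributor, 8))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(distributor.version, data_buffer.empty())  # 8 True
```

With a balanced production/consumption rate neither thread waits for long, which is the zero-bubble condition; the `maxsize` bound is what trades a little producer waiting for fresher, more on-policy data.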

## Usage

In addition to the general parameters (see the main README), the Zero Bubble pipeline introduces one additional parameter:

* **`data_actor_buffer_size_limit`** - Maximum number of rollout batches the data buffer may hold. Defaults to **twice** the trainer’s mini-batch size. Avoid setting this too large, since a very large buffer increases off-policy (stale) training data. For DAPO, since only effective prompts count toward a batch, you may need to raise `data_actor_buffer_size_limit` depending on sample utilization.
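
As a rough illustration of the sizing guidance above: scaling the buffer by the effective-prompt rate is an assumption made here for illustration, not the framework's actual formula, and `suggested_buffer_limit` is a hypothetical helper, not part of the API.

```python
def suggested_buffer_limit(train_minibatch_size: int,
                           effective_prompt_rate: float = 1.0) -> int:
    # Documented default: twice the trainer's mini-batch size.
    base = 2 * train_minibatch_size
    # With DAPO, only effective prompts count, so a lower effective-prompt
    # rate may call for a proportionally larger buffer (an assumption here).
    return max(base, round(base / effective_prompt_rate))

print(suggested_buffer_limit(2))       # default: 4
print(suggested_buffer_limit(2, 0.5))  # ~50% effective prompts: 8
```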

Example: RL training on 8 GPUs with Zero Bubble (ZeRO stage 2):

```bash
python rl_example_zero_bubble.py \
    --dataset /path/to/your/dataset.jsonl \
    --model /path/to/your/model \
    -t 4 -i 4 -b vllm -a DAPO \
    -imbs 8 -ibs 8 -tbs 8 -e 2 -rt boxed \
    -si 25 -s "Please reason step by step, and put your final answer within \\boxed{}." \
    -tMbs 2 -tmbs 2 -p Rebase_Experiments -zero 2 -mpt 512 -mnt 3584
```

## Performance

<div align="center">
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/zero_bubble_gpu_util.png" width=700/>
</p>
</div>

**Fig. 3** - Performance of the Zero Bubble pipeline, tested with an unlimited buffer size.
---
