
Commit 7ff8661

FSDP (#268)

* update with miles
* adding FSDP in miles
* delete link
* add ack
* folding details
* solve miles
* fix compile

Co-authored-by: zhaochenyang20 <[email protected]>
Parent: 66e5de8

1 file changed (+5 −5)

blog/2025-12-03-miles-fsdp.md

@@ -52,7 +52,7 @@ The robust FSDP design leaves the top-level architecture unaffected, and the ove
 In the `init` stage, the following work is mainly completed:

 <p align="center">
-  <img src="./pic/1_fsdp_init.png" alt="FSDP actor init flow" width="50%" />
+  <img src="/images/blog/miles-fsdp/1_fsdp_init.png" alt="FSDP actor init flow" width="50%" />
 </p>

 FSDP actor init flow
@@ -69,7 +69,7 @@ FSDP actor init flow
 The `train` function serves as the main training entry point:

 <p align="center">
-  <img src="./pic/2_fsdp_train.png" alt="FSDP actor train flow" width="50%" />
+  <img src="/images/blog/miles-fsdp/2_fsdp_train.png" alt="FSDP actor train flow" width="50%" />
 </p>

 FSDP actor train flow
@@ -105,7 +105,7 @@ After completing Data Packing, the actor calculates log-prob and entropy of ref/
 > Specific details are documented in more detail in Miles's Docs.

 <p align="center">
-  <img src="./pic/3_kl_0.png" alt="training-rollout logprob diff = 0" width="50%" />
+  <img src="/images/blog/miles-fsdp/3_kl_0.png" alt="training-rollout logprob diff = 0" width="50%" />
 </p>
@@ -139,7 +139,7 @@ $$
 After training ends, the latest weights are synchronized back to the Inference Engine (this is the best definition of the term refit). In `update_weight_utis.py`, we fully support all modes: `colocated` and `distributed`. The former alternates train / rollout occupying the same batch of GPUs, while the latter distributes train / rollout on different GPUs. For both methods, we adopted a bucketed asynchronous update strategy [Reference](https://hebiao064.github.io/rl-weight-sync), synchronizing chunked weights to the inference engine one by one, minimizing peak memory usage as much as possible.

 <p align="center">
-  <img src="./pic/4_fsdp_refit.png" alt="Update weights from training to inference with async tensor handle and bucket" width="50%" />
+  <img src="/images/blog/miles-fsdp/4_fsdp_refit.png" alt="Update weights from training to inference with async tensor handle and bucket" width="50%" />
 </p>

 > ✅ For specific mechanisms of weight update, welcome to check the previous blogs of SGLang RL group: [**RL System Deep Thinking: Weight Update Mechanisms**](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-1-EN.md)
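The bucketed strategy described in the paragraph above can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions: the `engine.update_weights` call and the 512 MiB bucket size are hypothetical, not the actual API in `update_weight_utis.py`.

```python
import torch

# Illustrative sketch of bucketed weight sync; `engine.update_weights`
# and the 512 MiB bucket size are assumptions, not Miles's real API.


def iter_buckets(named_params, bucket_bytes=512 * 1024 * 1024):
    """Group (name, param) pairs into buckets of roughly bucket_bytes."""
    bucket, size = [], 0
    for name, param in named_params:
        bucket.append((name, param))
        size += param.numel() * param.element_size()
        if size >= bucket_bytes:
            yield bucket
            bucket, size = [], 0
    if bucket:
        yield bucket


def sync_weights(model: torch.nn.Module, engine) -> None:
    """Push weights to the inference engine one bucket at a time, so at
    most one bucket of full tensors is materialized at any moment."""
    for bucket in iter_buckets(model.named_parameters()):
        # With FSDP one would gather the full (unsharded) tensors for just
        # this bucket; a plain clone keeps the sketch self-contained.
        full = {name: p.detach().clone() for name, p in bucket}
        engine.update_weights(full)  # hypothetical inference-engine call
        del full  # release this bucket before gathering the next one
```

Sending one bucket at a time bounds peak extra memory at roughly one bucket of full tensors, which is the point the paragraph makes about minimizing peak memory usage.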
@@ -162,7 +162,7 @@ Experimental Environment: Single node H100, Miles 0.5.5post1
 Megatron, FSDP colocated w ref model, FSDP colocated w/o ref model

 <p align="center">
-  <img src="./pic/5_fsdp_mcore_match.png" alt="Raw reward match" width="50%" />
+  <img src="/images/blog/miles-fsdp/5_fsdp_mcore_match.png" alt="Raw reward match" width="50%" />
 </p>
