
Commit 7ff8661

FSDP (#268)

* update with miles
* adding FSDP in miles
* delete link
* add ack
* folding details
* solve miles
* fix compile

Co-authored-by: zhaochenyang20 <[email protected]>
Parent: 66e5de8

1 file changed (+5 −5)

blog/2025-12-03-miles-fsdp.md

@@ -52,7 +52,7 @@ The robust FSDP design leaves the top-level architecture unaffected, and the ove
 In the `init` stage, the following work is mainly completed:

 <p align="center">
-  <img src="./pic/1_fsdp_init.png" alt="FSDP actor init flow" width="50%" />
+  <img src="/images/blog/miles-fsdp/1_fsdp_init.png" alt="FSDP actor init flow" width="50%" />
 </p>

 FSDP actor init flow
@@ -69,7 +69,7 @@ FSDP actor init flow
 The `train` function serves as the main training entry point:

 <p align="center">
-  <img src="./pic/2_fsdp_train.png" alt="FSDP actor train flow" width="50%" />
+  <img src="/images/blog/miles-fsdp/2_fsdp_train.png" alt="FSDP actor train flow" width="50%" />
 </p>

 FSDP actor train flow
@@ -105,7 +105,7 @@ After completing Data Packing, the actor calculates log-prob and entropy of ref/
 > Specific details are documented in more detail in Miles's Docs.

 <p align="center">
-  <img src="./pic/3_kl_0.png" alt="training-rollout logprob diff = 0" width="50%" />
+  <img src="/images/blog/miles-fsdp/3_kl_0.png" alt="training-rollout logprob diff = 0" width="50%" />
 </p>
@@ -139,7 +139,7 @@ $$
 After training ends, the latest weights are synchronized back to the Inference Engine (this is the best definition of the term refit). In `update_weight_utis.py`, we fully support all modes: `colocated` and `distributed`. The former alternates train / rollout occupying the same batch of GPUs, while the latter distributes train / rollout on different GPUs. For both methods, we adopted a bucketed asynchronous update strategy [Reference](https://hebiao064.github.io/rl-weight-sync), synchronizing chunked weights to the inference engine one by one, minimizing peak memory usage as much as possible.

 <p align="center">
-  <img src="./pic/4_fsdp_refit.png" alt="Update weights from training to inference with async tensor handle and bucket" width="50%" />
+  <img src="/images/blog/miles-fsdp/4_fsdp_refit.png" alt="Update weights from training to inference with async tensor handle and bucket" width="50%" />
 </p>

 > ✅ For specific mechanisms of weight update, welcome to check the previous blogs of SGLang RL group: [**RL System Deep Thinking: Weight Update Mechanisms**](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-1-EN.md)
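The bucketed strategy described in the paragraph above can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions: the `engine.update_weights` call and the 512 MiB bucket size are hypothetical, not the actual API in `update_weight_utis.py`.

```python
import torch

# Illustrative sketch of bucketed weight sync; `engine.update_weights`
# and the 512 MiB bucket size are assumptions, not Miles's real API.


def iter_buckets(named_params, bucket_bytes=512 * 1024 * 1024):
    """Group (name, param) pairs into buckets of roughly bucket_bytes."""
    bucket, size = [], 0
    for name, param in named_params:
        bucket.append((name, param))
        size += param.numel() * param.element_size()
        if size >= bucket_bytes:
            yield bucket
            bucket, size = [], 0
    if bucket:
        yield bucket


def sync_weights(model: torch.nn.Module, engine) -> None:
    """Push weights to the inference engine one bucket at a time, so at
    most one bucket of full tensors is materialized at any moment."""
    for bucket in iter_buckets(model.named_parameters()):
        # With FSDP one would gather the full (unsharded) tensors for just
        # this bucket; a plain clone keeps the sketch self-contained.
        full = {name: p.detach().clone() for name, p in bucket}
        engine.update_weights(full)  # hypothetical inference-engine call
        del full  # release this bucket before gathering the next one
```

Sending one bucket at a time bounds peak extra memory at roughly one bucket of full tensors, which is the point the paragraph makes about minimizing peak memory usage.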
@@ -162,7 +162,7 @@ Experimental Environment: Single node H100, Miles 0.5.5post1
 Megatron, FSDP colocated w ref model, FSDP colocated w/o ref model

 <p align="center">
-  <img src="./pic/5_fsdp_mcore_match.png" alt="Raw reward match" width="50%" />
+  <img src="/images/blog/miles-fsdp/5_fsdp_mcore_match.png" alt="Raw reward match" width="50%" />
 </p>
