Commit 3bc8aa5

[FSDP][Blog] Fix GRPO formula (#269)
* Fix GRPO formula
* Correct formatting of weight update equation
  Fix formatting of mathematical expression in markdown.
* fix
* Fix formatting of mathematical expressions in blog post
* Fix format
1 parent 7ff8661 commit 3bc8aa5

1 file changed: +15 -5 lines changed


blog/2025-12-03-miles-fsdp.md

Lines changed: 15 additions & 5 deletions
@@ -127,11 +127,21 @@ Considering mismatch, `rollout_log_probs, old_log_probs, log_probs` will all par
 Taking GRPO as an example, the final loss function is:

 $$
-\begin{aligned}
-\mathcal{L}(\theta) &= \frac{1}{L} \sum_{t=1}^L \left[ \bar{w}_t \cdot \mathcal{L}^{\text{clip}}_t(\theta) - \beta \text{KL}_t + \lambda H_t \right] \\
-\text{where } \mathcal{L}^{\text{clip}}_t &= \min \left( r_t(\theta) A_t, \ \text{clip}(r_t(\theta), 1\pm\epsilon) A_t \right) \\
-r_t(\theta) &= \frac{\pi_{\theta}}{\pi_{\text{old}}}, \quad \bar{w}_t = \text{min}\left( \frac{\pi_{\text{old}}}{\pi_{\text{rollout}}}, C \right)
-\end{aligned}
+\mathcal{L}(\theta)
+= \frac{1}{L} \sum_{t=1}^L \left[ \bar{w}_t \cdot \mathcal{L}^{\text{clip}}_t(\theta) - \beta \,\text{KL}_t + \lambda H_t \right]
+$$
+
+where
+
+$$
+\mathcal{L}^{\text{clip}}_t
+= \min \left( r_t(\theta) A_t,\ \text{clip}(r_t(\theta), 1\pm\epsilon)\, A_t \right)
+$$
+
+and
+
+$$
+r_t(\theta) = \frac{\pi_\theta}{\pi\_{\text{old}}}, \quad \bar{w}_t = \min \left( \frac{\pi\_{\text{old}}}{\pi\_{\text{rollout}}}, C \right)
 $$

 ### Weight Update Optimization: Weight Update and Colocated Mode
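
For readers checking the corrected formula term by term, the following is a minimal pure-Python sketch of the expression in the hunk above. It is not the blog's or the repository's implementation. The function name `grpo_objective`, the parameter names `eps`, `cap`, `beta`, `lam`, and their default values are hypothetical. The per-token `kl` and `entropy` values are taken as given inputs, since the hunk does not define how $\text{KL}_t$ and $H_t$ are computed. Policies are passed as log-probabilities, matching the `rollout_log_probs, old_log_probs, log_probs` names in the hunk header, and the value is returned exactly as written in the formula, with no sign flip applied.

```python
import math


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def grpo_objective(log_probs, old_log_probs, rollout_log_probs,
                   advantages, kl, entropy,
                   eps=0.2, cap=2.0, beta=0.0, lam=0.0):
    """Evaluate the formula from the hunk for one sequence of L tokens.

    All arguments before `eps` are equal-length lists of per-token floats.
    `eps`, `cap` (C in the formula), `beta`, and `lam` (lambda) are
    illustrative defaults, not values taken from the blog post.
    """
    L = len(log_probs)
    total = 0.0
    for t in range(L):
        # r_t = pi_theta / pi_old, computed from log-probabilities
        r = math.exp(log_probs[t] - old_log_probs[t])
        # w_t = min(pi_old / pi_rollout, C): truncated importance weight
        w = min(math.exp(old_log_probs[t] - rollout_log_probs[t]), cap)
        a = advantages[t]
        # L^clip_t = min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)
        l_clip = min(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a)
        total += w * l_clip - beta * kl[t] + lam * entropy[t]
    return total / L


# When all three policies agree, r_t = w_t = 1 and, with beta = lam = 0,
# the result reduces to the mean advantage.
lp = [math.log(0.5), math.log(0.25), math.log(0.1)]
adv = [1.0, -1.0, 2.0]
zeros = [0.0, 0.0, 0.0]
print(grpo_objective(lp, lp, lp, adv, zeros, zeros))
```

In a real training loop the same arithmetic would normally be vectorized over tensors. The scalar loop is used here only so that each line maps onto one symbol of the formula.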
