Commit 24a98f6

Add link to online SFT in speculative training section (#250)
Updated the speculative training section to include a link for online SFT on the draft model.
1 parent 2840ec4 commit 24a98f6

1 file changed (+2, -2 lines changed)


blog/2025-11-19-miles.md

Lines changed: 2 additions & 2 deletions
@@ -55,7 +55,7 @@ In order to fully utilize the precious GPU memory for maximum performance withou
 
 ### Speculative Training
 
-In RL, freezing the draft model prevents it from following the target model policy, reducing accept length and degrading speedup, so we perform online SFT on the draft model throughout RL.
+In RL, freezing the draft model prevents it from following the target model policy, reducing accept length and degrading speedup, so we perform [online SFT on the draft model](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/spec/readme-en.md) throughout RL.
 
 - Achieve 25%+ rollout speedup vs. frozen MTP, especially in the late training stage.
 - Support MTP with sequence packing + CP; Loss masks with proper edge-case handling; LM head/embedding gradient isolation, and Megatron↔SGLang weight syncing.
@@ -82,4 +82,4 @@ For the future development of Miles, we will put together more efforts to suppor
 
 Miles exists thanks to the slime authors and the broader (SGLang) RL community.
 
-We invite researchers, startups, and enterprise teams alike to explore slime and Miles - whichever best fits your environment - and to be together with us to make reinforcement learning efficient and reliable. We'll hear from the community and actively work on Miles' future development, towards a production-ready training environment.
+We invite researchers, startups, and enterprise teams alike to explore slime and Miles - whichever best fits your environment - and to be together with us to make reinforcement learning efficient and reliable. We'll hear from the community and actively work on Miles' future development, towards a production-ready training environment.
