Commit 05fa5fd

spelling mistake fixed - grpo
1 parent 5420581 commit 05fa5fd

File tree

1 file changed (+1, -1 lines changed)


docs/natural_language_processing/deepseek.md

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ The final stage involved another round of RL, this time aimed at improving the m
 To make the advanced reasoning capabilities more accessible, the researchers distilled DeepSeek-R1's knowledge into smaller dense models based on Qwen and Llama architectures. For distilled models, authors apply only SFT and do not include an RL stage, even though incorporating RL could substantially boost model performance.
 
 !!! Note
-    There is a major takeaway from this analysis regarding the efficiency of Distillation on different technique GPRO vs SFT: Transferring knowledge from advanced AI models to smaller versions ("distillation") often works better than training compact models (< 3B models) with resource-heavy reinforcement learning (RL), which demands massive computing power and still underperforms.
+    There is a major takeaway from this analysis regarding the efficiency of Distillation on different technique GRPO vs SFT: Transferring knowledge from advanced AI models to smaller versions ("distillation") often works better than training compact models (< 3B models) with resource-heavy reinforcement learning (RL), which demands massive computing power and still underperforms.
 
     In short, if your model is <3B parameters and you have sufficient data, consider supervised finetuning over RL based training.

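Below is a minimal sketch of the workflow the corrected note recommends: supervised finetuning (SFT) of a small (<3B) student model on teacher-generated reasoning traces, rather than RL-based training such as GRPO. It uses Hugging Face TRL's SFTTrainer as an assumed library choice (neither the commit nor the docs page prescribes tooling), and the file r1_traces.json and the student model Qwen/Qwen2.5-1.5B are illustrative placeholders.

# Minimal SFT-distillation sketch (assumptions: TRL as the training
# library, "r1_traces.json" as a hypothetical file of teacher-generated
# traces, Qwen/Qwen2.5-1.5B as an illustrative <3B student model).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each JSON record is expected to carry a single "text" field holding
# the prompt plus the teacher's reasoning trace and final answer.
dataset = load_dataset("json", data_files="r1_traces.json", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B",  # <3B student; TRL loads model and tokenizer
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen2.5-1.5b-distilled-sft",
        num_train_epochs=2,
        per_device_train_batch_size=4,
    ),
)
trainer.train()  # plain SFT on static traces; no RL stage

Compared with running GRPO on the student directly, this route needs only forward/backward passes over a fixed dataset, no reward model or online rollouts, which is the compute argument the note makes for small models.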