For reproducibility, we run all evaluations with [Qwen's math evaluation suite](https://github.com/QwenLM/Qwen2.5-Math/blob/main/evaluation/sh/eval.sh). Because AIME24 and AMC23 contain only 30 and 40 questions respectively, we sample 8 responses per question with temperature 0.6 and top-p 0.95, then compute [pass@1](https://arxiv.org/pdf/2107.03374) (the calculation script is provided [here](https://github.com/NovaSky-AI/SkyThought/tree/main/scripts/qwen_eval_bon.py)). For MATH500 and OlympiadBench, we use greedy decoding.
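As a minimal illustration of the pass@1 averaging described above (not the actual linked script): with n samples per question, pass@1 is the fraction of correct samples for each question, averaged over all questions. The `pass_at_1` helper and the example data below are hypothetical.

```python
def pass_at_1(correctness: list[list[bool]]) -> float:
    """correctness[i][j] = whether sample j for question i was judged correct."""
    per_question = [sum(samples) / len(samples) for samples in correctness]
    return sum(per_question) / len(per_question)

# Example: 3 questions, 4 samples each (in our setup it would be 8 samples).
results = [
    [True, True, False, True],    # 3/4 correct
    [False, False, False, False], # 0/4 correct
    [True, True, True, True],     # 4/4 correct
]
print(pass_at_1(results))  # (0.75 + 0.0 + 1.0) / 3 ≈ 0.583
```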