modelscope · helloml0326 · Jan 8, 2026
diff --git a/cookbooks/training_judge_model/sft/README.md b/cookbooks/training_judge_model/sft/README.md
@@ -12,7 +12,7 @@ SFT training is the **foundation for building judge models**. It works with conv
 
 The model learns to minimize cross-entropy loss:
 
-$$\mathcal{L} = -\sum_{t} \log P(y_t | y_{<t}, x)$$
+$$\mathcal{L} = -\sum_{t} \log P(y_t \mid y_{\lt t}, x)$$
 
 Where $x$ is the input and $y$ is the target response.
 

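Reviewer note: the loss in the README hunk above sums the negative log-probability of each target token given the preceding tokens and the input. A minimal sketch in plain Python follows, assuming the model's per-step output is already available as probability lists. The function name `sequence_nll` and the toy numbers are hypothetical and are not part of the OpenJudge code.

```python
import math


def sequence_nll(step_probs, target_ids):
    """Return -sum_t log P(y_t | y_<t, x) for one target sequence.

    step_probs[t] is the (assumed) predicted distribution over the vocabulary
    at position t, already conditioned on the input x and the tokens y_<t.
    target_ids[t] is the index of the reference token y_t.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one distribution per target token")
    return -sum(math.log(probs[tok]) for probs, tok in zip(step_probs, target_ids))


# Toy example: a 3-token vocabulary and a 2-token target response.
step_probs = [
    [0.5, 0.25, 0.25],  # distribution for y_1
    [0.1, 0.8, 0.1],    # distribution for y_2, given y_1
]
target_ids = [0, 1]

loss = sequence_nll(step_probs, target_ids)
print(f"cross-entropy loss: {loss:.4f}")
```

In practice a training framework would compute this from logits over batches, usually averaged over tokens rather than summed; the sketch only mirrors the formula as written.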
diff --git a/docs/building_graders/overview.md b/docs/building_graders/overview.md
@@ -79,15 +79,15 @@ Automatically generate evaluation rubrics and create graders. Two approaches ava
 
 Train neural networks on preference data to learn evaluation criteria automatically. Supports Bradley-Terry (preference pairs), Generative Pointwise (absolute scores), and Generative Pairwise (comparison decisions). Requires 1K-100K examples and 1-3 days but delivers highly consistent evaluation at 10x lower per-query cost—ideal for high-volume scenarios exceeding 1M queries per month.
 
-**Learn more:** [Train with GRPO →](training_grpo.md) | [Bradley-Terry Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/bradley-terry) | [SFT Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/sft)
+**Learn more:** [Train Reward Models →](training_reward_models.md)
 
 
 
 ## Next Steps
 
 - [Create Custom Graders](create_custom_graders.md) — Build graders using LLM or code-based logic
 - [Generate Rubrics as Graders](generate_rubrics_as_graders.md) — Automatically generate graders from task description or labeled data
-- [Train with GRPO](training_grpo.md) — Train generative judge models with reinforcement learning
+- [Train Reward Models](training_reward_models.md) — Train SFT, Bradley-Terry, or GRPO judge models
 - [Built-in Graders](../built_in_graders/overview.md) — Explore pre-built graders to customize
 - [Run Grading Tasks](../running_graders/run_tasks.md) — Deploy graders at scale with batch workflows
 
diff --git a/docs/building_graders/training_grpo.md b/docs/building_graders/training_grpo.md