modelscope
diff --git a/cookbooks/training_judge_model/sft/README.md b/cookbooks/training_judge_model/sft/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/building_graders/overview.md b/docs/building_graders/overview.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/building_graders/training_grpo.md b/docs/building_graders/training_grpo.md
Lines changed: 0 additions & 129 deletions
@@ -12,7 +12,7 @@ SFT training is the **foundation for building judge models**. It works with conv
 
 The model learns to minimize cross-entropy loss:
 
-$$\mathcal{L} = -\sum_{t} \log P(y_t | y_{<t}, x)$$
+$$\mathcal{L} = -\sum_{t} \log P(y_t \mid y_{\lt t}, x)$$
 
 Where $x$ is the input and $y$ is the target response.
 
 
@@ -79,15 +79,15 @@ Automatically analyze evaluation data to create structured scoring rubrics. Prov
 
 Train neural networks on preference data to learn evaluation criteria automatically. Supports Bradley-Terry (preference pairs), Generative Pointwise (absolute scores), and Generative Pairwise (comparison decisions). Requires 1K-100K examples and 1-3 days but delivers highly consistent evaluation at 10x lower per-query cost—ideal for high-volume scenarios exceeding 1M queries per month.
 
-**Learn more:** [Train with GRPO →](training_grpo.md) | [Bradley-Terry Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/bradley-terry) | [SFT Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/sft)
+**Learn more:** [Train Reward Models →](training_reward_models.md)
 
 
 
 ## Next Steps
 
 - [Create Custom Graders](create_custom_graders.md) — Build graders using LLM or code-based logic
 - [Generate Graders from Data](generate_graders_from_data.md) — Auto-generate rubrics from labeled data
-- [Train with GRPO](training_grpo.md) — Train generative judge models with reinforcement learning
+- [Train Reward Models](training_reward_models.md) — Train SFT, Bradley-Terry, or GRPO judge models
 - [Built-in Graders](../built_in_graders/overview.md) — Explore pre-built graders to customize
 - [Run Grading Tasks](../running_graders/run_tasks.md) — Deploy graders at scale with batch workflows