2 changes: 1 addition & 1 deletion cookbooks/training_judge_model/sft/README.md
@@ -12,7 +12,7 @@ SFT training is the **foundation for building judge models**. It works with conv

The model learns to minimize cross-entropy loss:

- $$\mathcal{L} = -\sum_{t} \log P(y_t | y_{<t}, x)$$
+ $$\mathcal{L} = -\sum_{t} \log P(y_t \mid y_{\lt t}, x)$$

Where $x$ is the input and $y$ is the target response.

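To make the SFT objective in the README hunk above concrete, here is a minimal Python sketch of the summed token-level cross-entropy $\mathcal{L} = -\sum_t \log P(y_t \mid y_{<t}, x)$. It assumes teacher forcing: `step_logits[t]` holds the model's unnormalized scores over the vocabulary at target position $t$, already conditioned on $x$ and $y_{<t}$, and `targets[t]` is the vocabulary index of the reference token $y_t$. Prompt tokens are assumed to be masked out beforehand, so only response positions are passed in. The helpers `log_softmax` and `sft_loss` are hypothetical illustrations, not part of OpenJudge.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Log-probabilities from raw scores, shifted by the max for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def sft_loss(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Sum over target positions t of -log P(y_t | y_<t, x).

    Hypothetical helper: assumes step_logits[t] is already conditioned on the
    input x and the reference prefix y_<t (teacher forcing).
    """
    if len(step_logits) != len(targets):
        raise ValueError("need one logit vector per target token")
    total = 0.0
    for logits, y_t in zip(step_logits, targets):
        # Accumulate the negative log-likelihood of the reference token.
        total -= log_softmax(logits)[y_t]
    return total
```

With uniform logits over a 4-token vocabulary, each position contributes $\log 4$, so `sft_loss([[0, 0, 0, 0], [0, 0, 0, 0]], [1, 3])` equals $2 \log 4 \approx 2.77$. Many training loops divide by the number of target tokens to report a mean loss; the sketch keeps the plain sum to match the formula.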
4 changes: 2 additions & 2 deletions docs/building_graders/overview.md
@@ -79,15 +79,15 @@ Automatically generate evaluation rubrics and create graders. Two approaches ava

Train neural networks on preference data to learn evaluation criteria automatically. Supports Bradley-Terry (preference pairs), Generative Pointwise (absolute scores), and Generative Pairwise (comparison decisions). Requires 1K-100K examples and 1-3 days but delivers highly consistent evaluation at 10x lower per-query cost—ideal for high-volume scenarios exceeding 1M queries per month.

- **Learn more:** [Train with GRPO →](training_grpo.md) | [Bradley-Terry Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/bradley-terry) | [SFT Training](https://github.com/modelscope/OpenJudge/tree/main/cookbooks/training_judge_model/sft)
+ **Learn more:** [Train Reward Models →](training_reward_models.md)



## Next Steps

- [Create Custom Graders](create_custom_graders.md) — Build graders using LLM or code-based logic
- [Generate Rubrics as Graders](generate_rubrics_as_graders.md) — Automatically generate graders from task description or labeled data
- - [Train with GRPO](training_grpo.md) — Train generative judge models with reinforcement learning
+ - [Train Reward Models](training_reward_models.md) — Train SFT, Bradley-Terry, or GRPO judge models
- [Built-in Graders](../built_in_graders/overview.md) — Explore pre-built graders to customize
- [Run Grading Tasks](../running_graders/run_tasks.md) — Deploy graders at scale with batch workflows

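The overview context above lists Bradley-Terry training on preference pairs. As a hedged illustration, the standard Bradley-Terry reward-model objective models $P(\text{chosen} \succ \text{rejected}) = \sigma(r_c - r_r)$ and minimizes $-\log \sigma(r_c - r_r)$, where $r_c$ and $r_r$ are the scalar scores the reward model assigns to the preferred and rejected responses. The sketch below assumes those scores are already computed; `softplus`, `bradley_terry_loss`, and `mean_bt_loss` are hypothetical helpers, not the loss code in the linked OpenJudge cookbook.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        # log(1 + e^x) = x + log(1 + e^-x) avoids overflow for large x.
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected) for a single preference pair."""
    # -log sigmoid(m) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-(r_chosen - r_rejected))


def mean_bt_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Average Bradley-Terry loss over a batch of (chosen, rejected) scores."""
    if len(chosen) != len(rejected) or len(chosen) == 0:
        raise ValueError("need equal-length, non-empty score lists")
    pairs = zip(chosen, rejected)
    return sum(bradley_terry_loss(c, r) for c, r in pairs) / len(chosen)
```

Equal scores give a loss of $\log 2 \approx 0.693$; the loss shrinks toward 0 as the chosen response's score pulls ahead, and grows roughly linearly in the margin when the ranking is reversed. Computing it through `softplus` rather than `math.log(sigmoid(...))` keeps large margins from overflowing or underflowing.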
129 changes: 0 additions & 129 deletions docs/building_graders/training_grpo.md

This file was deleted.