
Commit ea8db00

docs: rename reward to judge model (#47)
* docs: rename reward model to judge model for consistency
* docs: add terminology note clarifying judge model vs reward model
* docs: fix terminology in build_reward.md - use composite rewards instead of judge models

1 parent 178ee9f commit ea8db00

File tree

7 files changed: +16 additions, -10 deletions

docs/building_graders/generate_rubrics_as_graders.md

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ Learn evaluation rubrics from labeled preference data. Based on [Auto-Rubric: Le

  1. **Infer query-specific rubrics** — For each labeled example, the system proposes criteria that explain why one response is better than another
  2. **Generalize to core set** — Similar rubrics are merged and organized into a compact, non-redundant "Theme-Tips" structure

- **Data efficiency:** Using just 70 preference pairs, this method enables smaller models to match or outperform fully-trained reward models.
+ **Data efficiency:** Using just 70 preference pairs, this method enables smaller models to match or outperform fully-trained judge models.

  <figure markdown="span">
  ![Auto-Rubric Pipeline Overview](../images/auto_rubric_overview.png){ width="100%" }

docs/building_graders/overview.md

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@

  # Building Custom Graders

- Extend OpenJudge beyond built-in evaluators by creating custom graders or training reward models. Build domain-specific evaluation logic that seamlessly integrates with OpenJudge's evaluation pipeline.
+ Extend OpenJudge beyond built-in evaluators by creating custom graders or training judge models. Build domain-specific evaluation logic that seamlessly integrates with OpenJudge's evaluation pipeline.

  ## Why Build Custom Graders?

@@ -17,7 +17,7 @@ OpenJudge supports three paths for creating custom graders, each optimized for d

  |----------|---------------|---------------|----------|--------------|
  | **Create Custom Graders** | Minutes | None | Quick prototyping, domain-specific logic | Pay-per-query (API) or free (code-based) |
  | **Generate from Data** | 1-4 hours | 50-500 examples | Iterative refinement, transparent rubrics | Medium setup + pay-per-query |
- | **Train Reward Models** | 1-3 days | 1K-100K pairs | High-volume production (>1M queries/month) | High upfront, 10x lower per-query |
+ | **Train Judge Models** | 1-3 days | 1K-100K pairs | High-volume production (>1M queries/month) | High upfront, 10x lower per-query |

  Use this decision tree to choose the right approach based on your data availability and requirements:

@@ -57,7 +57,7 @@ Use this decision tree to choose the right approach based on your data availabil

  **Choose based on your situation:**

- - **Have labeled data + need automation?** → Train a reward model
+ - **Have labeled data + need automation?** → Train a judge model
  - **Have data + need fast iteration?** → Generate rubrics from data
  - **No data + need immediate results?** → Create custom graders

@@ -75,19 +75,19 @@ Automatically generate evaluation rubrics and create graders. Two approaches ava

  **Learn more:** [Generate Rubrics as Graders →](generate_rubrics_as_graders.md)

- ### Approach 3: Train Reward Models
+ ### Approach 3: Train Judge Models

  Train neural networks on preference data to learn evaluation criteria automatically. Supports Bradley-Terry (preference pairs), Generative Pointwise (absolute scores), and Generative Pairwise (comparison decisions). Requires 1K-100K examples and 1-3 days but delivers highly consistent evaluation at 10x lower per-query cost—ideal for high-volume scenarios exceeding 1M queries per month.

- **Learn more:** [Train Reward Models →](training_reward_models.md)
+ **Learn more:** [Train Judge Models →](training_judge_models.md)

  ## Next Steps

  - [Create Custom Graders](create_custom_graders.md) — Build graders using LLM or code-based logic
  - [Generate Rubrics as Graders](generate_rubrics_as_graders.md) — Automatically generate graders from task description or labeled data
- - [Train Reward Models](training_reward_models.md) — Train SFT, Bradley-Terry, or GRPO judge models
+ - [Train Judge Models](training_judge_models.md) — Train SFT, Bradley-Terry, or GRPO judge models
  - [Built-in Graders](../built_in_graders/overview.md) — Explore pre-built graders to customize
  - [Run Grading Tasks](../running_graders/run_tasks.md) — Deploy graders at scale with batch workflows

docs/building_graders/training_judge_models.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@

  Train judge models using three approaches: **SFT** for foundation learning, **Bradley-Terry** for scalar preference scoring, and **GRPO** for generative evaluation with reasoning.

+ !!! info "Terminology: Judge Model vs Reward Model"
+     In OpenJudge, we use **judge model** to refer to models trained for evaluation. This is the same concept as **reward model** commonly used in RLHF literature. Both terms describe models that assess and score AI outputs—we prefer "judge model" to emphasize the evaluation and assessment role.

  ## Overview
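
For context on the **Bradley-Terry** approach named in this file: a judge model trained this way outputs a scalar score per response and is fit so that preferred responses score higher than rejected ones. Below is a minimal sketch of that objective, assuming PyTorch and a generic scalar-scoring model; it is illustrative only, not OpenJudge's actual training code.

```python
# Minimal Bradley-Terry preference loss sketch (illustrative, not OpenJudge code).
# score_chosen / score_rejected are scalar judge-model outputs for the preferred
# and rejected responses of each preference pair in a batch.
import torch
import torch.nn.functional as F

def bradley_terry_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # P(chosen beats rejected) = sigmoid(score_chosen - score_rejected);
    # minimizing the negative log-likelihood pushes chosen scores above rejected ones.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Hypothetical batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, 1.1])
print(bradley_terry_loss(chosen, rejected))  # loss shrinks as chosen outscores rejected
```

The logsigmoid form is simply the numerically stable way to write the Bradley-Terry negative log-likelihood; the SFT and GRPO approaches mentioned in the same sentence follow different training objectives.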

docs/community/contributing.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

  # Contribute to OpenJudge

- Welcome! OpenJudge is an open-source reward model platform. Your contributions help make AI alignment and evaluation more accessible to the community.
+ Welcome! OpenJudge is an open-source judge model platform. Your contributions help make AI alignment and evaluation more accessible to the community.

  !!! info "Ways to Contribute"
      We welcome contributions of all kinds:

docs/get_started/build_reward.md

Lines changed: 1 addition & 1 deletion
@@ -259,7 +259,7 @@ asyncio.run(main())

  Running this code evaluates both responses across three quality dimensions and produces a training reward for each. These rewards can then feed into RLHF or DPO algorithms to optimize your chatbot. The output shows individual dimension scores alongside the final aggregated reward, helping you understand what drives the training signal.

- You now have a foundation for building reward models. Start with a single grader to validate your setup, then progressively add more dimensions as needed. The key is choosing graders that align with your application's requirements and weighting them appropriately based on what matters most for your use case.
+ You now have a foundation for building composite rewards. Start with a single grader to validate your setup, then progressively add more dimensions as needed. The key is choosing graders that align with your application's requirements and weighting them appropriately based on what matters most for your use case.

  ## Explore More Graders
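
To make the "composite rewards" wording above concrete: a composite reward combines several graders' dimension scores into one scalar training signal using per-dimension weights. A minimal sketch follows, assuming hypothetical dimension names and weights rather than the actual graders configured in build_reward.md.

```python
# Illustrative weighted composite reward (hypothetical dimensions and weights,
# not the OpenJudge API used in build_reward.md).
from typing import Dict

def composite_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate per-dimension grader scores into a single scalar reward."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight

# Example: three quality dimensions, weighted by what matters for the use case.
scores = {"helpfulness": 0.8, "safety": 1.0, "conciseness": 0.6}
weights = {"helpfulness": 0.5, "safety": 0.3, "conciseness": 0.2}
print(composite_reward(scores, weights))  # 0.82
```

Normalizing by the total weight keeps the reward on the same scale as the individual dimension scores even when the weights do not sum to 1.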

docs/get_started/core_concepts.md

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ In the era of advanced AI systems, especially large language models (LLMs), havi

  **Reward** mechanisms, on the other hand, provide signals that guide model training through techniques like Reinforcement Learning from Human Feedback (RLHF). These reward signals enable automated optimization, allowing systems to self-improve without constant human intervention by providing feedback on the quality of model outputs.

+ !!! info "Terminology: Judge Model vs Reward Model"
+     In OpenJudge, we use **judge model** to refer to models trained for evaluation. This is the same concept as **reward model** commonly used in RLHF literature. Both terms describe models that assess and score AI outputs—we prefer "judge model" to emphasize the evaluation and assessment role.

  The OpenJudge framework unifies these two critical functions under a single abstraction: the Grader. A Grader is a modular, standardized component that can function as either an evaluator or a reward generator depending on your use case. As an **evaluator**, a Grader assesses model outputs against specific criteria. As a **reward generator**, a Grader provides signals that guide model training. This unified approach provides a consistent interface that simplifies the process of building, managing, and deploying both evaluation and reward systems, transforming raw model outputs into meaningful, quantifiable assessments that serve as the foundation for systematic model evaluation and automated model improvement.

  ## Why Graders Matter

docs/get_started/evaluate_ai_agents.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ AI agents operate autonomously through complex reasoning loops, making multiple

  |-------------|------------------|-------------|
  | **Final Response** | Overall task success and answer quality | Production monitoring, A/B testing |
  | **Single Step** | Individual action quality (tool calls, planning) | Debugging failures, prompt engineering |
- | **Trajectory** | Multi-step reasoning paths and efficiency | Cost optimization, training reward models |
+ | **Trajectory** | Multi-step reasoning paths and efficiency | Cost optimization, training judge models |

  !!! tip "Evaluation Strategy"
      Start with **Final Response** evaluation to establish baseline success rates. When failures occur, use **Single Step** evaluation to pinpoint root causes. Use **Trajectory** evaluation to detect systemic issues like loops or inefficiencies.
