Commit a94ec79

docs: add terminology note clarifying judge model vs reward model
1 parent 6dc69c4 commit a94ec79

2 files changed: +6 -0 lines changed

docs/building_graders/training_judge_models.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 Train judge models using three approaches: **SFT** for foundation learning, **Bradley-Terry** for scalar preference scoring, and **GRPO** for generative evaluation with reasoning.
 
+!!! info "Terminology: Judge Model vs Reward Model"
+In OpenJudge, we use **judge model** to refer to models trained for evaluation. This is the same concept as **reward model** commonly used in RLHF literature. Both terms describe models that assess and score AI outputs—we prefer "judge model" to emphasize the evaluation and assessment role.
+
 
 ## Overview
 
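
The context line in this hunk mentions Bradley-Terry for scalar preference scoring. As a rough illustration only, the standard Bradley-Terry formulation models the probability that a chosen response beats a rejected one as the sigmoid of their score difference, and trains by minimizing the negative log of that probability. The function names below are hypothetical and are not taken from OpenJudge's training code.

```python
import math


def preference_probability(score_chosen: float, score_rejected: float) -> float:
    """P(chosen is preferred) = sigmoid(score_chosen - score_rejected)."""
    margin = score_chosen - score_rejected
    # Branch on the sign so math.exp never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def bradley_terry_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of the observed preference.

    Equal to -log(sigmoid(margin)) = softplus(-margin), written in a
    numerically stable form.
    """
    margin = score_chosen - score_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss shrinks as the scalar score of the chosen response rises above that of the rejected one, which is what lets a single scalar output act as a preference score.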

docs/get_started/core_concepts.md

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ In the era of advanced AI systems, especially large language models (LLMs), havi
 
 **Reward** mechanisms, on the other hand, provide signals that guide model training through techniques like Reinforcement Learning from Human Feedback (RLHF). These reward signals enable automated optimization, allowing systems to self-improve without constant human intervention by providing feedback on the quality of model outputs.
 
+!!! info "Terminology: Judge Model vs Reward Model"
+In OpenJudge, we use **judge model** to refer to models trained for evaluation. This is the same concept as **reward model** commonly used in RLHF literature. Both terms describe models that assess and score AI outputs—we prefer "judge model" to emphasize the evaluation and assessment role.
+
 The OpenJudge framework unifies these two critical functions under a single abstraction: the Grader. A Grader is a modular, standardized component that can function as either an evaluator or a reward generator depending on your use case. As an **evaluator**, a Grader assesses model outputs against specific criteria. As a **reward generator**, a Grader provides signals that guide model training. This unified approach provides a consistent interface that simplifies the process of building, managing, and deploying both evaluation and reward systems, transforming raw model outputs into meaningful, quantifiable assessments that serve as the foundation for systematic model evaluation and automated model improvement.
 
 ## Why Graders Matter
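
The Grader paragraph in this hunk describes one component serving as either an evaluator or a reward generator. The sketch below illustrates that idea only. Every name in it (`Grader`, `GradeResult`, `LengthGrader`, `evaluate`, `reward`) is hypothetical and chosen for illustration; none of it is OpenJudge's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol, Tuple


@dataclass
class GradeResult:
    score: float  # quantified assessment of one response
    reason: str   # short explanation of the score


class Grader(Protocol):
    """Hypothetical interface: anything with a grade() method qualifies."""

    def grade(self, query: str, response: str) -> GradeResult: ...


class LengthGrader:
    """Toy grader: scores 1.0 for non-empty responses within a word budget."""

    def __init__(self, max_words: int = 50) -> None:
        self.max_words = max_words

    def grade(self, query: str, response: str) -> GradeResult:
        n_words = len(response.split())
        score = 1.0 if 0 < n_words <= self.max_words else 0.0
        return GradeResult(score=score, reason=f"{n_words} words")


def evaluate(grader: Grader, samples: Iterable[Tuple[str, str]]) -> float:
    """Evaluator role: mean score over (query, response) pairs."""
    scores = [grader.grade(query, response).score for query, response in samples]
    if not scores:
        raise ValueError("evaluate() needs at least one sample")
    return sum(scores) / len(scores)


def reward(grader: Grader, query: str, response: str) -> float:
    """Reward role: the same score, consumed as a training signal."""
    return grader.grade(query, response).score
```

In this sketch both roles call the same `grade` method; only the consumer of the score differs, which is one way to read the single-abstraction claim in the paragraph.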
