Merged
204 changes: 61 additions & 143 deletions README.md
@@ -1,208 +1,126 @@
<!-- # RM-Gallery: A One-Stop Reward Model Platform -->
English | [**中文**](./README_zh.md)
<h2 align="center">RM-Gallery: A One-Stop Reward Model Platform</h2>

[![](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
[![](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
[![](https://img.shields.io/badge/Docs-English-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)

----

## 🗂️ Table of Contents
- [📢 News](#-news)
- [🌟 Why RM-Gallery?](#-why-rm-gallery)
- [📥 Installation](#-installation)
- [🚀 RM Gallery Walkthrough](#-rm-gallery-walkthrough)
- [🏋️‍♂️ Training RM](#-training-rm)
- [🏗️ Building RM](#-building-rm)
- [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly)
- [🛠️ Building Custom RMs](#-building-custom-rms)
- [🧪 Evaluating with Reward Model](#-evaluating-with-reward-model)
- [⚡ High-Performance RM Serving](#-high-performance-rm-serving)
- [🛠️ Reward Applications](#-reward-applications)
- [📚 Documentation](#-documentation)
- [🤝 Contribute](#-contribute)
- [📝 Citation](#-citation)

----

## 📢 News
- **[2025-07-09]** We release RM Gallery v0.1.0 now, which is also available in [PyPI](https://pypi.org/simple/rm-gallery/)!
----

## 🌟 Why RM-Gallery?

RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.
<div align="center">

<p align="center">
<img src="./docs/images/framework.png" alt="Framework" width="75%">
<br/>
<em>RM-Gallery Framework </em>
<img src="./docs/images/logo.svg" alt="RM-Gallery Logo" width="500">
</p>

### 🏋️‍♂️ Training RM
- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework.
<p align="center">
<img src="./docs/images/building_rm/helpsteer2_pairwise_training_RM-Bench_eval_accuracy.png" alt="Training RM Accuracy Curve" width="60%">
<br/>
<em>RM Training Pipeline improves accuracy on RM Bench</em>
</p>
This image demonstrates the effectiveness of the RM Training Pipeline. On RM Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.

### 🏗️ Building RM
- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise)
<h3>A unified platform for building, evaluating, and applying reward models.</h3>

- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) with both task-level(RMComposition) and component-level(RewardModel). Users can directly apply RMComposition/RewardModel for specific tasks or assemble custom RMComposition via component-level RewardModel.
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
[![PyPI](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
[![Documentation](https://img.shields.io/badge/docs-online-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)

- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data.
[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md)

<div style="display: flex; flex-wrap: wrap;">
<img src="./docs/images/building_rm/rewardbench2_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
<img src="./docs/images/building_rm/rmb_pairwise_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
</div>
The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.

### 🛠️ Applying RM

- **Multiple Usage Scenarios**: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training), Inference with Rewards (e.g., Best-of-N,data-correction)

- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
## News

- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to align judge feedback and improve RL stability.
- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)

## Installation

## 📥 Installation
> RM Gallery requires **Python >= 3.10 and < 3.13**

### 📦 Install From source
RM-Gallery requires Python 3.10 or higher (< 3.13).

```bash
# Pull the source code from GitHub
git clone https://github.com/modelscope/RM-Gallery.git

# Install the package
pip install .
pip install rm-gallery
```

### Install From PyPi
Or install from source:

```bash
pip install rm-gallery
git clone https://github.com/modelscope/RM-Gallery.git
cd RM-Gallery
pip install .
```
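The Python version requirement stated in this section (3.10 or higher, below 3.13) can be checked before installing. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of RM-Gallery.

```python
import sys

# Bounds taken from the README text: Python >= 3.10 and < 3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version=None):
    """Return True if the (major, minor) version is in the supported range.

    Hypothetical helper for illustration; not an RM-Gallery API.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Comparing `(major, minor)` tuples avoids the pitfalls of string comparison, where `"3.9"` would sort after `"3.10"`.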

## 🚀 Quick Start

### Your First Reward Model
## Quick Start

```python
from rm_gallery.core.reward.registry import RewardRegistry
from rm_gallery.core.data.schema import DataSample

# 1. Choose a pre-built reward model
# Choose from 35+ pre-built reward models
rm = RewardRegistry.get("safety_listwise_reward")

# 2. Prepare your data
from rm_gallery.core.data.schema import DataSample
sample = DataSample(...) # See docs for details

# 3. Evaluate
# Evaluate your data
sample = DataSample(...)
result = rm.evaluate(sample)
print(result)
```

**That's it!** 🎉

👉 **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes

👉 **[Interactive Notebooks](./examples/)** - Try it hands-on
See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).

## Features

## 📖 Key Features
### Pre-built Reward Models

### 🏗️ Building Reward Models

Choose from **35+ pre-built reward models** or create your own:
Access 35+ reward models for different domains:

```python
# Use pre-built models
rm = RewardRegistry.get("math_correctness_reward")
rm = RewardRegistry.get("code_quality_reward")
rm = RewardRegistry.get("helpfulness_listwise_reward")

# Or build custom models
class CustomReward(BasePointWiseReward):
    def _evaluate(self, sample, **kwargs):
        # Your custom logic here
        return RewardResult(...)
```

📚 **[See all available reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)

### 🏋️‍♂️ Training Reward Models
### Custom Reward Models

Train your own reward models with VERL framework:
Build your own reward models with simple APIs:

```bash
# Prepare data and launch training
cd examples/train/pointwise
./run_pointwise.sh
```python
from rm_gallery.core.reward import BasePointWiseReward

class CustomReward(BasePointWiseReward):
    def _evaluate(self, sample, **kwargs):
        # Your evaluation logic
        return RewardResult(...)
Comment on lines +74 to +79
Contributor

medium

The code example for creating a custom reward model is missing an import for RewardResult. This will cause a NameError if a user tries to run this code. Please add the import to make the example runnable.

Suggested change
from rm_gallery.core.reward import BasePointWiseReward
class CustomReward(BasePointWiseReward):
    def _evaluate(self, sample, **kwargs):
        # Your evaluation logic
        return RewardResult(...)

from rm_gallery.core.reward import BasePointWiseReward
from rm_gallery.core.reward.schema import RewardResult
class CustomReward(BasePointWiseReward):
    def _evaluate(self, sample, **kwargs):
        # Your evaluation logic
        return RewardResult(...)

```
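The `NameError` raised in the review comment on this example would appear only when `_evaluate` is called, not when the class is defined, because Python resolves names inside a function body at call time. A minimal self-contained sketch of that behavior follows; `BaseReward` and `MissingResult` are stand-ins for illustration, not RM-Gallery classes.

```python
class BaseReward:
    """Stand-in base class for illustration; not the RM-Gallery class."""


class CustomReward(BaseReward):
    def _evaluate(self, sample, **kwargs):
        # MissingResult is deliberately undefined, mimicking a forgotten import.
        return MissingResult(score=1.0)


def call_and_capture(reward, sample):
    """Call _evaluate and return the exception class name, or None on success."""
    try:
        reward._evaluate(sample)
    except NameError as exc:
        return type(exc).__name__
    return None


# Defining and instantiating the class succeeds; the failure appears on call.
reward = CustomReward()
error_name = call_and_capture(reward, sample={"text": "hello"})
print(error_name)
```

This is why a README snippet with a missing import can look fine until a reader actually runs an evaluation with it.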

📚 **[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)

### 🧪 Evaluating on Benchmarks
### Benchmarking

Test your models on standard benchmarks:
Evaluate models on standard benchmarks:

- **RewardBench2** - Latest reward model benchmark
- **RM-Bench** - Comprehensive evaluation
- **Conflict Detector** - Detect evaluation conflicts
- **JudgeBench** - Judge capability evaluation

📚 **[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**

### 🛠️ Real-World Applications

- **Best-of-N Selection** - Choose the best from multiple responses
- **Data Refinement** - Improve data quality with reward feedback
- **Post Training (RLHF)** - Integrate with reinforcement learning
- **High-Performance Serving** - Deploy as scalable service

📚 **[Application guides →](https://modelscope.github.io/RM-Gallery/)**
- **RM-Bench** - Comprehensive evaluation suite
- **Conflict Detector** - Detect evaluation inconsistencies
- **JudgeBench** - Judge capability assessment

[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)

## 📚 Documentation
### Applications

**📖 [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
- **Best-of-N Selection** - Choose optimal responses from candidates
- **Data Refinement** - Improve dataset quality with reward signals
- **RLHF Integration** - Use rewards in reinforcement learning pipelines
- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
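Best-of-N selection, the first application in the list, amounts to scoring each candidate response and keeping the highest-scoring one. The following is a minimal sketch under the assumption that a plain callable `score_fn` stands in for a reward model; it does not use the RM-Gallery API, and `toy_length_reward` is a deliberately naive placeholder.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=score_fn)


def toy_length_reward(response: str) -> float:
    """Toy scorer for illustration only: more words means a higher score."""
    return float(len(response.split()))


candidates = [
    "Paris.",
    "The capital of France is Paris.",
    "It is Paris, I think.",
]
best = best_of_n(candidates, toy_length_reward)
print(best)
```

In practice the scoring callable would wrap a real reward model; the selection logic stays the same.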

### Quick Links
## Documentation

- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
- **[Changelog](./CHANGELOG.md)** - Version history and updates
- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
- [Interactive Examples](./examples/)
- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)

## Contributing

We welcome contributions! Please install pre-commit hooks before submitting pull requests:


## 🤝 Contribute

Contributions are always encouraged!

We highly recommend install pre-commit hooks in this repo before committing pull requests.
These hooks are small house-keeping scripts executed every time you make a git commit,
which will take care of the formatting and linting automatically.
```shell
```bash
pip install -e .
Contributor

medium

The pre-commit package needs to be installed for pre-commit install to work. pip install -e . might not install it. Consider adding pip install pre-commit to the instructions to ensure contributors can set up the environment correctly.

pre-commit install
```

Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
See our [contribution guide](./docs/contribution.md) for details.

## 📝 Citation
## Citation

Reference to cite if you use RM-Gallery in a paper:
If you use RM-Gallery in your research, please cite:

```
@software{
Expand Down
23 changes: 23 additions & 0 deletions docs/images/logo.svg
6 changes: 2 additions & 4 deletions docs/index.md
@@ -14,10 +14,8 @@ show_datetime: true

<div style="text-align: center; margin: 3rem 0 2rem 0;">
<div style="display: inline-block; position: relative;">
<div style="font-size: 4.5rem; font-weight: 700; letter-spacing: -0.03em; line-height: 0.9; margin-bottom: 1rem; font-family: 'Inter', 'SF Pro Display', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
<span style="background: linear-gradient(135deg, #22d3ee 0%, #3b82f6 30%, #6366f1 70%, #8b5cf6 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(59, 130, 246, 0.3);">RM</span><span style="background: linear-gradient(135deg, #6366f1 0%, #8b5cf6 30%, #a855f7 70%, #ec4899 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(139, 92, 246, 0.3);">Gallery</span>
</div>
<div style="position: absolute; top: -10px; left: -10px; right: -10px; bottom: -10px; background: radial-gradient(ellipse at center, rgba(59, 130, 246, 0.1) 0%, transparent 70%); border-radius: 20px; z-index: -1;"></div>
<img src="./images/logo.svg" alt="RM-Gallery Logo" style="display: block; width: 520px; max-width: 80vw; z-index: 1; position: relative;">
<div style="position: absolute; top: -10px; left: -10px; right: -10px; bottom: -10px; background: radial-gradient(ellipse at center, rgba(59, 130, 246, 0.12) 0%, transparent 70%); border-radius: 20px; z-index: 0;"></div>
</div>
</div>
