-<!-- # RM-Gallery: A One-Stop Reward Model Platform -->
-English | [**中文**](./README_zh.md)
-<h2 align="center">RM-Gallery: A One-Stop Reward Model Platform</h2>
-
-[](https://pypi.org/project/rm-gallery/)
-[](https://pypi.org/project/rm-gallery/)
-[](https://modelscope.github.io/RM-Gallery/)
-
-----
-
-## 🗂️ Table of Contents
-- [📢 News](#-news)
-- [🌟 Why RM-Gallery?](#-why-rm-gallery)
-- [📥 Installation](#-installation)
-- [🚀 RM Gallery Walkthrough](#-rm-gallery-walkthrough)
-  - [🏋️♂️ Training RM](#-training-rm)
-  - [🏗️ Building RM](#-building-rm)
-    - [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly)
-    - [🛠️ Building Custom RMs](#-building-custom-rms)
-  - [🧪 Evaluating with Reward Model](#-evaluating-with-reward-model)
-  - [⚡ High-Performance RM Serving](#-high-performance-rm-serving)
-  - [🛠️ Reward Applications](#-reward-applications)
-- [📚 Documentation](#-documentation)
-- [🤝 Contribute](#-contribute)
-- [📝 Citation](#-citation)
-
-----
-
-## 📢 News
-- **[2025-07-09]** We release RM Gallery v0.1.0 now, which is also available in [PyPI](https://pypi.org/simple/rm-gallery/)!
-----
-
-## 🌟 Why RM-Gallery?
-
-RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.
+<div align="center">
 
 <p align="center">
-    <img src="./docs/images/framework.png" alt="Framework" width="75%">
-    <br/>
-    <em>RM-Gallery Framework</em>
+    <img src="./docs/images/logo.svg" alt="RM-Gallery Logo" width="500">
 </p>
 
-### 🏋️♂️ Training RM
-- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework.
-<p align="center">
-    <img src="./docs/images/building_rm/helpsteer2_pairwise_training_RM-Bench_eval_accuracy.png" alt="Training RM Accuracy Curve" width="60%">
-    <br/>
-    <em>RM Training Pipeline improves accuracy on RM-Bench</em>
-</p>
-This image demonstrates the effectiveness of the RM Training Pipeline. On RM-Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.
-
-### 🏗️ Building RM
-- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/model-free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise)
+<h3>A unified platform for building, evaluating, and applying reward models.</h3>
 
-- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use reward model instances for diverse tasks (e.g., math, coding, preference alignment) at both the task level (RMComposition) and the component level (RewardModel). Users can apply an RMComposition/RewardModel directly for specific tasks or assemble a custom RMComposition from component-level RewardModels.
+[](https://pypi.org/project/rm-gallery/)
+[](https://pypi.org/project/rm-gallery/)
+[](https://modelscope.github.io/RM-Gallery/)
 
-- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score reasoning reward model paradigm, offering best practices to help users generate rubrics from limited preference data.
+[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md)
 
-<div style="display: flex; flex-wrap: wrap;">
-    <img src="./docs/images/building_rm/rewardbench2_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
-    <img src="./docs/images/building_rm/rmb_pairwise_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
 </div>
-The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
-
-### 🛠️ Applying RM
-
-- **Multiple Usage Scenarios**: Covers multiple reward model usage scenarios with detailed best practices, including training with rewards (e.g., post-training) and inference with rewards (e.g., Best-of-N, data correction).
 
-- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
+## News
 
+- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
+- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to deconflict judge feedback and improve RL stability.
+- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)
 
+## Installation
 
-## 📥 Installation
-> RM Gallery requires **Python >= 3.10 and < 3.13**
-
-
-### 📦 Install From source
+RM-Gallery requires Python 3.10 or higher (< 3.13).
 
 ```bash
-# Pull the source code from GitHub
-git clone https://github.com/modelscope/RM-Gallery.git
-
-# Install the package
-pip install .
+pip install rm-gallery
 ```
 
-### Install From PyPI
+Or install from source:
 
 ```bash
-pip install rm-gallery
+git clone https://github.com/modelscope/RM-Gallery.git
+cd RM-Gallery
+pip install .
 ```
 
-## 🚀 Quick Start
-
-### Your First Reward Model
+## Quick Start
 
 ```python
 from rm_gallery.core.reward.registry import RewardRegistry
+from rm_gallery.core.data.schema import DataSample
 
-# 1. Choose a pre-built reward model
+# Choose from 35+ pre-built reward models
 rm = RewardRegistry.get("safety_listwise_reward")
 
-# 2. Prepare your data
-from rm_gallery.core.data.schema import DataSample
-sample = DataSample(...)  # See docs for details
-
-# 3. Evaluate
+# Evaluate your data
+sample = DataSample(...)
 result = rm.evaluate(sample)
-print(result)
 ```
 
-**That's it!** 🎉
-
-👉 **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes
-
-👉 **[Interactive Notebooks](./examples/)** - Try it hands-on
+See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).
 
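The `RewardRegistry.get(...)` lookup above follows a standard registry pattern: reward models register themselves under a string name and are instantiated on demand. As a rough, self-contained sketch of that pattern (the `SimpleRegistry` and `LengthReward` names here are illustrative stand-ins, not RM-Gallery classes):

```python
class SimpleRegistry:
    """Minimal name-to-class registry, in the spirit of RewardRegistry."""
    _models = {}

    @classmethod
    def register(cls, name):
        def decorator(model_cls):
            cls._models[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def get(cls, name):
        # Instantiate the registered class on lookup.
        return cls._models[name]()


@SimpleRegistry.register("length_reward")
class LengthReward:
    def evaluate(self, text):
        # Toy scoring rule: reward conciseness, score in (0, 1].
        return 1.0 / (1.0 + len(text.split()))


rm = SimpleRegistry.get("length_reward")
score = rm.evaluate("a concise answer")  # 3 words -> 0.25
```

The decorator-based registration is what lets a library ship "35+ pre-built" models behind a single string-keyed entry point.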
+## Features
 
-## 📖 Key Features
+### Pre-built Reward Models
 
-### 🏗️ Building Reward Models
-
-Choose from **35+ pre-built reward models** or create your own:
+Access 35+ reward models for different domains:
 
 ```python
-# Use pre-built models
 rm = RewardRegistry.get("math_correctness_reward")
 rm = RewardRegistry.get("code_quality_reward")
 rm = RewardRegistry.get("helpfulness_listwise_reward")
-
-# Or build custom models
-class CustomReward(BasePointWiseReward):
-    def _evaluate(self, sample, **kwargs):
-        # Your custom logic here
-        return RewardResult(...)
 ```
 
-📚 **[See all available reward models →](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
+[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)
 
-### 🏋️♂️ Training Reward Models
+### Custom Reward Models
 
-Train your own reward models with the VERL framework:
+Build your own reward models with simple APIs:
 
-```bash
-# Prepare data and launch training
-cd examples/train/pointwise
-./run_pointwise.sh
+```python
+from rm_gallery.core.reward import BasePointWiseReward
+
+class CustomReward(BasePointWiseReward):
+    def _evaluate(self, sample, **kwargs):
+        # Your evaluation logic
+        return RewardResult(...)
 ```
 
-📚 **[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
+[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
 
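The `_evaluate` hook above is where the scoring logic lives. A minimal, self-contained sketch of the pointwise pattern, using hypothetical stand-in classes (`PointwiseReward`, `Result`) rather than RM-Gallery's actual `BasePointWiseReward`/`RewardResult`:

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for a reward result: a score plus a human-readable reason."""
    score: float
    reason: str


class PointwiseReward:
    """Hypothetical base class: scores one sample at a time."""
    def evaluate(self, sample: str) -> Result:
        return self._evaluate(sample)


class ConcisenessReward(PointwiseReward):
    """Toy rule: full score within a word budget, decaying beyond it."""
    def __init__(self, budget: int = 50):
        self.budget = budget

    def _evaluate(self, sample: str) -> Result:
        n = len(sample.split())
        score = 1.0 if n <= self.budget else self.budget / n
        return Result(score=score, reason=f"{n} words (budget {self.budget})")


rm = ConcisenessReward(budget=5)
result = rm.evaluate("short and to the point")  # 5 words -> score 1.0
```

Returning a reason string alongside the score mirrors the critique-style output that reasoning reward models produce, and makes reward decisions auditable.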
-### 🧪 Evaluating on Benchmarks
+### Benchmarking
 
-Test your models on standard benchmarks:
+Evaluate models on standard benchmarks:
 
 - **RewardBench2** - Latest reward model benchmark
-- **RM-Bench** - Comprehensive evaluation
-- **Conflict Detector** - Detect evaluation conflicts
-- **JudgeBench** - Judge capability evaluation
-
-📚 **[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**
-
-### 🛠️ Real-World Applications
-
-- **Best-of-N Selection** - Choose the best from multiple responses
-- **Data Refinement** - Improve data quality with reward feedback
-- **Post Training (RLHF)** - Integrate with reinforcement learning
-- **High-Performance Serving** - Deploy as scalable service
-
-📚 **[Application guides →](https://modelscope.github.io/RM-Gallery/)**
+- **RM-Bench** - Comprehensive evaluation suite
+- **Conflict Detector** - Detect evaluation inconsistencies
+- **JudgeBench** - Judge capability assessment
 
+[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)
 
-## 📚 Documentation
+### Applications
 
-**📖 [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
+- **Best-of-N Selection** - Choose optimal responses from candidates
+- **Data Refinement** - Improve dataset quality with reward signals
+- **RLHF Integration** - Use rewards in reinforcement learning pipelines
+- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
 
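The first application above, Best-of-N selection, reduces to scoring every candidate response with a reward model and keeping the argmax. A toy sketch, with a stub heuristic standing in for a trained reward model (the `toy_reward` rule is purely illustrative):

```python
def toy_reward(response: str) -> float:
    # Hypothetical heuristic: prefer responses that cite a source.
    # A real deployment would call a reward model here instead.
    return 1.0 if "source:" in response.lower() else 0.0


def best_of_n(responses, reward_fn):
    # Score each candidate and return the highest-scoring one.
    return max(responses, key=reward_fn)


candidates = [
    "It is 42.",
    "It is 42. Source: The Hitchhiker's Guide.",
]
best = best_of_n(candidates, toy_reward)
```

Because the reward function is the only model-specific piece, the same `best_of_n` loop works whether the scorer is a heuristic, a pointwise reward model, or a served endpoint.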
-### Quick Links
+## Documentation
 
-- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
-- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
-- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
-- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
-- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
-- **[Changelog](./CHANGELOG.md)** - Version history and updates
+- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
+- [Interactive Examples](./examples/)
+- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
+- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
+- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)
 
+## Contributing
 
+We welcome contributions! Please install pre-commit hooks before submitting pull requests:
 
-
-## 🤝 Contribute
-
-Contributions are always encouraged!
-
-We highly recommend installing pre-commit hooks in this repo before committing pull requests.
-These hooks are small housekeeping scripts executed every time you make a git commit,
-which take care of the formatting and linting automatically.
-```shell
+```bash
 pip install -e .
 pre-commit install
 ```
 
-Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
+See our [contribution guide](./docs/contribution.md) for details.
 
-## 📝 Citation
+## Citation
 
-Reference to cite if you use RM-Gallery in a paper:
+If you use RM-Gallery in your research, please cite:
 
 ```
 @software{