
Commit ccf97f5

Merge pull request #25 from modelscope/docs/simplify-readme
Docs/simplify readme
2 parents: 2859c63 + b443639

File tree

3 files changed: +86 −147 lines

README.md

Lines changed: 61 additions & 143 deletions
````diff
@@ -1,208 +1,126 @@
-<!-- # RM-Gallery: A One-Stop Reward Model Platform -->
-English | [**中文**](./README_zh.md)
-<h2 align="center">RM-Gallery: A One-Stop Reward Model Platform</h2>
-
-[![](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
-[![](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
-[![](https://img.shields.io/badge/Docs-English-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)
-
-----
-
-## 🗂️ Table of Contents
-- [📢 News](#-news)
-- [🌟 Why RM-Gallery?](#-why-rm-gallery)
-- [📥 Installation](#-installation)
-- [🚀 RM Gallery Walkthrough](#-rm-gallery-walkthrough)
-- [🏋️‍♂️ Training RM](#-training-rm)
-- [🏗️ Building RM](#-building-rm)
-- [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly)
-- [🛠️ Building Custom RMs](#-building-custom-rms)
-- [🧪 Evaluating with Reward Model](#-evaluating-with-reward-model)
-- [⚡ High-Performance RM Serving](#-high-performance-rm-serving)
-- [🛠️ Reward Applications](#-reward-applications)
-- [📚 Documentation](#-documentation)
-- [🤝 Contribute](#-contribute)
-- [📝 Citation](#-citation)
-
-----
-
-## 📢 News
-- **[2025-07-09]** We release RM Gallery v0.1.0 now, which is also available in [PyPI](https://pypi.org/simple/rm-gallery/)!
-----
-
-## 🌟 Why RM-Gallery?
-
-RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.
+<div align="center">
 
 <p align="center">
-<img src="./docs/images/framework.png" alt="Framework" width="75%">
-<br/>
-<em>RM-Gallery Framework </em>
+<img src="./docs/images/logo.svg" alt="RM-Gallery Logo" width="500">
 </p>
 
-### 🏋️‍♂️ Training RM
-- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework.
-<p align="center">
-<img src="./docs/images/building_rm/helpsteer2_pairwise_training_RM-Bench_eval_accuracy.png" alt="Training RM Accuracy Curve" width="60%">
-<br/>
-<em>RM Training Pipeline improves accuracy on RM Bench</em>
-</p>
-This image demonstrates the effectiveness of the RM Training Pipeline. On RM Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.
-
-### 🏗️ Building RM
-- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise)
+<h3>A unified platform for building, evaluating, and applying reward models.</h3>
 
-- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) with both task-level(RMComposition) and component-level(RewardModel). Users can directly apply RMComposition/RewardModel for specific tasks or assemble custom RMComposition via component-level RewardModel.
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
+[![PyPI](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
+[![Documentation](https://img.shields.io/badge/docs-online-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)
 
-- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data.
+[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md)
 
-<div style="display: flex; flex-wrap: wrap;">
-<img src="./docs/images/building_rm/rewardbench2_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
-<img src="./docs/images/building_rm/rmb_pairwise_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
 </div>
-The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
-
-### 🛠️ Applying RM
-
-- **Multiple Usage Scenarios**: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training), Inference with Rewards (e.g., Best-of-N,data-correction)
 
-- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
+## News
 
+- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
+- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to align judge feedback and improve RL stability.
+- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)
 
+## Installation
 
-## 📥 Installation
-> RM Gallery requires **Python >= 3.10 and < 3.13**
-
-
-### 📦 Install From source
+RM-Gallery requires Python 3.10 or higher (< 3.13).
 
 ```bash
-# Pull the source code from GitHub
-git clone https://github.com/modelscope/RM-Gallery.git
-
-# Install the package
-pip install .
+pip install rm-gallery
 ```
 
-### Install From PyPi
+Or install from source:
 
 ```bash
-pip install rm-gallery
+git clone https://github.com/modelscope/RM-Gallery.git
+cd RM-Gallery
+pip install .
 ```
 
-## 🚀 Quick Start
-
-### Your First Reward Model
+## Quick Start
 
 ```python
 from rm_gallery.core.reward.registry import RewardRegistry
+from rm_gallery.core.data.schema import DataSample
 
-# 1. Choose a pre-built reward model
+# Choose from 35+ pre-built reward models
 rm = RewardRegistry.get("safety_listwise_reward")
 
-# 2. Prepare your data
-from rm_gallery.core.data.schema import DataSample
-sample = DataSample(...) # See docs for details
-
-# 3. Evaluate
+# Evaluate your data
+sample = DataSample(...)
 result = rm.evaluate(sample)
-print(result)
 ```
 
-**That's it!** 🎉
-
-👉 **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes
-
-👉 **[Interactive Notebooks](./examples/)** - Try it hands-on
+See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).
 
+## Features
 
-## 📖 Key Features
+### Pre-built Reward Models
 
-### 🏗️ Building Reward Models
-
-Choose from **35+ pre-built reward models** or create your own:
+Access 35+ reward models for different domains:
 
 ```python
-# Use pre-built models
 rm = RewardRegistry.get("math_correctness_reward")
 rm = RewardRegistry.get("code_quality_reward")
 rm = RewardRegistry.get("helpfulness_listwise_reward")
-
-# Or build custom models
-class CustomReward(BasePointWiseReward):
-    def _evaluate(self, sample, **kwargs):
-        # Your custom logic here
-        return RewardResult(...)
 ```
 
-📚 **[See all available reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
+[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)
 
-### 🏋️‍♂️ Training Reward Models
+### Custom Reward Models
 
-Train your own reward models with VERL framework:
+Build your own reward models with simple APIs:
 
-```bash
-# Prepare data and launch training
-cd examples/train/pointwise
-./run_pointwise.sh
+```python
+from rm_gallery.core.reward import BasePointWiseReward
+
+class CustomReward(BasePointWiseReward):
+    def _evaluate(self, sample, **kwargs):
+        # Your evaluation logic
+        return RewardResult(...)
 ```
 
-📚 **[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
+[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
 
-### 🧪 Evaluating on Benchmarks
+### Benchmarking
 
-Test your models on standard benchmarks:
+Evaluate models on standard benchmarks:
 
 - **RewardBench2** - Latest reward model benchmark
-- **RM-Bench** - Comprehensive evaluation
-- **Conflict Detector** - Detect evaluation conflicts
-- **JudgeBench** - Judge capability evaluation
-
-📚 **[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**
-
-### 🛠️ Real-World Applications
-
-- **Best-of-N Selection** - Choose the best from multiple responses
-- **Data Refinement** - Improve data quality with reward feedback
-- **Post Training (RLHF)** - Integrate with reinforcement learning
-- **High-Performance Serving** - Deploy as scalable service
-
-📚 **[Application guides →](https://modelscope.github.io/RM-Gallery/)**
+- **RM-Bench** - Comprehensive evaluation suite
+- **Conflict Detector** - Detect evaluation inconsistencies
+- **JudgeBench** - Judge capability assessment
 
+[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)
 
-## 📚 Documentation
+### Applications
 
-**📖 [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
+- **Best-of-N Selection** - Choose optimal responses from candidates
+- **Data Refinement** - Improve dataset quality with reward signals
+- **RLHF Integration** - Use rewards in reinforcement learning pipelines
+- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
 
-### Quick Links
+## Documentation
 
-- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
-- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
-- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
-- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
-- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
-- **[Changelog](./CHANGELOG.md)** - Version history and updates
+- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
+- [Interactive Examples](./examples/)
+- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
+- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
+- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)
 
+## Contributing
 
+We welcome contributions! Please install pre-commit hooks before submitting pull requests:
 
-
-## 🤝 Contribute
-
-Contributions are always encouraged!
-
-We highly recommend install pre-commit hooks in this repo before committing pull requests.
-These hooks are small house-keeping scripts executed every time you make a git commit,
-which will take care of the formatting and linting automatically.
-```shell
+```bash
 pip install -e .
 pre-commit install
 ```
 
-Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
+See our [contribution guide](./docs/contribution.md) for details.
 
-## 📝 Citation
+## Citation
 
-Reference to cite if you use RM-Gallery in a paper:
+If you use RM-Gallery in your research, please cite:
 
 ```
 @software{
````
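The new README's quick-start and custom-reward snippets sketch the intended pattern: look a reward model up in a registry, then call `evaluate` on a data sample, or subclass a pointwise base class and implement `_evaluate`. The toy below illustrates that pointwise pattern in self-contained code; the class and registry names only loosely mirror RM-Gallery (`BasePointWiseReward`, `RewardResult`, `RewardRegistry.get`) and the `LengthPenaltyReward` heuristic is entirely hypothetical, not part of the library.

```python
# Toy sketch of the pointwise reward pattern shown in the diff above.
# Not the real rm_gallery API: names and signatures are illustrative only.
from dataclasses import dataclass


@dataclass
class RewardResult:
    score: float
    reason: str = ""


class BasePointWiseReward:
    """Scores a single response in isolation (pointwise)."""

    def evaluate(self, sample: dict) -> RewardResult:
        return self._evaluate(sample)

    def _evaluate(self, sample: dict) -> RewardResult:
        raise NotImplementedError


class LengthPenaltyReward(BasePointWiseReward):
    """Hypothetical example: prefer concise responses under a word budget."""

    def _evaluate(self, sample: dict) -> RewardResult:
        words = len(sample["response"].split())
        # Full score up to 50 words, then decay proportionally.
        score = 1.0 if words <= 50 else 50.0 / words
        return RewardResult(score=score, reason=f"{words} words")


# A minimal registry, analogous in spirit to RewardRegistry.get(...)
REGISTRY = {"length_penalty_reward": LengthPenaltyReward}

rm = REGISTRY["length_penalty_reward"]()
result = rm.evaluate({"response": "A short, to-the-point answer."})
print(result.score)  # prints 1.0 (5 words, under the budget)
```

The registry dict stands in for the string-keyed lookup the README uses (`RewardRegistry.get("safety_listwise_reward")`); the real library's `DataSample` schema is richer than the plain dict used here.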

docs/images/logo.svg

Lines changed: 23 additions & 0 deletions
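Among the applications the simplified README lists is Best-of-N selection: score N candidate responses with a reward model and keep the highest-scoring one. A minimal, library-agnostic sketch of that idea follows; `toy_reward` is a stand-in heuristic, not RM-Gallery code.

```python
# Best-of-N selection: score candidates with a reward function, keep the argmax.
def toy_reward(response: str) -> float:
    # Hypothetical heuristic: reward mentioning the key term, with a small
    # bonus for brevity to break ties.
    mentions_term = float("reward model" in response.lower())
    return mentions_term + 1.0 / (1 + len(response.split()))


def best_of_n(candidates: list[str], reward_fn) -> str:
    """Return the candidate with the highest reward."""
    return max(candidates, key=reward_fn)


candidates = [
    "It ranks outputs.",
    "A reward model scores candidate responses so the best one can be selected.",
]
print(best_of_n(candidates, toy_reward))  # prints the second candidate
```

In practice the reward function would be a trained model's `evaluate` call rather than a string heuristic, but the selection logic is just this argmax.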

docs/index.md

Lines changed: 2 additions & 4 deletions
````diff
@@ -14,10 +14,8 @@ show_datetime: true
 
 <div style="text-align: center; margin: 3rem 0 2rem 0;">
 <div style="display: inline-block; position: relative;">
-<div style="font-size: 4.5rem; font-weight: 700; letter-spacing: -0.03em; line-height: 0.9; margin-bottom: 1rem; font-family: 'Inter', 'SF Pro Display', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
-<span style="background: linear-gradient(135deg, #22d3ee 0%, #3b82f6 30%, #6366f1 70%, #8b5cf6 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(59, 130, 246, 0.3);">RM</span><span style="background: linear-gradient(135deg, #6366f1 0%, #8b5cf6 30%, #a855f7 70%, #ec4899 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(139, 92, 246, 0.3);">Gallery</span>
-</div>
-<div style="position: absolute; top: -10px; left: -10px; right: -10px; bottom: -10px; background: radial-gradient(ellipse at center, rgba(59, 130, 246, 0.1) 0%, transparent 70%); border-radius: 20px; z-index: -1;"></div>
+<img src="./images/logo.svg" alt="RM-Gallery Logo" style="display: block; width: 520px; max-width: 80vw; z-index: 1; position: relative;">
+<div style="position: absolute; top: -10px; left: -10px; right: -10px; bottom: -10px; background: radial-gradient(ellipse at center, rgba(59, 130, 246, 0.12) 0%, transparent 70%); border-radius: 20px; z-index: 0;"></div>
 </div>
 </div>
 
````