-### Training RM
-- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework.
-
-
-
-*RM Training Pipeline improves accuracy on RM-Bench*
-
-This figure demonstrates the effectiveness of the RM Training Pipeline: on RM-Bench, after more than 80 training steps, accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.
-
-### Building RM
-- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/model-free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise).
+
A unified platform for building, evaluating, and applying reward models.
-- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) at both the task level (RMComposition) and the component level (RewardModel). Users can directly apply RMComposition/RewardModel for specific tasks or assemble a custom RMComposition from component-level RewardModels.
-- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data.
+[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md)
-
-
-
-The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1-3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
-
-### Applying RM
-
-- **Multiple Usage Scenarios**: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training) and Inference with Rewards (e.g., Best-of-N, data correction).
-- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
+## News
+- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
+- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to align judge feedback and improve RL stability.
+- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)
+## Installation
-## Installation
-> RM Gallery requires **Python >= 3.10 and < 3.13**
-
-
-### Install From Source
+RM-Gallery requires Python 3.10 or higher (< 3.13).
```bash
-# Pull the source code from GitHub
-git clone https://github.com/modelscope/RM-Gallery.git
-
-# Install the package
-pip install .
+pip install rm-gallery
```
-### Install From PyPI
+Or install from source:
```bash
-pip install rm-gallery
+git clone https://github.com/modelscope/RM-Gallery.git
+cd RM-Gallery
+pip install .
```
-## Quick Start
-
-### Your First Reward Model
+## Quick Start
```python
from rm_gallery.core.reward.registry import RewardRegistry
+from rm_gallery.core.data.schema import DataSample
-# 1. Choose a pre-built reward model
+# Choose from 35+ pre-built reward models
rm = RewardRegistry.get("safety_listwise_reward")
-# 2. Prepare your data
-from rm_gallery.core.data.schema import DataSample
-sample = DataSample(...) # See docs for details
-
-# 3. Evaluate
+# Evaluate your data
+sample = DataSample(...)
result = rm.evaluate(sample)
-print(result)
```
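Under the hood, `RewardRegistry.get` follows the common registry pattern: a string name mapped to a reward implementation. A minimal, library-agnostic sketch of that pattern (the `ToyRegistry` names here are illustrative, not the RM-Gallery API):

```python
# Minimal registry pattern: map string names to reward implementations.
# Illustrative sketch only -- not the actual RM-Gallery internals.

class ToyRegistry:
    _models = {}

    @classmethod
    def register(cls, name):
        """Decorator that records a reward class under a lookup name."""
        def decorator(model_cls):
            cls._models[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def get(cls, name):
        """Instantiate the reward class registered under `name`."""
        return cls._models[name]()

@ToyRegistry.register("length_reward")
class LengthReward:
    def evaluate(self, text):
        # Toy rule: shorter answers score higher.
        return 1.0 / (1 + len(text.split()))

rm = ToyRegistry.get("length_reward")
print(rm.evaluate("a concise answer"))  # 0.25
```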
-**That's it!**
-
-**[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes
-
-**[Interactive Notebooks](./examples/)** - Try it hands-on
+See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).
+## Features
-## Key Features
+### Pre-built Reward Models
-### Building Reward Models
-
-Choose from **35+ pre-built reward models** or create your own:
+Access 35+ reward models for different domains:
```python
-# Use pre-built models
rm = RewardRegistry.get("math_correctness_reward")
rm = RewardRegistry.get("code_quality_reward")
rm = RewardRegistry.get("helpfulness_listwise_reward")
-
-# Or build custom models
-class CustomReward(BasePointWiseReward):
- def _evaluate(self, sample, **kwargs):
- # Your custom logic here
- return RewardResult(...)
```
-**[See all available reward models →](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
+[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)
-### Training Reward Models
+### Custom Reward Models
-Train your own reward models with VERL framework:
+Build your own reward models with simple APIs:
-```bash
-# Prepare data and launch training
-cd examples/train/pointwise
-./run_pointwise.sh
+```python
+from rm_gallery.core.reward import BasePointWiseReward
+
+class CustomReward(BasePointWiseReward):
+ def _evaluate(self, sample, **kwargs):
+ # Your evaluation logic
+ return RewardResult(...)
```
-**[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
+[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
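The pointwise base class above scores each response on its own; the library's pairwise and listwise patterns compare candidates instead. A plain-Python sketch of the three scoring shapes (toy scorer, not the actual base-class interfaces):

```python
# Three common reward scoring patterns, shown with a toy scorer.
# Schematic only -- RM-Gallery's real base classes differ.

def score(text: str) -> float:
    """Toy pointwise reward: favors shorter responses."""
    return 1.0 / (1 + len(text.split()))

def pointwise(response: str) -> float:
    # One response in, one scalar out.
    return score(response)

def pairwise(a: str, b: str) -> str:
    # Two responses in, a preference out.
    return a if score(a) >= score(b) else b

def listwise(responses: list[str]) -> list[str]:
    # N responses in, a ranking out (best first).
    return sorted(responses, key=score, reverse=True)

print(pairwise("short answer", "a much longer rambling answer"))
print(listwise(["one two three", "one", "one two"]))
```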
-### Evaluating on Benchmarks
+### Benchmarking
-Test your models on standard benchmarks:
+Evaluate models on standard benchmarks:
- **RewardBench2** - Latest reward model benchmark
-- **RM-Bench** - Comprehensive evaluation
-- **Conflict Detector** - Detect evaluation conflicts
-- **JudgeBench** - Judge capability evaluation
-
-**[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**
-
-### Real-World Applications
-
-- **Best-of-N Selection** - Choose the best from multiple responses
-- **Data Refinement** - Improve data quality with reward feedback
-- **Post Training (RLHF)** - Integrate with reinforcement learning
-- **High-Performance Serving** - Deploy as scalable service
-
-**[Application guides →](https://modelscope.github.io/RM-Gallery/)**
+- **RM-Bench** - Comprehensive evaluation suite
+- **Conflict Detector** - Detect evaluation inconsistencies
+- **JudgeBench** - Judge capability assessment
+
+[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)
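On pairwise benchmarks like these, the headline metric is typically preference accuracy: the fraction of pairs where the model scores the chosen response above the rejected one. A minimal sketch with toy data and a toy scorer (real benchmarks ship their own loaders and prompts):

```python
# Preference accuracy on toy (chosen, rejected) pairs.
# Illustrative data and scorer -- not a real benchmark harness.

def score(text: str) -> float:
    """Stand-in reward: counts words, pretending longer is better."""
    return float(len(text.split()))

pairs = [
    ("a detailed and complete answer", "ok"),        # (chosen, rejected)
    ("covers every requirement asked", "partial"),
    ("no", "a thorough explanation"),                # scorer gets this one wrong
]

correct = sum(score(chosen) > score(rejected) for chosen, rejected in pairs)
accuracy = correct / len(pairs)
print(f"preference accuracy: {accuracy:.2f}")  # preference accuracy: 0.67
```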
-## Documentation
+### Applications
-**[Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
+- **Best-of-N Selection** - Choose optimal responses from candidates
+- **Data Refinement** - Improve dataset quality with reward signals
+- **RLHF Integration** - Use rewards in reinforcement learning pipelines
+- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
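Best-of-N selection, the first item above, reduces to scoring N candidates and keeping the argmax. A minimal sketch with a stand-in scorer (not the library's serving API):

```python
# Best-of-N: generate N candidate responses, keep the highest-reward one.
# The reward function here is a toy stand-in for a trained reward model.

def reward(response: str) -> float:
    """Toy reward: prefers responses that mention the key term."""
    return response.lower().count("reward") + 0.1 * len(response.split())

candidates = [
    "Models are trained on data.",
    "A reward model scores candidate responses.",
    "Reward models turn preferences into reward signals.",
]

best = max(candidates, key=reward)
print(best)  # Reward models turn preferences into reward signals.
```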
-### Quick Links
+## Documentation
-- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
-- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
-- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
-- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
-- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
-- **[Changelog](./CHANGELOG.md)** - Version history and updates
+- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
+- [Interactive Examples](./examples/)
+- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
+- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
+- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)
+## Contributing
+We welcome contributions! Please install pre-commit hooks before submitting pull requests:
-
-## Contribute
-
-Contributions are always encouraged!
-
-We highly recommend installing pre-commit hooks in this repo before committing pull requests.
-These hooks are small house-keeping scripts executed every time you make a git commit,
-which will take care of the formatting and linting automatically.
-```shell
+```bash
pip install -e .
pre-commit install
```
-Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
+See our [contribution guide](./docs/contribution.md) for details.
-## Citation
+## Citation
-Reference to cite if you use RM-Gallery in a paper:
+If you use RM-Gallery in your research, please cite:
```
@software{