diff --git a/README.md b/README.md
index 49d7d9cac..ee2f80cbe 100644
--- a/README.md
+++ b/README.md
@@ -1,208 +1,126 @@
-
-English | [**δΈ­ζ–‡**](./README_zh.md)
-

RM-Gallery: A One-Stop Reward Model Platform

-
-[![](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
-[![](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
-[![](https://img.shields.io/badge/Docs-English-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)
-
----
-
-## πŸ—‚οΈ Table of Contents
-- [πŸ“’ News](#-news)
-- [🌟 Why RM-Gallery?](#-why-rm-gallery)
-- [πŸ“₯ Installation](#-installation)
-- [πŸš€ RM Gallery Walkthrough](#-rm-gallery-walkthrough)
-  - [πŸ‹οΈβ€β™‚οΈ Training RM](#-training-rm)
-  - [πŸ—οΈ Building RM](#-building-rm)
-    - [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly)
-    - [πŸ› οΈ Building Custom RMs](#-building-custom-rms)
-  - [πŸ§ͺ Evaluating with Reward Model](#-evaluating-with-reward-model)
-  - [⚑ High-Performance RM Serving](#-high-performance-rm-serving)
-  - [πŸ› οΈ Reward Applications](#-reward-applications)
-- [πŸ“š Documentation](#-documentation)
-- [🀝 Contribute](#-contribute)
-- [πŸ“ Citation](#-citation)
-
----
-
-## πŸ“’ News
-- **[2025-07-09]** We release RM Gallery v0.1.0 now, which is also available in [PyPI](https://pypi.org/simple/rm-gallery/)!
----
-
-## 🌟 Why RM-Gallery?
-
-RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.
+

- Framework -
- RM-Gallery Framework + RM-Gallery Logo

-### πŸ‹οΈβ€β™‚οΈ Training RM -- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework. -

- Training RM Accuracy Curve -
- RM Training Pipeline improves accuracy on RM Bench -

-This image demonstrates the effectiveness of the RM Training Pipeline. On RM Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.
-
-### πŸ—οΈ Building RM
-- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise)
+

A unified platform for building, evaluating, and applying reward models.

-- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) with both task-level(RMComposition) and component-level(RewardModel). Users can directly apply RMComposition/RewardModel for specific tasks or assemble custom RMComposition via component-level RewardModel.
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
+[![PyPI](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
+[![Documentation](https://img.shields.io/badge/docs-online-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)
-
-- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data.
+
+[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [δΈ­ζ–‡](./README_zh.md)
-
- -
-The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
-
-### πŸ› οΈ Applying RM
-
-- **Multiple Usage Scenarios**: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training), Inference with Rewards (e.g., Best-of-N,data-correction)
-- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
+## News
+
+- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
+- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to align judge feedback and improve RL stability.
+- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)
+
+## Installation
-## πŸ“₯ Installation
-> RM Gallery requires **Python >= 3.10 and < 3.13**
-
-
-### πŸ“¦ Install From source
+RM-Gallery requires Python 3.10 or higher (< 3.13).
 
 ```bash
-# Pull the source code from GitHub
-git clone https://github.com/modelscope/RM-Gallery.git
-
-# Install the package
-pip install .
+pip install rm-gallery
 ```
 
-### Install From PyPi
+Or install from source:
 
 ```bash
-pip install rm-gallery
+git clone https://github.com/modelscope/RM-Gallery.git
+cd RM-Gallery
+pip install .
 ```
 
-## πŸš€ Quick Start
-
-### Your First Reward Model
+## Quick Start
 
 ```python
 from rm_gallery.core.reward.registry import RewardRegistry
+from rm_gallery.core.data.schema import DataSample
 
-# 1. Choose a pre-built reward model
+# Choose from 35+ pre-built reward models
 rm = RewardRegistry.get("safety_listwise_reward")
 
-# 2. Prepare your data
-from rm_gallery.core.data.schema import DataSample
-sample = DataSample(...)  # See docs for details
-
-# 3. Evaluate
+# Evaluate your data
+sample = DataSample(...)
 result = rm.evaluate(sample)
-print(result)
 ```
 
-**That's it!** πŸŽ‰
-
-πŸ‘‰ **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes
-
-πŸ‘‰ **[Interactive Notebooks](./examples/)** - Try it hands-on
+See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).
 
+## Features
-## πŸ“– Key Features
 
+### Pre-built Reward Models
-### πŸ—οΈ Building Reward Models
-
-Choose from **35+ pre-built reward models** or create your own:
+Access 35+ reward models for different domains:
 
 ```python
-# Use pre-built models
 rm = RewardRegistry.get("math_correctness_reward")
 rm = RewardRegistry.get("code_quality_reward")
 rm = RewardRegistry.get("helpfulness_listwise_reward")
-
-# Or build custom models
-class CustomReward(BasePointWiseReward):
-    def _evaluate(self, sample, **kwargs):
-        # Your custom logic here
-        return RewardResult(...)
 ```
 
-πŸ“š **[See all available reward models β†’](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
+[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)
 
-### πŸ‹οΈβ€β™‚οΈ Training Reward Models
+### Custom Reward Models
 
-Train your own reward models with VERL framework:
+Build your own reward models with simple APIs:
 
-```bash
-# Prepare data and launch training
-cd examples/train/pointwise
-./run_pointwise.sh
+```python
+from rm_gallery.core.reward import BasePointWiseReward
+
+class CustomReward(BasePointWiseReward):
+    def _evaluate(self, sample, **kwargs):
+        # Your evaluation logic
+        return RewardResult(...)
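The custom-reward snippet above leaves `BasePointWiseReward`, `RewardResult`, and the registry abstract. The following standalone sketch mimics that registry-plus-pointwise pattern in plain Python; every class here is a simplified stand-in for the rm_gallery API, not the library's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for rm_gallery's DataSample / RewardResult schemas.
@dataclass
class DataSample:
    query: str
    response: str

@dataclass
class RewardResult:
    score: float
    reason: str = ""

class BasePointWiseReward:
    """Pointwise base class: subclasses score one sample at a time."""
    def evaluate(self, sample: DataSample, **kwargs) -> RewardResult:
        return self._evaluate(sample, **kwargs)

    def _evaluate(self, sample, **kwargs):
        raise NotImplementedError

class RewardRegistry:
    """Minimal name -> reward-class registry, like RewardRegistry.get(...)."""
    _registry: dict = {}

    @classmethod
    def register(cls, name):
        def wrapper(reward_cls):
            cls._registry[name] = reward_cls
            return reward_cls
        return wrapper

    @classmethod
    def get(cls, name):
        # Instantiate the registered class on lookup.
        return cls._registry[name]()

@RewardRegistry.register("length_penalty_reward")
class LengthPenaltyReward(BasePointWiseReward):
    """Toy reward: prefer concise responses under a 200-character budget."""
    def _evaluate(self, sample, **kwargs):
        score = 1.0 if len(sample.response) <= 200 else 0.0
        return RewardResult(score=score, reason=f"{len(sample.response)} chars")

rm = RewardRegistry.get("length_penalty_reward")
result = rm.evaluate(DataSample(query="hi", response="short answer"))
print(result.score)  # prints 1.0
```

The decorator-based `register`/`get` pair is a common way such registries are built; the real library's base classes additionally handle listwise and pairwise scoring patterns.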
 ```
 
-πŸ“š **[Training guide β†’](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
+[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
 
-### πŸ§ͺ Evaluating on Benchmarks
+### Benchmarking
 
-Test your models on standard benchmarks:
+Evaluate models on standard benchmarks:
 
 - **RewardBench2** - Latest reward model benchmark
-- **RM-Bench** - Comprehensive evaluation
-- **Conflict Detector** - Detect evaluation conflicts
-- **JudgeBench** - Judge capability evaluation
-
-πŸ“š **[Evaluation guide β†’](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**
-
-### πŸ› οΈ Real-World Applications
-
-- **Best-of-N Selection** - Choose the best from multiple responses
-- **Data Refinement** - Improve data quality with reward feedback
-- **Post Training (RLHF)** - Integrate with reinforcement learning
-- **High-Performance Serving** - Deploy as scalable service
-
-πŸ“š **[Application guides β†’](https://modelscope.github.io/RM-Gallery/)**
+- **RM-Bench** - Comprehensive evaluation suite
+- **Conflict Detector** - Detect evaluation inconsistencies
+- **JudgeBench** - Judge capability assessment
 
+[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)
 
-## πŸ“š Documentation
+### Applications
 
-**πŸ“– [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
+- **Best-of-N Selection** - Choose optimal responses from candidates
+- **Data Refinement** - Improve dataset quality with reward signals
+- **RLHF Integration** - Use rewards in reinforcement learning pipelines
+- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
 
-### Quick Links
+## Documentation
 
-- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
-- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
-- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
-- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
-- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
-- **[Changelog](./CHANGELOG.md)** - Version history and updates
+- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
+- [Interactive Examples](./examples/)
+- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
+- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
+- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)
 
+## Contributing
+
+We welcome contributions! Please install pre-commit hooks before submitting pull requests:
-
-## 🀝 Contribute
-
-Contributions are always encouraged!
-
-We highly recommend install pre-commit hooks in this repo before committing pull requests.
-These hooks are small house-keeping scripts executed every time you make a git commit,
-which will take care of the formatting and linting automatically.
 
-```shell
+```bash
 pip install -e .
 pre-commit install
 ```
 
-Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
+See our [contribution guide](./docs/contribution.md) for details.
 
-## πŸ“ Citation
+## Citation
 
-Reference to cite if you use RM-Gallery in a paper:
+If you use RM-Gallery in your research, please cite:
 
 ```
 @software{
diff --git a/docs/images/logo.svg b/docs/images/logo.svg
new file mode 100644
index 000000000..63c26bb83
--- /dev/null
+++ b/docs/images/logo.svg
@@ -0,0 +1,23 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    RMGallery
+
+
+
diff --git a/docs/index.md b/docs/index.md
index 5cd6a17b4..c3268e248 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -14,10 +14,8 @@ show_datetime: true
-
- RMGallery -
-
+ RM-Gallery Logo +
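Stepping back from the diff hunks above: the Best-of-N application the README lists can be illustrated with a short plain-Python sketch. Here `best_of_n` and `toy_score` are hypothetical stand-ins, not RM-Gallery APIs; in practice the scoring function would wrap a reward model's `evaluate` call:

```python
# Best-of-N selection: score every candidate response with a scoring
# function and keep the highest-scoring one.
def best_of_n(query, candidates, score_fn):
    scored = [(score_fn(query, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

# Toy scorer (illustration only): reward responses that overlap with the
# query's words, with a small bonus for brevity.
def toy_score(query, response):
    overlap = sum(word in response.lower() for word in query.lower().split())
    brevity = 1.0 / (1 + len(response) / 100)
    return overlap + brevity

candidates = [
    "Reward models score candidate responses.",
    "I don't know.",
    "A reward model assigns a scalar score to a response so that better responses rank higher.",
]
best, score = best_of_n("what does a reward model do", candidates, toy_score)
print(best)
```

Swapping `toy_score` for a real reward model turns this into the Best-of-N inference pattern the README describes: generate N responses, score each, and return the argmax.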