Docs/simplify readme #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from all commits (5 commits):
61791f0
docs: simplify README and add latest research papers
XiaoBoAI d77a410
docs: add gradient logo and move News section to top of README
XiaoBoAI c43e46e
docs: use gradient SVG logo matching index.md colors
XiaoBoAI 94776ed
docs: reuse shared gradient logo across README and docs index
XiaoBoAI b443639
Add concise descriptions to recent news entries
XiaoBoAI
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,208 +1,126 @@ | ||
| <!-- # RM-Gallery: A One-Stop Reward Model Platform --> | ||
| English | [**中文**](./README_zh.md) | ||
| <h2 align="center">RM-Gallery: A One-Stop Reward Model Platform</h2> | ||
|
|
||
| [](https://pypi.org/project/rm-gallery/) | ||
| [](https://pypi.org/project/rm-gallery/) | ||
| [](https://modelscope.github.io/RM-Gallery/) | ||
|
|
||
| ---- | ||
|
|
||
| ## 🗂️ Table of Contents | ||
| - [📢 News](#-news) | ||
| - [🌟 Why RM-Gallery?](#-why-rm-gallery) | ||
| - [📥 Installation](#-installation) | ||
| - [🚀 RM Gallery Walkthrough](#-rm-gallery-walkthrough) | ||
| - [🏋️♂️ Training RM](#-training-rm) | ||
| - [🏗️ Building RM](#-building-rm) | ||
| - [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly) | ||
| - [🛠️ Building Custom RMs](#-building-custom-rms) | ||
| - [🧪 Evaluating with Reward Model](#-evaluating-with-reward-model) | ||
| - [⚡ High-Performance RM Serving](#-high-performance-rm-serving) | ||
| - [🛠️ Reward Applications](#-reward-applications) | ||
| - [📚 Documentation](#-documentation) | ||
| - [🤝 Contribute](#-contribute) | ||
| - [📝 Citation](#-citation) | ||
|
|
||
| ---- | ||
|
|
||
| ## 📢 News | ||
| - **[2025-07-09]** We released RM-Gallery v0.1.0, now also available on [PyPI](https://pypi.org/simple/rm-gallery/)! | ||
| ---- | ||
|
|
||
| ## 🌟 Why RM-Gallery? | ||
|
|
||
| RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities. | ||
| <div align="center"> | ||
|
|
||
| <p align="center"> | ||
| <img src="./docs/images/framework.png" alt="Framework" width="75%"> | ||
| <br/> | ||
| <em>RM-Gallery Framework </em> | ||
| <img src="./docs/images/logo.svg" alt="RM-Gallery Logo" width="500"> | ||
| </p> | ||
|
|
||
| ### 🏋️♂️ Training RM | ||
| - **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework. | ||
| <p align="center"> | ||
| <img src="./docs/images/building_rm/helpsteer2_pairwise_training_RM-Bench_eval_accuracy.png" alt="Training RM Accuracy Curve" width="60%"> | ||
| <br/> | ||
| <em>RM Training Pipeline improves accuracy on RM Bench</em> | ||
| </p> | ||
| This image demonstrates the effectiveness of the RM Training Pipeline. On RM Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%. | ||
|
|
||
| ### 🏗️ Building RM | ||
| - **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise) | ||
| <h3>A unified platform for building, evaluating, and applying reward models.</h3> | ||
|
|
||
| - **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) with both task-level(RMComposition) and component-level(RewardModel). Users can directly apply RMComposition/RewardModel for specific tasks or assemble custom RMComposition via component-level RewardModel. | ||
| [](https://pypi.org/project/rm-gallery/) | ||
| [](https://pypi.org/project/rm-gallery/) | ||
| [](https://modelscope.github.io/RM-Gallery/) | ||
|
|
||
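The task-level (RMComposition) versus component-level (RewardModel) split described above can be sketched as follows. Note that `RewardComponent` and `RMComposition` here are illustrative stand-ins written for this sketch, not RM-Gallery's actual classes:

```python
# Illustrative sketch: assemble a task-level reward from weighted
# component-level rewards. Names are hypothetical, not the real API.

class RewardComponent:
    def __init__(self, name, score_fn, weight=1.0):
        self.name = name
        self.score_fn = score_fn  # maps a response string to a float
        self.weight = weight

class RMComposition:
    """Task-level reward assembled from component-level rewards."""
    def __init__(self, components):
        self.components = components

    def evaluate(self, response):
        # Weighted mean of component scores.
        total = sum(c.weight * c.score_fn(response) for c in self.components)
        return total / sum(c.weight for c in self.components)

# Toy "helpfulness" composition built from two components.
length_ok = RewardComponent("length", lambda r: 1.0 if len(r) > 10 else 0.0)
polite = RewardComponent("polite", lambda r: 1.0 if "please" in r.lower() else 0.0, weight=2.0)
rm = RMComposition([length_ok, polite])
print(rm.evaluate("Could you please try again?"))  # → 1.0
```

Custom compositions for a new task would follow the same shape: pick or write components, then aggregate.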
| - **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data. | ||
| [Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md) | ||
|
|
||
| <div style="display: flex; flex-wrap: wrap;"> | ||
| <img src="./docs/images/building_rm/rewardbench2_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;"> | ||
| <img src="./docs/images/building_rm/rmb_pairwise_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;"> | ||
| </div> | ||
| The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise. | ||
|
|
||
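The Rubric+Critic+Score paradigm above can be sketched with a stub critic (in practice the critic is an LLM judge; the function and rubric names below are illustrative, not RM-Gallery's API):

```python
# Minimal sketch of the Rubric+Critic+Score pattern: each rubric is
# critiqued, and the per-rubric verdicts aggregate into a scalar score.

RUBRICS = [
    "Answer is factually correct",
    "Answer directly addresses the question",
    "Answer is clearly written",
]

def stub_critic(rubric, response):
    """Stand-in for an LLM critic: returns (critique_text, passed)."""
    passed = len(response.strip()) > 0  # trivially pass non-empty answers
    return f"[{rubric}] {'met' if passed else 'not met'}", passed

def rubric_score(response, rubrics=RUBRICS, critic=stub_critic):
    critiques, passes = zip(*(critic(r, response) for r in rubrics))
    score = sum(passes) / len(rubrics)  # fraction of rubrics satisfied
    return score, list(critiques)

score, critiques = rubric_score("Paris is the capital of France.")
print(score)  # → 1.0 with this trivial stub critic
```

The reported gains on RewardBench2 and RMB-pairwise come from generating good rubrics from limited preference data, which is where the library's best practices apply.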
| ### 🛠️ Applying RM | ||
|
|
||
| - **Multiple Usage Scenarios**: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training) and Inference with Rewards (e.g., Best-of-N, data correction) | ||
|
|
||
| - **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency. | ||
| ## News | ||
|
|
||
| - **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling. | ||
| - **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to align judge feedback and improve RL stability. | ||
| - **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/) | ||
|
|
||
| ## Installation | ||
|
|
||
| ## 📥 Installation | ||
| > RM Gallery requires **Python >= 3.10 and < 3.13** | ||
|
|
||
| ### 📦 Install From source | ||
| RM-Gallery requires Python 3.10 or higher (< 3.13). | ||
|
|
||
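Since the package requires Python >= 3.10 and < 3.13, it can help to check the interpreter before installing. This snippet is a convenience sketch, not part of the package:

```python
import sys

def version_supported(info=sys.version_info):
    """True if the interpreter satisfies the >=3.10, <3.13 constraint."""
    return (3, 10) <= (info[0], info[1]) < (3, 13)

print(version_supported())  # True on Python 3.10-3.12
```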
| ```bash | ||
| # Pull the source code from GitHub | ||
| git clone https://github.com/modelscope/RM-Gallery.git | ||
|
|
||
| # Install the package | ||
| pip install . | ||
| pip install rm-gallery | ||
| ``` | ||
|
|
||
| ### Install From PyPi | ||
| Or install from source: | ||
|
|
||
| ```bash | ||
| pip install rm-gallery | ||
| git clone https://github.com/modelscope/RM-Gallery.git | ||
| cd RM-Gallery | ||
| pip install . | ||
| ``` | ||
|
|
||
| ## 🚀 Quick Start | ||
|
|
||
| ### Your First Reward Model | ||
| ## Quick Start | ||
|
|
||
| ```python | ||
| from rm_gallery.core.reward.registry import RewardRegistry | ||
| from rm_gallery.core.data.schema import DataSample | ||
|
|
||
| # 1. Choose a pre-built reward model | ||
| # Choose from 35+ pre-built reward models | ||
| rm = RewardRegistry.get("safety_listwise_reward") | ||
|
|
||
| # 2. Prepare your data | ||
| from rm_gallery.core.data.schema import DataSample | ||
| sample = DataSample(...) # See docs for details | ||
|
|
||
| # 3. Evaluate | ||
| # Evaluate your data | ||
| sample = DataSample(...) | ||
| result = rm.evaluate(sample) | ||
| print(result) | ||
| ``` | ||
|
|
||
| **That's it!** 🎉 | ||
|
|
||
| 👉 **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes | ||
|
|
||
| 👉 **[Interactive Notebooks](./examples/)** - Try it hands-on | ||
| See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/). | ||
|
|
||
| ## Features | ||
|
|
||
| ## 📖 Key Features | ||
| ### Pre-built Reward Models | ||
|
|
||
| ### 🏗️ Building Reward Models | ||
|
|
||
| Choose from **35+ pre-built reward models** or create your own: | ||
| Access 35+ reward models for different domains: | ||
|
|
||
| ```python | ||
| # Use pre-built models | ||
| rm = RewardRegistry.get("math_correctness_reward") | ||
| rm = RewardRegistry.get("code_quality_reward") | ||
| rm = RewardRegistry.get("helpfulness_listwise_reward") | ||
|
|
||
| # Or build custom models | ||
| class CustomReward(BasePointWiseReward): | ||
| def _evaluate(self, sample, **kwargs): | ||
| # Your custom logic here | ||
| return RewardResult(...) | ||
| ``` | ||
|
|
||
| 📚 **[See all available reward models →](https://modelscope.github.io/RM-Gallery/library/rm_library/)** | ||
| [View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/) | ||
|
|
||
| ### 🏋️♂️ Training Reward Models | ||
| ### Custom Reward Models | ||
|
|
||
| Train your own reward models with VERL framework: | ||
| Build your own reward models with simple APIs: | ||
|
|
||
| ```bash | ||
| # Prepare data and launch training | ||
| cd examples/train/pointwise | ||
| ./run_pointwise.sh | ||
| ```python | ||
| from rm_gallery.core.reward import BasePointWiseReward | ||
|
|
||
| class CustomReward(BasePointWiseReward): | ||
| def _evaluate(self, sample, **kwargs): | ||
| # Your evaluation logic | ||
| return RewardResult(...) | ||
| ``` | ||
|
|
||
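A self-contained version of the pointwise pattern above looks like this. `BasePointWiseReward` and `RewardResult` below are minimal stand-ins defined for the sketch, not imports of rm_gallery's real classes; consult the docs for the actual signatures:

```python
# Self-contained illustration of a custom pointwise reward.
# Base classes here are stand-ins, NOT rm_gallery's actual API.
from dataclasses import dataclass, field

@dataclass
class RewardResult:
    score: float
    details: dict = field(default_factory=dict)

class BasePointWiseReward:
    def evaluate(self, sample, **kwargs):
        return self._evaluate(sample, **kwargs)

class LengthPenaltyReward(BasePointWiseReward):
    """Example custom reward: prefer concise responses."""
    def _evaluate(self, sample, **kwargs):
        n = len(sample.split())
        # Full score up to 50 words, then a linear penalty.
        score = 1.0 if n <= 50 else max(0.0, 1.0 - (n - 50) / 100)
        return RewardResult(score=score, details={"n_words": n})

rm = LengthPenaltyReward()
print(rm.evaluate("A short and helpful answer.").score)  # → 1.0
```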
| 📚 **[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** | ||
| [Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/) | ||
|
|
||
| ### 🧪 Evaluating on Benchmarks | ||
| ### Benchmarking | ||
|
|
||
| Test your models on standard benchmarks: | ||
| Evaluate models on standard benchmarks: | ||
|
|
||
| - **RewardBench2** - Latest reward model benchmark | ||
| - **RM-Bench** - Comprehensive evaluation | ||
| - **Conflict Detector** - Detect evaluation conflicts | ||
| - **JudgeBench** - Judge capability evaluation | ||
|
|
||
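Benchmarks of this kind commonly report pairwise accuracy: the fraction of (chosen, rejected) pairs where the model scores the chosen response higher. A generic sketch of that metric (the toy scorer below is for demonstration only):

```python
# Pairwise-accuracy metric as commonly reported by reward-model
# benchmarks. score_fn is any callable from text to a number.

def pairwise_accuracy(pairs, score_fn):
    """pairs: iterable of (chosen_text, rejected_text) tuples."""
    hits = sum(score_fn(c) > score_fn(r) for c, r in pairs)
    return hits / len(pairs)

# Toy scorer: longer answers score higher (demonstration only).
pairs = [
    ("a detailed answer", "ok"),
    ("thorough reply", "no"),
    ("hm", "a longer bad one"),
]
print(pairwise_accuracy(pairs, len))  # 2 of 3 pairs ranked correctly
```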
| 📚 **[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)** | ||
|
|
||
| ### 🛠️ Real-World Applications | ||
|
|
||
| - **Best-of-N Selection** - Choose the best from multiple responses | ||
| - **Data Refinement** - Improve data quality with reward feedback | ||
| - **Post Training (RLHF)** - Integrate with reinforcement learning | ||
| - **High-Performance Serving** - Deploy as scalable service | ||
|
|
||
| 📚 **[Application guides →](https://modelscope.github.io/RM-Gallery/)** | ||
| - **RM-Bench** - Comprehensive evaluation suite | ||
| - **Conflict Detector** - Detect evaluation inconsistencies | ||
| - **JudgeBench** - Judge capability assessment | ||
|
|
||
| [Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/) | ||
|
|
||
| ## 📚 Documentation | ||
| ### Applications | ||
|
|
||
| **📖 [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site | ||
| - **Best-of-N Selection** - Choose optimal responses from candidates | ||
| - **Data Refinement** - Improve dataset quality with reward signals | ||
| - **RLHF Integration** - Use rewards in reinforcement learning pipelines | ||
| - **High-Performance Serving** - Deploy models with fault-tolerant infrastructure | ||
|
|
||
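Best-of-N selection, the first application above, reduces to scoring N candidate responses and keeping the argmax. A minimal sketch, using a toy length-based scorer rather than an actual reward model:

```python
# Best-of-N: generate N candidates, score each, keep the best.
# In practice score_fn would wrap a reward model's evaluate();
# len() is a toy stand-in here.

def best_of_n(candidates, score_fn):
    return max(candidates, key=score_fn)

candidates = [
    "It depends.",
    "Python 3.10 through 3.12 are supported.",
    "idk",
]
print(best_of_n(candidates, len))  # picks the longest candidate
```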
| ### Quick Links | ||
| ## Documentation | ||
|
|
||
| - **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast | ||
| - **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks | ||
| - **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own | ||
| - **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models | ||
| - **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs | ||
| - **[Changelog](./CHANGELOG.md)** - Version history and updates | ||
| - [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/) | ||
| - [Interactive Examples](./examples/) | ||
| - [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/) | ||
| - [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/) | ||
| - [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/) | ||
|
|
||
| ## Contributing | ||
|
|
||
| We welcome contributions! Please install pre-commit hooks before submitting pull requests: | ||
|
|
||
|
|
||
| ## 🤝 Contribute | ||
|
|
||
| Contributions are always encouraged! | ||
|
|
||
| We highly recommend installing pre-commit hooks in this repo before submitting pull requests. | ||
| These hooks are small house-keeping scripts executed every time you make a git commit, | ||
| which will take care of the formatting and linting automatically. | ||
| ```shell | ||
| ```bash | ||
| pip install -e . | ||
|
|
||
| pre-commit install | ||
| ``` | ||
|
|
||
| Please refer to our [Contribution Guide](./docs/contribution.md) for more details. | ||
| See our [contribution guide](./docs/contribution.md) for details. | ||
|
|
||
| ## 📝 Citation | ||
| ## Citation | ||
|
|
||
| Reference to cite if you use RM-Gallery in a paper: | ||
| If you use RM-Gallery in your research, please cite: | ||
|
|
||
| ``` | ||
| @software{ | ||
|
|
||
The code example for creating a custom reward model is missing an import for `RewardResult`. This will cause a `NameError` if a user tries to run this code. Please add the import to make the example runnable.