-<!-- # RM-Gallery: A One-Stop Reward Model Platform -->
-English | [**中文**](./README_zh.md)
-<h2 align="center">RM-Gallery: A One-Stop Reward Model Platform</h2>
-
-[](https://pypi.org/project/rm-gallery/)
-[](https://pypi.org/project/rm-gallery/)
-[](https://modelscope.github.io/RM-Gallery/)
-
-----
-
-## 🗂️ Table of Contents
-- [📢 News](#-news)
-- [🌟 Why RM-Gallery?](#-why-rm-gallery)
-- [📥 Installation](#-installation)
-- [🚀 RM Gallery Walkthrough](#-rm-gallery-walkthrough)
-  - [🏋️♂️ Training RM](#-training-rm)
-  - [🏗️ Building RM](#-building-rm)
-    - [🧩 Use Built-in RMs Directly](#-use-built-in-rms-directly)
-    - [🛠️ Building Custom RMs](#-building-custom-rms)
-  - [🧪 Evaluating with Reward Model](#-evaluating-with-reward-model)
-  - [⚡ High-Performance RM Serving](#-high-performance-rm-serving)
-  - [🛠️ Reward Applications](#-reward-applications)
-- [📚 Documentation](#-documentation)
-- [🤝 Contribute](#-contribute)
-- [📝 Citation](#-citation)
-
-----
-
-## 📢 News
-- **[2025-07-09]** We release RM Gallery v0.1.0 now, which is also available in [PyPI](https://pypi.org/simple/rm-gallery/)!
-----
-
-## 🌟 Why RM-Gallery?
-
-RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.
+<div align="center">
 
 <p align="center">
-    <img src="./docs/images/framework.png" alt="Framework" width="75%">
-    <br/>
-    <em>RM-Gallery Framework</em>
+    <img src="./docs/images/logo.svg" alt="RM-Gallery Logo" width="500">
 </p>
 
-### 🏋️♂️ Training RM
-- **Integrated RM Training Pipeline**: Provides an RL-based framework for training reasoning reward models, compatible with popular frameworks (e.g., verl), and offers examples for integrating RM-Gallery into the framework.
-<p align="center">
-    <img src="./docs/images/building_rm/helpsteer2_pairwise_training_RM-Bench_eval_accuracy.png" alt="Training RM Accuracy Curve" width="60%">
-    <br/>
-    <em>RM Training Pipeline improves accuracy on RM-Bench</em>
-</p>
-This image demonstrates the effectiveness of the RM Training Pipeline. On RM-Bench, after more than 80 training steps, the accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.
-
-### 🏗️ Building RM
-- **Unified Reward Model Architecture**: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/model-free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise)
+<h3>A unified platform for building, evaluating, and applying reward models.</h3>
 
-- **Comprehensive RM Gallery**: Provides a rich collection of ready-to-use reward model instances for diverse tasks (e.g., math, coding, preference alignment) at both the task level (RMComposition) and the component level (RewardModel). Users can apply an RMComposition/RewardModel directly for specific tasks or assemble a custom RMComposition from component-level RewardModels.
+[](https://pypi.org/project/rm-gallery/)
+[](https://pypi.org/project/rm-gallery/)
+[](https://modelscope.github.io/RM-Gallery/)
 
-- **Rubric-Critic-Score Paradigm**: Adopts the Rubric+Critic+Score reasoning reward model paradigm, offering best practices to help users generate rubrics from limited preference data.
+[Documentation](https://modelscope.github.io/RM-Gallery/) | [Examples](./examples/) | [中文](./README_zh.md)
 
-<div style="display: flex; flex-wrap: wrap;">
-    <img src="./docs/images/building_rm/rewardbench2_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
-    <img src="./docs/images/building_rm/rmb_pairwise_exp_result.png" style="width: 48%; min-width: 200px; margin: 1%;">
 </div>
-The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
-
-### 🛠️ Applying RM
-
-- **Multiple Usage Scenarios**: Covers multiple reward model usage scenarios with detailed best practices, including training with rewards (e.g., post-training) and inference with rewards (e.g., Best-of-N, data correction).
 
-- **High-Performance RM Serving**: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
+## News
 
+- **2025-10-20** - [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314) - We released a new paper on learning generalizable reward criteria for robust modeling.
+- **2025-10-17** - [Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning](https://arxiv.org/abs/2510.15514) - We introduced techniques to deconflict judge feedback and improve RL stability.
+- **2025-07-09** - Released RM-Gallery v0.1.0 on [PyPI](https://pypi.org/project/rm-gallery/)
 
+## Installation
 
-## 📥 Installation
-> RM Gallery requires **Python >= 3.10 and < 3.13**
-
-
-### 📦 Install From source
+RM-Gallery requires Python 3.10 or higher (< 3.13).
 
 ```bash
-# Pull the source code from GitHub
-git clone https://github.com/modelscope/RM-Gallery.git
-
-# Install the package
-pip install .
+pip install rm-gallery
 ```
 
-### Install From PyPI
+Or install from source:
 
 ```bash
-pip install rm-gallery
+git clone https://github.com/modelscope/RM-Gallery.git
+cd RM-Gallery
+pip install .
 ```
 
-## 🚀 Quick Start
-
-### Your First Reward Model
+## Quick Start
 
 ```python
 from rm_gallery.core.reward.registry import RewardRegistry
+from rm_gallery.core.data.schema import DataSample
 
-# 1. Choose a pre-built reward model
+# Choose from 35+ pre-built reward models
 rm = RewardRegistry.get("safety_listwise_reward")
 
-# 2. Prepare your data
-from rm_gallery.core.data.schema import DataSample
-sample = DataSample(...)  # See docs for details
-
-# 3. Evaluate
+# Evaluate your data
+sample = DataSample(...)
 result = rm.evaluate(sample)
-print(result)
 ```
 
-**That's it!** 🎉
-
-👉 **[5-Minute Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started in minutes
-
-👉 **[Interactive Notebooks](./examples/)** - Try it hands-on
+See the [quickstart guide](https://modelscope.github.io/RM-Gallery/quickstart/) for a complete example, or try our [interactive notebooks](./examples/).
 
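The `RewardRegistry.get(...)` lookup above follows a standard registry pattern: reward models register themselves under a string name and are instantiated on demand. As a rough, self-contained sketch of that pattern (the `SimpleRegistry` and `LengthReward` names here are illustrative stand-ins, not RM-Gallery classes):

```python
class SimpleRegistry:
    """Minimal name-to-class registry, in the spirit of RewardRegistry."""
    _models = {}

    @classmethod
    def register(cls, name):
        def decorator(model_cls):
            cls._models[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def get(cls, name):
        # Instantiate the registered class on lookup.
        return cls._models[name]()


@SimpleRegistry.register("length_reward")
class LengthReward:
    def evaluate(self, text):
        # Toy scoring rule: reward conciseness, score in (0, 1].
        return 1.0 / (1.0 + len(text.split()))


rm = SimpleRegistry.get("length_reward")
score = rm.evaluate("a concise answer")  # 3 words -> 0.25
```

The decorator-based registration is what lets a library ship "35+ pre-built" models behind a single string-keyed entry point.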
+## Features
 
-## 📖 Key Features
+### Pre-built Reward Models
 
-### 🏗️ Building Reward Models
-
-Choose from **35+ pre-built reward models** or create your own:
+Access 35+ reward models for different domains:
 
 ```python
-# Use pre-built models
 rm = RewardRegistry.get("math_correctness_reward")
 rm = RewardRegistry.get("code_quality_reward")
 rm = RewardRegistry.get("helpfulness_listwise_reward")
-
-# Or build custom models
-class CustomReward(BasePointWiseReward):
-    def _evaluate(self, sample, **kwargs):
-        # Your custom logic here
-        return RewardResult(...)
 ```
 
-📚 **[See all available reward models →](https://modelscope.github.io/RM-Gallery/library/rm_library/)**
+[View all reward models](https://modelscope.github.io/RM-Gallery/library/rm_library/)
 
-### 🏋️♂️ Training Reward Models
+### Custom Reward Models
 
-Train your own reward models with the VERL framework:
+Build your own reward models with simple APIs:
 
-```bash
-# Prepare data and launch training
-cd examples/train/pointwise
-./run_pointwise.sh
+```python
+from rm_gallery.core.reward import BasePointWiseReward
+
+class CustomReward(BasePointWiseReward):
+    def _evaluate(self, sample, **kwargs):
+        # Your evaluation logic
+        return RewardResult(...)
 ```
 
-📚 **[Training guide →](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)**
+[Learn more about building custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
 
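The `_evaluate` hook above is where the scoring logic lives. A minimal, self-contained sketch of the pointwise pattern, using hypothetical stand-in classes (`PointwiseReward`, `Result`) rather than RM-Gallery's actual `BasePointWiseReward`/`RewardResult`:

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for a reward result: a score plus a human-readable reason."""
    score: float
    reason: str


class PointwiseReward:
    """Hypothetical base class: scores one sample at a time."""
    def evaluate(self, sample: str) -> Result:
        return self._evaluate(sample)


class ConcisenessReward(PointwiseReward):
    """Toy rule: full score within a word budget, decaying beyond it."""
    def __init__(self, budget: int = 50):
        self.budget = budget

    def _evaluate(self, sample: str) -> Result:
        n = len(sample.split())
        score = 1.0 if n <= self.budget else self.budget / n
        return Result(score=score, reason=f"{n} words (budget {self.budget})")


rm = ConcisenessReward(budget=5)
result = rm.evaluate("short and to the point")  # 5 words -> score 1.0
```

Returning a reason string alongside the score mirrors the critique-style output that reasoning reward models produce, and makes reward decisions auditable.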
-### 🧪 Evaluating on Benchmarks
+### Benchmarking
 
-Test your models on standard benchmarks:
+Evaluate models on standard benchmarks:
 
 - **RewardBench2** - Latest reward model benchmark
-- **RM-Bench** - Comprehensive evaluation
-- **Conflict Detector** - Detect evaluation conflicts
-- **JudgeBench** - Judge capability evaluation
-
-📚 **[Evaluation guide →](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)**
-
-### 🛠️ Real-World Applications
-
-- **Best-of-N Selection** - Choose the best from multiple responses
-- **Data Refinement** - Improve data quality with reward feedback
-- **Post Training (RLHF)** - Integrate with reinforcement learning
-- **High-Performance Serving** - Deploy as scalable service
-
-📚 **[Application guides →](https://modelscope.github.io/RM-Gallery/)**
+- **RM-Bench** - Comprehensive evaluation suite
+- **Conflict Detector** - Detect evaluation inconsistencies
+- **JudgeBench** - Judge capability assessment
 
+[Read the evaluation guide](https://modelscope.github.io/RM-Gallery/tutorial/evaluation/overview/)
 
-## 📚 Documentation
+### Applications
 
-**📖 [Complete Documentation](https://modelscope.github.io/RM-Gallery/)** - Full documentation site
+- **Best-of-N Selection** - Choose optimal responses from candidates
+- **Data Refinement** - Improve dataset quality with reward signals
+- **RLHF Integration** - Use rewards in reinforcement learning pipelines
+- **High-Performance Serving** - Deploy models with fault-tolerant infrastructure
 
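The first application above, Best-of-N selection, reduces to scoring every candidate response with a reward model and keeping the argmax. A toy sketch, with a stub heuristic standing in for a trained reward model (the `toy_reward` rule is purely illustrative):

```python
def toy_reward(response: str) -> float:
    # Hypothetical heuristic: prefer responses that cite a source.
    # A real deployment would call a reward model here instead.
    return 1.0 if "source:" in response.lower() else 0.0


def best_of_n(responses, reward_fn):
    # Score each candidate and return the highest-scoring one.
    return max(responses, key=reward_fn)


candidates = [
    "It is 42.",
    "It is 42. Source: The Hitchhiker's Guide.",
]
best = best_of_n(candidates, toy_reward)
```

Because the reward function is the only model-specific piece, the same `best_of_n` loop works whether the scorer is a heuristic, a pointwise reward model, or a served endpoint.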
-### Quick Links
+## Documentation
 
-- **[5-Minute Quickstart](https://modelscope.github.io/RM-Gallery/quickstart/)** - Get started fast
-- **[Interactive Examples](./examples/)** - Hands-on Jupyter notebooks
-- **[Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)** - Create your own
-- **[Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)** - Train reward models
-- **[API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)** - Complete API docs
-- **[Changelog](./CHANGELOG.md)** - Version history and updates
+- [Quickstart Guide](https://modelscope.github.io/RM-Gallery/quickstart/)
+- [Interactive Examples](./examples/)
+- [Building Custom RMs](https://modelscope.github.io/RM-Gallery/tutorial/building_rm/custom_reward/)
+- [Training Guide](https://modelscope.github.io/RM-Gallery/tutorial/training_rm/overview/)
+- [API Reference](https://modelscope.github.io/RM-Gallery/api_reference/)
 
+## Contributing
 
+We welcome contributions! Please install pre-commit hooks before submitting pull requests:
 
-
-## 🤝 Contribute
-
-Contributions are always encouraged!
-
-We highly recommend installing pre-commit hooks in this repo before committing pull requests.
-These hooks are small housekeeping scripts executed every time you make a git commit,
-which take care of the formatting and linting automatically.
-```shell
+```bash
 pip install -e .
 pre-commit install
 ```
 
-Please refer to our [Contribution Guide](./docs/contribution.md) for more details.
+See our [contribution guide](./docs/contribution.md) for details.
 
-## 📝 Citation
+## Citation
 
-Reference to cite if you use RM-Gallery in a paper:
+If you use RM-Gallery in your research, please cite:
 
 ```
 @software{