docs: streamline documentation structure and content

XiaoBoAI · XiaoBoAI · commit b154a3033dc6 · 2025-10-24T18:05:20.000+08:00
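Main improvements:
- Restructure the home page: trim content from 489 lines to 236 lines (52%↓)
- Improve navigation: move the Reference section forward; remove the nonexistent API Documentation and Changelog entries
- Clean up Jupyter Notebook references: remove all .ipynb file references and fix 16+ broken links
- Simplify Learning Paths: drop redundant sub-item descriptions so the paths read more clearly
- Fix Installation tabs: standardize on pymdownx.tabbed syntax and remove the extension conflict
- Trim the Tutorial README: from 242 lines to 175 lines (28%↓)
- Unify document wording: change 'notebook' to 'guide' for consistency

Affected files:
- Core docs: index.md, quickstart.md, mkdocs.yml
- Tutorial docs: tutorial/README.md and several sub-tutorials
- Config files: sitemap.txt

These changes make the documentation more concise, accurate, and easier to navigate.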
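
For reviewers who want to reproduce the `.ipynb` to `.md` link rewrite locally, a minimal Python sketch follows. It is an illustration only and not the tooling used for this commit: the helper name `rewrite_ipynb_links` and the regular expression are assumptions, it only handles inline Markdown links with relative targets, and it does not cover the links that this commit deletes outright (such as the `examples/*.ipynb` entries).

```python
import re

# Hypothetical helper, not part of this commit.
# Matches an inline Markdown link target ending in .ipynb, with an optional #fragment.
IPYNB_LINK = re.compile(r"\]\(([^)\s]*?)\.ipynb(#[^)\s]*)?\)")


def rewrite_ipynb_links(markdown: str) -> str:
    """Point relative Markdown links at .md pages instead of .ipynb notebooks."""

    def _swap(match: re.Match) -> str:
        target = match.group(1)
        fragment = match.group(2) or ""
        # Leave external notebook URLs untouched; only relative doc links are rewritten.
        if target.startswith(("http://", "https://")):
            return match.group(0)
        return f"]({target}.md{fragment})"

    return IPYNB_LINK.sub(_swap, markdown)


if __name__ == "__main__":
    line = "You can read more from [Data Loading](../data/load.ipynb)."
    print(rewrite_ipynb_links(line))
```

Running a helper like this over each file under `docs/` and then inspecting `git diff` would show which links changed; whether each `.md` target actually exists still has to be checked separately.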
diff --git a/docs/index.md b/docs/index.md
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -11,14 +11,16 @@ Get started with RM-Gallery in just 5 minutes! This guide will walk you through
 
 ## Installation
 
-RM-Gallery requires Python >= 3.10 and < 3.13.
+> RM-Gallery requires **Python >= 3.10 and < 3.13**
 
 === "From PyPI"
+
     ```bash
     pip install rm-gallery
     ```
 
 === "From Source"
+
     ```bash
     git clone https://github.com/modelscope/RM-Gallery.git
     cd RM-Gallery
@@ -134,14 +136,6 @@ Use reward models in real applications:
 - **[Data Refinement](tutorial/rm_application/data_refinement.md)** - Improve data quality with RM
 - **[Post Training](tutorial/rm_application/post_training.md)** - Integrate with RLHF
 
-## Interactive Examples
-
-Want to try it hands-on? Check out our Jupyter Notebook examples:
-
-- **[Quickstart Notebook](../examples/quickstart.ipynb)** - Interactive version of this guide
-- **[Custom RM Tutorial](../examples/custom-rm.ipynb)** - Build your own reward model
-- **[Evaluation Pipeline](../examples/evaluation.ipynb)** - Complete evaluation workflow
-
 ## Common Scenarios
 
 ### Math Problems
diff --git a/docs/sitemap.txt b/docs/sitemap.txt
@@ -15,12 +15,6 @@
 - Data Pipeline: /tutorial/data/pipeline/
 - End-to-End Guide: /tutorial/end-to-end/
 
-## Examples (Interactive)
-
-- Quickstart Notebook: /examples/quickstart.ipynb
-- Custom RM Notebook: /examples/custom-rm.ipynb
-- Evaluation Pipeline: /examples/evaluation.ipynb
-
 ## Guides
 
 - Using Built-in RMs: /tutorial/building_rm/ready2use_rewards/
@@ -35,7 +29,6 @@
 
 - RM Library: /library/rm_library/
 - Rubric Library: /library/rubric_library/
-- API Documentation: /api_reference/
 
 ## Contribution
 
diff --git a/docs/tutorial/README.md b/docs/tutorial/README.md
@@ -8,58 +8,25 @@ Welcome to the RM-Gallery tutorial series! This directory contains comprehensive
 
 **Goal**: Get started with reward models in 30 minutes
 
-1. **[Quickstart Guide](../quickstart.md)** (5 min)
-   - Install RM-Gallery
-   - Use your first reward model
-   - Evaluate AI responses
-
-2. **[Building RM Overview](building_rm/overview.md)** (10 min)
-   - Understand reward model types
-   - Learn the architecture
-   - See examples
-
-3. **[Using Built-in RMs](building_rm/ready2use_rewards.md)** (15 min)
-   - Explore 35+ pre-built models
-   - Choose the right model
-   - Run evaluations
+1. **[Quickstart Guide](../quickstart.md)** - Install, use, and evaluate your first RM (5 min)
+2. **[Building RM Overview](building_rm/overview.md)** - Understand RM types and architecture (10 min)
+3. **[Using Built-in RMs](building_rm/ready2use_rewards.md)** - Explore 35+ pre-built models (15 min)
 
 ### 🚀 Intermediate Path
 
 **Goal**: Build and customize reward models
 
-1. **[Building Custom RMs](building_rm/custom_reward.md)** (30 min)
-   - Create rule-based rewards
-   - Build LLM-based rewards
-   - Use the Rubric-Critic-Score paradigm
-
-2. **[Data Pipeline](data/pipeline.md)** (20 min)
-   - Load data from various sources
-   - Process and transform data
-   - Export to different formats
-
-3. **[End-to-End Tutorial](end-to-end.md)** (30 min)
-   - Build a complete reward model from scratch
-   - Test and validate
-   - Deploy and use
+1. **[Building Custom RMs](building_rm/custom_reward.md)** - Create rule-based and LLM-based rewards (30 min)
+2. **[Data Pipeline](data/pipeline.md)** - Load, process, and transform data (20 min)
+3. **[End-to-End Tutorial](end-to-end.md)** - Complete workflow from data to deployment (30 min)
 
 ### 🎓 Advanced Path
 
 **Goal**: Train, evaluate, and deploy at scale
 
-1. **[Training RM Overview](training_rm/overview.md)** (15 min)
-   - Understand training paradigms
-   - Set up training environment
-   - Choose training strategy
-
-2. **[Training with VERL](training_rm/training_rm.md)** (60 min)
-   - Prepare training data
-   - Configure training
-   - Launch distributed training
-
-3. **[High-Performance Serving](rm_serving/rm_server.md)** (45 min)
-   - Deploy RM as a service
-   - Set up load balancing
-   - Monitor performance
+1. **[Training RM Overview](training_rm/overview.md)** - Understand training paradigms and setup (15 min)
+2. **[Training with VERL](training_rm/training_rm.md)** - Complete RL-based training workflow (60 min)
+3. **[High-Performance Serving](rm_serving/rm_server.md)** - Deploy RM as production service (45 min)
 
 ## 📚 Tutorial Catalog
 
@@ -86,10 +53,11 @@ Welcome to the RM-Gallery tutorial series! This directory contains comprehensive
 | Tutorial | Level | Time | Description |
 |----------|-------|------|-------------|
 | [Evaluation Overview](evaluation/overview.md) | Beginner | 10 min | Introduction to evaluation |
+| [RMB](evaluation/rmb.md) | Intermediate | 30 min | Reward Model Benchmark |
+| [RM-Bench](evaluation/rmbench.md) | Intermediate | 30 min | Subtlety and style evaluation |
+| [JudgeBench](evaluation/judgebench.md) | Intermediate | 30 min | Judge capability testing |
 | [RewardBench2](evaluation/rewardbench2.md) | Intermediate | 30 min | Latest benchmark |
 | [Conflict Detector](evaluation/conflict_detector.md) | Advanced | 45 min | Detect evaluation conflicts |
-| [JudgeBench](evaluation/judgebench.md) | Intermediate | 30 min | Judge capability testing |
-| [RM-Bench](evaluation/rmbench.md) | Intermediate | 30 min | Comprehensive evaluation |
 
 ### Data Processing
 
@@ -127,7 +95,7 @@ Welcome to the RM-Gallery tutorial series! This directory contains comprehensive
 
 **Test on benchmarks**
 → Read [Evaluation Overview](evaluation/overview.md)
-→ Try specific benchmarks (RewardBench2, RM-Bench, etc.)
+→ Try specific benchmarks: [RMB](evaluation/rmb.md), [RM-Bench](evaluation/rmbench.md), [RewardBench2](evaluation/rewardbench2.md)
 
 **Deploy to production**
 → Follow [RM Server Guide](rm_serving/rm_server.md)
@@ -164,11 +132,9 @@ Welcome to the RM-Gallery tutorial series! This directory contains comprehensive
 
 - [Quickstart Guide](../quickstart.md) - Get started in 5 minutes
 - [FAQ](../faq.md) - Common questions answered
-- [API Reference](../api_reference.md) - Complete API docs
 
 ### Interactive
 
-- [Jupyter Notebooks](../../examples/) - Hands-on tutorials
 - [End-to-End Tutorial](end-to-end.md) - Complete project
 
 ### Reference
@@ -177,21 +143,6 @@ Welcome to the RM-Gallery tutorial series! This directory contains comprehensive
 - [Rubric Library](../library/rubric_library.md) - Evaluation rubrics
 - [Contribution Guide](../contribution.md) - How to contribute
 
-## 📊 Tutorial Difficulty Legend
-
-- 🌱 **Beginner**: No prior experience needed
-- 🚀 **Intermediate**: Basic understanding required
-- 🎓 **Advanced**: In-depth knowledge helpful
-
-## ⏱️ Time Estimates
-
-Time estimates are for:
-- **Reading**: Understanding the concepts
-- **Coding**: Running and modifying examples
-- **Practice**: Experimenting with your own data
-
-Actual time may vary based on your experience level.
-
 ## 🆘 Getting Help
 
 **Stuck on a tutorial?**
@@ -203,25 +154,7 @@ Actual time may vary based on your experience level.
 
 **Found an error?**
 
-Please report it by:
-1. Opening a GitHub Issue
-2. Including the tutorial name
-3. Describing the problem
-4. Suggesting a fix (optional)
-
-## 🎓 Additional Resources
-
-### External Learning
-
-- **OpenAI Evals**: Similar evaluation framework
-- **RLHF Papers**: Academic background
-- **LLM Alignment**: Broader context
-
-### Community
-
-- **GitHub**: Source code and issues
-- **Discussions**: Q&A and ideas
-- **Examples**: Community contributions
+Please [open a GitHub Issue](https://github.com/modelscope/RM-Gallery/issues) with the tutorial name and problem description.
 
 ## 🚀 Next Steps
 
diff --git a/docs/tutorial/building_rm/autorubric.md b/docs/tutorial/building_rm/autorubric.md
@@ -289,7 +289,7 @@ Input preference data should be in JSONL format with the following structure:
 
 ### Data Loading & Conversion
 
-For loading and converting data from various sources (HuggingFace datasets, local files, etc.), we provide a unified data loading framework. See the **[Data Loading Tutorial](../data/load.ipynb)** for comprehensive examples.
+For loading and converting data from various sources (HuggingFace datasets, local files, etc.), we provide a unified data loading framework. See the **[Data Loading Tutorial](../data/load.md)** for comprehensive examples.
 
 **Quick Example - Load HelpSteer3 Preference Dataset:**
 
diff --git a/docs/tutorial/building_rm/benchmark_practices.md b/docs/tutorial/building_rm/benchmark_practices.md
@@ -1,7 +1,7 @@
 # Benchmark
 
 ## 1. Overview
-In this notebook, we will show the gallery's pipeline on built-in reward benchmark: [RewardBench2](https://huggingface.co/spaces/allenai/reward-bench) and [RMB Bench](https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark).
+In this guide, we will show the gallery's pipeline on built-in reward benchmark: [RewardBench2](https://huggingface.co/spaces/allenai/reward-bench) and [RMB Bench](https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark).
 
 ## 2. Setup
 
diff --git a/docs/tutorial/building_rm/custom_reward.md b/docs/tutorial/building_rm/custom_reward.md
@@ -1,6 +1,6 @@
 # Custom Reward Module Development Guide
 
-This notebook demonstrates how to create custom reward modules by extending the base classes in RM-Gallery.
+This guide demonstrates how to create custom reward modules by extending the base classes in RM-Gallery.
 
 ## 1. Overview
 Here's a structured reference listing of the key base classes, select appropriate base class based on evaluation strategy:
diff --git a/docs/tutorial/building_rm/overview.md b/docs/tutorial/building_rm/overview.md
@@ -1,7 +1,7 @@
 # End-to-End Pipeline: From Data to Reward
 
 ## 1. Overview
-This notebook demonstrates a complete workflow following these steps:
+This guide demonstrates a complete workflow following these steps:
 
 - **Data Preparation** - Load dataset from source and split into training (for AutoRubric) and test sets
 
@@ -26,7 +26,7 @@ os.environ["BASE_URL"] = ""
 ## 3. Data Preparation
 
 We'll start by loading our dataset using the flexible data loading module.
-You can read more from [Data Loading](../data/load.ipynb).
+You can read more from [Data Loading](../data/load.md).
 
 ```python
 # Implementation by creating base class
@@ -154,7 +154,7 @@ generated_reward_module = BaseHarmlessnessListWiseReward(
 ```
 
 ### 4.3. Customize Your Reward
-See more details in [Reward Customization](./custom_reward.ipynb).
+See more details in [Reward Customization](./custom_reward.md).
 
 ```python
 from typing import List
diff --git a/docs/tutorial/end-to-end.md b/docs/tutorial/end-to-end.md
@@ -567,9 +567,8 @@ cd RM-Gallery/examples/end_to_end/
 ## Additional Resources
 
 - 📚 [Full Documentation](../index.md)
-- 💻 [Interactive Notebooks](../../examples/)
 - 🤝 [Community Forum](https://github.com/modelscope/RM-Gallery/discussions)
-- 📝 [API Reference](../api_reference.md)
+- ❓ [FAQ](../faq.md)
 
 Happy building! 🚀
 
diff --git a/docs/tutorial/evaluation/overview.md b/docs/tutorial/evaluation/overview.md
@@ -261,7 +261,7 @@ Each benchmark page provides detailed setup instructions, code examples, and res
 
 ## Additional Resources
 
-- **[Building RM Overview](../building_rm/overview.ipynb)** - Learn how to build reward models
+- **[Building RM Overview](../building_rm/overview.md)** - Learn how to build reward models
 - **[RM Library](../../library/rm_library.md)** - Pre-built reward models
-- **[Best Practices](../building_rm/benchmark_practices.ipynb)** - Evaluation best practices
+- **[Best Practices](../building_rm/benchmark_practices.md)** - Evaluation best practices
 
diff --git a/docs/tutorial/training_rm/overview.md b/docs/tutorial/training_rm/overview.md
@@ -45,7 +45,7 @@ The data pipeline provides flexible loading and processing capabilities:
 - **Batch processing**: Efficient handling of large datasets
 - **Quality validation**: Built-in data quality checks
 
-[→ Learn about Data Pipeline](../data/pipeline.ipynb)
+[→ Learn about Data Pipeline](../data/pipeline.md)
 
 ---
 
@@ -56,7 +56,7 @@ Interactive annotation tools for creating high-quality training data:
 - **Quality control**: Inter-annotator agreement tracking
 - **Export formats**: Compatible with all training approaches
 
-[→ Learn about Data Annotation](../data/annotation.ipynb)
+[→ Learn about Data Annotation](../data/annotation.md)
 
 ---
 
@@ -67,7 +67,7 @@ Flexible data loading strategies for various sources:
 - **Custom sources**: Extensible loader architecture
 - **Streaming support**: Memory-efficient large dataset handling
 
-[→ Learn about Data Loading](../data/load.ipynb)
+[→ Learn about Data Loading](../data/load.md)
 
 ---
 
@@ -78,7 +78,7 @@ Transform and prepare data for training:
 - **Augmentation**: Expand training data diversity
 - **Train/validation splits**: Automated splitting with stratification
 
-[→ Learn about Data Processing](../data/process.ipynb)
+[→ Learn about Data Processing](../data/process.md)
 
 ---
 
@@ -397,7 +397,7 @@ results = evaluate_reward_model(
 Ready to start training? Follow this learning path:
 
 ### For Beginners
-1. **Start with data**: [Data Loading](../data/load.ipynb)
+1. **Start with data**: [Data Loading](../data/load.md)
 2. **Try Bradley-Terry**: [Pairwise Training](bradley_terry_rm.md)
 3. **Evaluate results**: [Evaluation Overview](../evaluation/overview.md)
 
@@ -409,13 +409,13 @@ Ready to start training? Follow this learning path:
 ### For Researchers
 1. **Compare approaches**: Test all three training methods
 2. **Benchmark extensively**: Use all evaluation tools
-3. **Iterate and improve**: [Data Refinement](../rm_application/data_refinement.ipynb)
+3. **Iterate and improve**: [Data Refinement](../rm_application/data_refinement.md)
 
 ## 12. Additional Resources
 
 ### Documentation
-- **[Data Pipeline](../data/pipeline.ipynb)** - Complete data preparation workflow
-- **[Building RM](../building_rm/overview.ipynb)** - Non-training reward model construction
+- **[Data Pipeline](../data/pipeline.md)** - Complete data preparation workflow
+- **[Building RM](../building_rm/overview.md)** - Non-training reward model construction
 - **[Evaluating RM](../evaluation/overview.md)** - Comprehensive evaluation guide
 
 ### Example Code
@@ -432,7 +432,7 @@ Ready to start training? Follow this learning path:
 
 **Ready to train your reward model? Choose your path:**
 
-- **[Data Pipeline](../data/pipeline.ipynb)** - Start with data preparation
+- **[Data Pipeline](../data/pipeline.md)** - Start with data preparation
 - **[Complete Training Guide](training_rm.md)** - Comprehensive VERL training
 - **[Bradley-Terry Training](bradley_terry_rm.md)** - Most common RLHF approach
 - **[SFT Training](sft_rm.md)** - Specialized evaluation models
diff --git a/docs/tutorial/training_rm/training_rm.md b/docs/tutorial/training_rm/training_rm.md
@@ -534,7 +534,7 @@ After training, look for **LoRA** or full weights in `checkpoints/<TIMESTAMP>/ac
 ## 11. Related Resources
 
 ### 11.1. Tutorial Documentation
-- **[Data Processing Tutorial](../data/process.ipynb)** - Comprehensive data handling techniques
+- **[Data Processing Tutorial](../data/process.md)** - Comprehensive data handling techniques
 
 ### 11.2. Framework Documentation
 - **[VERL Framework](https://github.com/volcengine/verl)**: Core training framework
diff --git a/docs/using_rm/boosting_strategy.md b/docs/using_rm/boosting_strategy.md
@@ -82,7 +82,7 @@ print(f"Best response: {best_sample.output[0].answer.content}")
 ## Related Topics
 
 For more applications of reward models, see:
-- [Post Training with RM](../tutorial/rm_application/post_training.ipynb)
-- [Data Refinement](../tutorial/rm_application/data_refinement.ipynb)
+- [Post Training with RM](../tutorial/rm_application/post_training.md)
+- [Data Refinement](../tutorial/rm_application/data_refinement.md)
 
 -->
diff --git a/mkdocs.yml b/mkdocs.yml