
Docs diataxis #23

Merged
helloml0326 merged 34 commits into main from docs_diataxis
Oct 28, 2025
Conversation

@XiaoBoAI
Collaborator

📚 Documentation Restructuring & Feature Enhancements

Overview

This PR introduces a major documentation overhaul following the Diataxis framework, along with significant feature enhancements including new evaluation modules, RL training capabilities, and improved UI/UX for the documentation site.

🎯 Key Changes

1. Documentation Architecture Refactoring

  • Framework: Restructured documentation following the Diataxis framework, organizing content into four distinct categories:
    • Tutorials: Step-by-step learning paths for beginners
    • How-to Guides: Task-oriented practical guides
    • Reference: Technical API documentation
    • Explanation: Conceptual discussion and analysis
  • Format Migration: Converted all Jupyter notebooks to Markdown format for better maintainability and version control
  • Navigation Improvements: Enhanced navigation structure with better categorization and hierarchy

2. Documentation Theme Upgrade

  • Theme: Upgraded to mkdocs-shadcn theme for modern, professional appearance
  • Interactive Features:
    • Code copy button functionality with visual feedback
    • Code block zoom capability
    • Navigation scroll position fix
    • Enhanced search functionality
  • Styling Enhancements:
    • Table enhancements with text wrapping
    • Improved code highlighting with Coy theme
    • Better readability with custom CSS
    • Jupyter notebook styling integration

3. New Evaluation Modules

  • Conflict Detector: Comprehensive conflict detection system for LLM evaluations
    • Added conflict_detector.py with 1,563 lines of evaluation logic
    • Included comprehensive test suite and documentation
  • JudgeBench: Evaluation benchmark for LLM judges (867 lines)
  • RMBench: Reward model benchmark suite (398 lines)
  • LLM Judge Framework: Complete LLM-as-judge evaluation system including:
    • Pairwise comparison (757 lines)
    • Pointwise evaluation (153 lines)
    • Listwise ranking (116 lines)
    • Alignment rewards and adapters
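
The pointwise/pairwise/listwise split above can be sketched in a few lines. This is an illustrative stand-in, not the framework's actual API: the prompt wording and the `parse_verdict` helper are hypothetical, showing only the shape of a pairwise LLM-as-judge round trip.

```python
def build_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    # Illustrative prompt; the real templates live in the PR's template modules.
    return (
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Which answer is better? Reply with exactly 'A' or 'B'."
    )

def parse_verdict(raw: str) -> str:
    # Tolerate extra text around the verdict letter in the model's reply.
    for token in raw.strip().upper().replace(".", " ").split():
        if token in ("A", "B"):
            return token
    raise ValueError(f"no verdict found in: {raw!r}")
```

Pointwise evaluation would instead score a single answer, and listwise ranking would order several; all three share this build-prompt / call-model / parse-reply structure.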

4. Terminology Refactoring

  • Principle → Rubric: Unified terminology across the entire codebase
    • Renamed modules from principle to rubric
    • Updated all references in documentation and code
    • Added comprehensive rubric generation and analysis tools:
      • rubric/generator.py (566 lines)
      • rubric/analyzer.py (769 lines)
      • rubric/structurer.py (260 lines)
      • rubric/mcr_selector.py (540 lines)

5. RL Training Integration

  • GRPO Training: Added Group Relative Policy Optimization training support
  • Reward Management: New reward manager for alignment-based RL training
  • Dataset Adapters: Custom datasets for RL training scenarios
  • Training Scripts: Complete training pipeline with shell scripts
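
The core of GRPO is a group-relative advantage: each sampled response's reward is normalized against the mean and standard deviation of its group, so no separate value network is needed. A minimal sketch of that normalization (a generic illustration of the algorithm, not this PR's training code):

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Compute A_i = (r_i - mean(r)) / (std(r) + eps) over one prompt's group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses sampled for one prompt, scored by a reward model.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

The resulting advantages then weight the policy-gradient update in place of a learned baseline.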

6. New Documentation Content

  • Libraries:
    • RM Library documentation (1,169 lines)
    • Rubric Library documentation (1,723 lines)
  • Analysis: Evaluation frameworks comparison (1,941 lines)
  • Research: LLM as Judge/Agent research survey (384 lines)
  • Tutorials: Comprehensive tutorials for all major features
  • FAQ: 410 lines of frequently asked questions
  • Examples: Interactive Jupyter notebooks for quickstart, evaluation, and custom RM

7. Data & Benchmarking

  • HelpSteer3: Added preference dataset loader (240 lines)
  • Test Files: Comprehensive test coverage for new features
  • Results: Added evaluation results and benchmarking data

📊 Statistics

  • Files Changed: 142 files
  • Insertions: +28,448 lines
  • Deletions: -6,577 lines
  • Net Change: +21,871 lines

🔧 Technical Improvements

  • Code Quality: Improved type hints and documentation strings
  • Modularity: Better separation of concerns with new module structure
  • Extensibility: Easier to add new evaluation methods and reward models
  • Testing: Added comprehensive test coverage for new features

📝 Breaking Changes

  • Terminology Change: Code using principle terminology needs to be updated to use rubric
  • Import Paths: Some modules have been reorganized, requiring import path updates

🚀 Migration Guide

For users updating from previous versions:

  1. Update imports from rm_gallery.core.reward.principle to rm_gallery.core.reward.rubric
  2. Update any references to "principle" in custom code to "rubric"
  3. Review the new documentation structure at /docs/tutorial/ for updated examples
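
For codebases that cannot update every import at once, a module alias can bridge the rename during migration. The module paths below come from the migration guide, but the stand-in module and class body are hypothetical so the sketch runs anywhere:

```python
import importlib
import sys
import types

# Stand-in for the new module; in the real package this is provided by
# rm_gallery.core.reward.rubric rather than built by hand.
rubric = types.ModuleType("rm_gallery.core.reward.rubric")
rubric.BaseRubricReward = type("BaseRubricReward", (), {})
sys.modules["rm_gallery.core.reward.rubric"] = rubric

# Alias the old path so legacy imports keep resolving during the transition.
sys.modules["rm_gallery.core.reward.principle"] = rubric

legacy = importlib.import_module("rm_gallery.core.reward.principle")
```

Dropping the alias once all call sites use the `rubric` path completes the migration.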

📚 Documentation

  • New documentation site structure: docs.example.com
  • Quick start guide: docs/quickstart.md
  • Complete tutorials: docs/tutorial/README.md
  • API reference: docs/reference/

✅ Testing

All existing tests pass, with new test coverage for:

  • Conflict detector comprehensive testing
  • LLM judge evaluation modes
  • Sample-based evaluation (10 samples test)

🎉 Highlights

  • Modern, professional documentation site with excellent UX
  • Comprehensive evaluation framework for LLM judges
  • Production-ready RL training integration
  • Clear, well-organized documentation following industry best practices

XiaoBoAI and others added 30 commits October 14, 2025 14:51
- Migrate from material theme to mkdocs-shadcn for modern UI
- Enhance homepage with gradient logo design and Inter font
- Standardize badge styles and layout structure
- Add GitHub Actions workflow for automated deployment
- Improve visual consistency and user experience
- Configure markdown extensions for rich content support
- Add Prism.js coy theme for modern code block styling
- Configure enhanced syntax highlighting with line numbers
- Create custom CSS enhancements for better visual appeal
- Support multiple programming languages with autoloader
- Add responsive design for mobile devices
- Implement hover effects and improved readability
- Add code copy button feature for better UX
- Implement One Dark Pro syntax highlighting theme
- Include JetBrains Mono font for better code readability
- Add custom CSS for enhanced code block appearance
- Configure pymdownx.highlight with line numbers and anchors
- Add responsive design for code blocks on mobile devices
- Implement custom JavaScript for code block copy functionality
- Add hover-triggered copy button with smooth animations
- Include visual feedback with check icon on successful copy
- Style copy button with modern design and transitions
- Support both custom and theme-native copy button styles
- Ensure cross-browser clipboard API compatibility
- Add comprehensive table styling with proper text wrapping
- Enable word-break and overflow-wrap for all table cells
- Implement responsive table design for mobile devices
- Add hover effects and striped rows for better readability
- Include gradient header background for visual appeal
- Configure tables markdown extension for proper rendering
- Add smooth scrolling for wide tables on small screens
- Recover all original documentation sections and content
- Preserve installation guide, walkthrough, and examples
- Maintain documentation table and citation information
- Keep all code examples and detailed explanations
- Apply modern styling only to header section without content loss
- Add rm_library.md and rubric_library.md in library section
- Add navigation.md for improved site navigation
- Add boosting_strategy.md in using_rm section
- Add reference section with .gitkeep
- Update mkdocs.yml configuration
- Transform static markdown pages into dynamic interactive libraries
- Add search and filter functionality similar to ReMe project design
- RM Library: categorized display of reward models with detailed info
- Rubric Library: comprehensive evaluation rubrics with principles
- Modern responsive UI with modal details and real-time stats
- Consistent with navigation.md planning structure
- Add llm_judge module with pointwise/pairwise/listwise evaluators
- Add alignment reward functions for LLM judge
- Add RL training examples with alignment reward integration
- Add reward manager and alignment RL dataset
- Add GRPO training script and documentation
- Add base dataset class for RL training
- Refactor alignment dataset with DataKeys configuration
- Improve code formatting and structure
- Update reward function documentation
- Add robust import fallback for base_dataset module
- Update README and reward manager
- Improve error handling for module imports
- Add conflict_detector evaluation tool
- Add judgebench, rmb, rmbench evaluation modules
- Add documentation for evaluation methods
- Add llm_judge reward modules
- Update rewardbench2 implementation
- Add RL training examples
- Fix linting issues (unused imports, f-string formatting)
- Add LLM judge framework with adapters, evaluators, and templates
- Add reward manager and RL training examples
- Add base dataset for RL training
- Resolve conflict in alignment_rl_dataset.py
- Convert 7 .ipynb files to .md format for better version control
- Update mkdocs.yml to reference .md files instead of .ipynb
- Optimize RM Library card styles (simplified tags, improved layout)
- Update Building RM navigation structure

Files converted:
- tutorial/data: annotation, load, pipeline, process
- tutorial/rm_application: best_of_n, data_refinement, post_training

Benefits:
- Faster build times (no Jupyter conversion needed)
- Better git diffs and version control
- Easier editing and maintenance
- Simplified dependencies
…cessing

- Override _async_parallel method in JudgeBenchReward to use BaseListWiseReward implementation
- Fixes issue where BaseLLMReward._async_parallel was storing results in wrong location due to MRO
- Results now correctly stored in sample.input[-1].additional_kwargs for compute_accuracy
- Tested with qwen2.5-32b-instruct via DashScope API, accuracy calculation now works correctly
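
The MRO issue described in this commit can be reproduced with a toy hierarchy. Class names follow the commit message, but the hierarchy and method bodies here are simplified stand-ins:

```python
class BaseListWiseReward:
    def _async_parallel(self):
        # Stand-in for the implementation that stores results where
        # compute_accuracy expects them.
        return "sample.input[-1].additional_kwargs"

class BaseLLMReward(BaseListWiseReward):
    def _async_parallel(self):
        # Stand-in for the implementation that stored results in the
        # wrong location.
        return "wrong location"

class JudgeBenchReward(BaseLLMReward):
    # Without this override, the MRO resolves to BaseLLMReward's method;
    # delegating explicitly pins the listwise implementation, mirroring the fix.
    def _async_parallel(self):
        return BaseListWiseReward._async_parallel(self)
```

Explicit delegation like this is a common way to bypass an undesired method picked up through the MRO.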
- Add example showing how to configure base_url for custom API endpoints
- Demonstrates usage with Alibaba Cloud DashScope API
- Helps users who need to use OpenAI-compatible third-party APIs
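
A minimal sketch of such a configuration. The environment-variable name, helper function, and DashScope compatible-mode URL are assumptions for illustration; check your provider's documentation for the actual endpoint:

```python
import os

def make_client_kwargs(base_url: str, api_key_env: str = "DASHSCOPE_API_KEY") -> dict:
    """Assemble kwargs for an OpenAI-compatible client,
    e.g. openai.OpenAI(**kwargs)."""
    return {
        "base_url": base_url,
        "api_key": os.environ.get(api_key_env, "missing-key"),
    }

# Assumed DashScope compatible-mode endpoint; verify before use.
kwargs = make_client_kwargs("https://dashscope.aliyuncs.com/compatible-mode/v1")
```

Any OpenAI-compatible third-party API can be targeted the same way by swapping the base URL and key.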
- Update mkdocs.yml with new theme configuration
- Enhance documentation pages (index, rm_library, rubric_library, boosting_strategy)
- Add search-fix.js for improved search functionality
- Improve conflict_detector.py with new features
- Add template.py for evaluation
- Add comprehensive test files for conflict detector
- Resolved conflicts in docs/index.md, mkdocs.yml, and autorubric.md
- Updated all .ipynb references to .md files
- Removed .ipynb files that were converted to .md
- Integrated rubric-related updates from main branch
- Updated code files: rmb.py, rmbench.py
  - Changed PrincipleListWiseTemplate to RubricListWiseTemplate
  - Updated class inheritance and type annotations

- Updated documentation files:
  - Renamed autoprinciple.md to autorubric.md
  - Updated overview.md: AutoPrinciple → AutoRubric, generator variables
  - Updated custom_reward.md: BasePrincipleReward → BaseRubricReward
  - Updated evaluation/overview.md, best_of_n.md, post_training.md, boosting_strategy.md
  - Updated rm_library.md: CSS, JS, HTML elements, and RM configurations

- All terminology now consistently uses 'rubric' instead of 'principle'
- This change improves clarity and consistency in the reward modeling framework
- Update main README and documentation index
- Add FAQ and quickstart guides
- Add tutorial documentation and end-to-end guide
- Add example notebooks (quickstart, custom-rm, evaluation)
- Add README files for rm modules
- Remove outdated POINTWISE_CONFLICT_ANALYSIS.md
- Update FAQ, quickstart, and tutorial documentation
- Update docs index and mkdocs configuration
- Add sitemap.txt for documentation
- Reorganize navigation from nested 3-level to flat 2-level structure for better compatibility with shadcn theme
- Replace 'How-to Guides' with topic-based sections: Building RM, Training RM, Evaluating RM, Data Processing, RM Applications
- Improve navigation clarity and user experience
- Keep Tutorials section focused on end-to-end learning
- All documentation files remain accessible with clearer categorization
- Add code zoom functionality for better code viewing
- Enhance CSS styles for better readability
- Add new jupyter-simple.css for notebook styling
- Update README.md
- Update example notebooks (custom-rm, evaluation, quickstart)
- Add CHANGELOG.md
Main improvements:
- Restructured the homepage: trimmed content from 489 lines to 236 (a 52% reduction)
- Optimized navigation: moved the Reference section forward and removed the nonexistent API Documentation and Changelog entries
- Cleaned up Jupyter Notebook references: removed all .ipynb file references and fixed 16+ broken links
- Simplified Learning Paths: removed redundant sub-item descriptions to make the paths clearer
- Fixed Installation tabs: standardized on pymdownx.tabbed syntax and removed the conflicting extension
- Trimmed the tutorial README: from 242 lines to 175 (a 28% reduction)
- Unified documentation wording: changed 'notebook' to 'guide' for consistency

Affected files:
- Core docs: index.md, quickstart.md, mkdocs.yml
- Tutorial docs: tutorial/README.md and several sub-tutorials
- Configuration: sitemap.txt

These changes make the documentation more concise, accurate, and easier to navigate.
@gemini-code-assist
Contributor

Summary of Changes

Hello @XiaoBoAI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a substantial upgrade to the project's documentation and core evaluation capabilities. It introduces a meticulously restructured documentation site, adhering to the Diataxis framework for clarity and user experience. Concurrently, it integrates advanced evaluation modules for LLMs and reward models, refactors key terminology for consistency, and enhances RL training support. These changes aim to provide a more robust, user-friendly, and extensible platform for developing and assessing AI models.

Highlights

  • Documentation Overhaul: The documentation has been completely restructured following the Diataxis framework, organizing content into Tutorials, How-to Guides, Reference, and Explanation categories. All Jupyter notebooks were converted to Markdown for better maintainability.
  • Theme Upgrade & UI/UX Enhancements: The documentation theme was upgraded to mkdocs-shadcn, introducing interactive features like code copy buttons, code block zoom, navigation scroll position fixes, and enhanced search functionality, alongside improved styling for readability.
  • New Evaluation Modules: Several new evaluation modules were added, including a comprehensive Conflict Detector for LLM evaluations, JudgeBench for LLM judge benchmarks, RMBench for reward model benchmarks, and a complete LLM-as-judge framework supporting pairwise, pointwise, and listwise evaluations.
  • Terminology Refactoring: The term 'principle' was unified to 'rubric' across the codebase and documentation, with corresponding module renames and the addition of comprehensive rubric generation and analysis tools.
  • RL Training Integration: Support for Group Relative Policy Optimization (GRPO) training was added, along with a new reward manager for alignment-based RL training, custom dataset adapters, and complete training scripts.
  • Expanded Documentation Content: Significant new documentation content was added, including RM Library and Rubric Library documentation, a comparison of evaluation frameworks, an LLM as Judge/Agent research survey, comprehensive tutorials, FAQs, and interactive examples.
  • Breaking Changes: Users should be aware of terminology changes from 'principle' to 'rubric' and potential import path updates due to module reorganization. A migration guide is provided.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/deploy.yml


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is an impressive and comprehensive overhaul of the documentation. The restructuring to follow the Diataxis framework, the conversion of notebooks to Markdown, and the addition of numerous high-quality guides, examples, and interactive library pages significantly improve the project's usability and maintainability. The new landing page and README are much more welcoming and effective at guiding new users.

I've included a few suggestions focused on improving the long-term maintainability of the new interactive documentation pages by separating data and presentation logic. Overall, this is an excellent contribution that dramatically enhances the project's documentation.

Comment on lines +11 to +74
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800;900&display=swap" rel="stylesheet">

<div style="text-align: center; margin: 3rem 0 2rem 0;">
<div style="display: inline-block; position: relative;">
<div style="font-size: 4.5rem; font-weight: 700; letter-spacing: -0.03em; line-height: 0.9; margin-bottom: 1rem; font-family: 'Inter', 'SF Pro Display', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
<span style="background: linear-gradient(135deg, #22d3ee 0%, #3b82f6 30%, #6366f1 70%, #8b5cf6 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(59, 130, 246, 0.3);">RM</span><span style="background: linear-gradient(135deg, #6366f1 0%, #8b5cf6 30%, #a855f7 70%, #ec4899 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; text-shadow: 0 0 25px rgba(139, 92, 246, 0.3);">Gallery</span>
</div>
<div style="position: absolute; top: -10px; left: -10px; right: -10px; bottom: -10px; background: radial-gradient(ellipse at center, rgba(59, 130, 246, 0.1) 0%, transparent 70%); border-radius: 20px; z-index: -1;"></div>
</div>
</div>

<div style="display: flex; justify-content: center; align-items: center; gap: 0.5rem; margin: 1.5rem 0; flex-wrap: wrap;">
<a href="https://pypi.org/project/rm-gallery/" style="text-decoration: none;">
<img src="https://img.shields.io/badge/python-3.10+-blue" alt="Python Version">
</a>
<a href="https://pypi.org/project/rm-gallery/" style="text-decoration: none;">
<img src="https://img.shields.io/badge/pypi-v0.1.4-blue?logo=pypi" alt="PyPI Version">
</a>
<a href="https://github.com/modelscope/RM-Gallery/blob/main/LICENSE" style="text-decoration: none;">
<img src="https://img.shields.io/badge/license-Apache--2.0-black" alt="License">
</a>
<a href="https://github.com/modelscope/RM-Gallery" style="text-decoration: none;">
<img src="https://img.shields.io/github/stars/modelscope/RM-Gallery?style=social" alt="GitHub Stars">
</a>
</div>

[![](https://img.shields.io/badge/python-3.10+-blue)](https://pypi.org/project/rm-gallery/)
[![](https://img.shields.io/badge/pypi-v0.1.0-blue?logo=pypi)](https://pypi.org/project/rm-gallery/)
[![](https://img.shields.io/badge/Docs-English%7C%E4%B8%AD%E6%96%87-blue?logo=markdown)](https://modelscope.github.io/RM-Gallery/)
<p align="center">
<strong>RM-Gallery: A One-Stop Reward Model Platform</strong><br>
<em>Train, Build, and Apply Reward Models with Ease.</em>
</p>

---

<div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 1.5rem; margin: 2rem 0;">
<div style="padding: 1.5rem; border-radius: 12px; background: linear-gradient(135deg, rgba(59, 130, 246, 0.1) 0%, rgba(99, 102, 241, 0.05) 100%); border: 1px solid rgba(59, 130, 246, 0.2);">
<div style="font-size: 2rem; margin-bottom: 0.5rem;">🚀</div>
<h3 style="margin: 0 0 0.5rem 0;">Quick Start</h3>
<p style="margin: 0 0 1rem 0; color: #666;">Get started in 5 minutes</p>
<a href="quickstart/" style="text-decoration: none; color: #3b82f6; font-weight: 600;">Start Now →</a>
</div>

<div style="padding: 1.5rem; border-radius: 12px; background: linear-gradient(135deg, rgba(139, 92, 246, 0.1) 0%, rgba(168, 85, 247, 0.05) 100%); border: 1px solid rgba(139, 92, 246, 0.2);">
<div style="font-size: 2rem; margin-bottom: 0.5rem;">📚</div>
<h3 style="margin: 0 0 0.5rem 0;">Tutorials</h3>
<p style="margin: 0 0 1rem 0; color: #666;">Step-by-step guides</p>
<a href="tutorial/" style="text-decoration: none; color: #8b5cf6; font-weight: 600;">Learn More →</a>
</div>

<div style="padding: 1.5rem; border-radius: 12px; background: linear-gradient(135deg, rgba(16, 185, 129, 0.1) 0%, rgba(5, 150, 105, 0.05) 100%); border: 1px solid rgba(16, 185, 129, 0.2);">
<div style="font-size: 2rem; margin-bottom: 0.5rem;">📚</div>
<h3 style="margin: 0 0 0.5rem 0;">RM Library</h3>
<p style="margin: 0 0 1rem 0; color: #666;">35+ pre-built models</p>
<a href="library/rm_library/" style="text-decoration: none; color: #10b981; font-weight: 600;">Explore Models →</a>
</div>

<div style="padding: 1.5rem; border-radius: 12px; background: linear-gradient(135deg, rgba(236, 72, 153, 0.1) 0%, rgba(219, 39, 119, 0.05) 100%); border: 1px solid rgba(236, 72, 153, 0.2);">
<div style="font-size: 2rem; margin-bottom: 0.5rem;">❓</div>
<h3 style="margin: 0 0 0.5rem 0;">FAQ</h3>
<p style="margin: 0 0 1rem 0; color: #666;">Common questions</p>
<a href="faq/" style="text-decoration: none; color: #ec4899; font-weight: 600;">Get Answers →</a>
</div>
</div>


Severity: medium

The new landing page design is a great improvement! However, there's a significant amount of inline CSS used for styling. To improve maintainability and adhere to the separation of concerns principle, it would be better to move these styles to a dedicated CSS file (e.g., docs/stylesheets/landing-page.css) and include it via mkdocs.yml. This will keep the Markdown content clean and make future style adjustments easier.

Comment on lines +442 to +1169
(() => {
// —— State
let ALL_RMS = [];
let GROUPED_RMS = {};
let VIEW = "categories"; // "categories" | "models"
let CURR_CATEGORY = null;

// —— DOM
const $ = (id) => document.getElementById(id);
const elLoading = $("rm-loading");
const elError = $("rm-error");
const elRetry = $("rm-retry");
const elCategories = $("rm-categories");
const elModels = $("rm-models");
const elEmpty = $("rm-empty");
const elSearch = $("rm-search");
const elClear = $("rm-clear");
const elStats = $("rm-stats");
const elCount = $("rm-count");
const elTotal = $("rm-total");
const elType = $("rm-type");
const elCrumb = $("rm-crumb");
const elBack = $("rm-back");
const elCrumbTitle = $("rm-crumb-title");
const dlg = $("rm-modal");

// Modal elements
const mCategory = $("rm-modal-category");
const mType = $("rm-modal-type");
const mDescription = $("rm-modal-description");
const mScenario = $("rm-modal-scenario");
const mRubrics = $("rm-modal-rubrics");
const mUsage = $("rm-modal-usage");
const mRegistry = $("rm-modal-registry");
const mClass = $("rm-modal-class");
const mModule = $("rm-modal-module");
const mRewardType = $("rm-modal-reward-type");
const rubricsSection = $("rm-rubrics-section");

// —— Categories Configuration
const CATEGORY_MAP = {
"Alignment - Helpfulness": ["alignment-helpfulness"],
"Alignment - Harmlessness": ["alignment-harmlessness"],
"Alignment - Honesty": ["alignment-honesty"],
"Alignment - Base": ["alignment-base"],
"Code Quality": ["code"],
"Math Evaluation": ["math"],
"Format & Style": ["format"],
"General Evaluation": ["general"]
};

// Category display names for chips (shortened versions)
const CATEGORY_CHIP_NAMES = {
"alignment-helpfulness": "HELPFULNESS",
"alignment-harmlessness": "HARMLESSNESS",
"alignment-honesty": "HONESTY",
"alignment-base": "BASE",
"code": "CODE",
"math": "MATH",
"format": "FORMAT",
"general": "GENERAL"
};

// —— RM Data - Complete Reward Model Library
const MOCK_RMS = [
// ============= Alignment - Base Models =============
{
name: "base_helpfulness_pointwise",
class_name: "BaseHelpfulnessPointwiseReward",
category: "alignment-base",
reward_type: "PointWise",
description: "Base helpfulness evaluator for assessing how helpful and informative responses are to users with relevant and accurate information.",
scenario: "General purpose helpfulness assessment for individual responses",
rubrics: ["Provide helpful and informative responses", "Respond with relevant and accurate information", "Address user queries effectively"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},
{
name: "base_helpfulness_listwise",
class_name: "BaseHelpfulnessListwiseReward",
category: "alignment-base",
reward_type: "ListWise",
description: "Base helpfulness evaluator for comparing multiple responses based on their helpfulness and informativeness.",
scenario: "Comparative evaluation of response helpfulness",
rubrics: ["Provide helpful and informative responses", "Respond with relevant and accurate information", "Address user queries effectively"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},
{
name: "base_harmlessness_pointwise",
class_name: "BaseHarmlessnessPointwiseReward",
category: "alignment-base",
reward_type: "PointWise",
description: "Base harmlessness evaluator for detecting harmful behaviors such as spreading misinformation or engaging in harmful activities.",
scenario: "Safety assessment for individual responses",
rubrics: ["Avoid harmful behaviors", "Prevent spreading misinformation", "Refuse harmful requests appropriately"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},
{
name: "base_harmlessness_listwise",
class_name: "BaseHarmlessnessListwiseReward",
category: "alignment-base",
reward_type: "ListWise",
description: "Base harmlessness evaluator for comparing multiple responses based on safety and harmlessness criteria.",
scenario: "Comparative safety assessment",
rubrics: ["Avoid harmful behaviors", "Prevent spreading misinformation", "Refuse harmful requests appropriately"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},
{
name: "base_honesty_pointwise",
class_name: "BaseHonestyPointwiseReward",
category: "alignment-base",
reward_type: "PointWise",
description: "Base honesty evaluator for assessing truthfulness and bias-free responses.",
scenario: "Truthfulness assessment for individual responses",
rubrics: ["Truthfully answer questions", "Avoid bias and prejudice", "Present information honestly"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},
{
name: "base_honesty_listwise",
class_name: "BaseHonestyListwiseReward",
category: "alignment-base",
reward_type: "ListWise",
description: "Base honesty evaluator for comparing multiple responses based on truthfulness and honesty.",
scenario: "Comparative honesty assessment",
rubrics: ["Truthfully answer questions", "Avoid bias and prejudice", "Present information honestly"],
module_path: "rm_gallery.gallery.rm.alignment.base"
},

// ============= Alignment - Harmlessness =============
{
name: "safety_pointwise_reward",
class_name: "SafetyPointwiseReward",
category: "alignment-harmlessness",
reward_type: "PointWise",
description: "Safety evaluator for assessing compliance with or refusal of prompts related to harmful use cases and general compliance behaviors.",
scenario: "Safety assessment in response to potentially harmful requests (from RewardBench2)",
rubrics: ["Refuse harmful prompts appropriately", "Comply with safe requests", "Identify potential risks in user requests"],
module_path: "rm_gallery.gallery.rm.alignment.harmlessness.safety"
},
{
name: "DetoxifyReward",
class_name: "DetoxifyReward",
category: "alignment-harmlessness",
reward_type: "PointWise",
description: "Detoxify-based toxicity detector for identifying threats, obscenity, insults, and various types of toxic content.",
scenario: "Content moderation and toxicity detection across various text types",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.harmlessness.detoxify"
},

// ============= Alignment - Helpfulness =============
{
name: "brainstorming_listwise_reward",
class_name: "BrainstormingListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates text generation for brainstorming, emphasizing creativity and driving thinking to come up with new ideas or solutions.",
scenario: "Creative ideation and brainstorming tasks (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.brainstorming"
},
{
name: "chat_listwise_reward",
class_name: "ChatListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Chat evaluator that simulates human conversation, emphasizing coherence and natural flow of interaction across various topics.",
scenario: "Conversational AI evaluation with focus on natural dialogue (from RMB Bench)",
rubrics: [
"Address Core Argument/Intent Directly: Prioritize engaging with the user's central claim, perspective, or question explicitly.",
"Provide Actionable, Context-Specific Guidance: Offer concrete, practical steps tailored to the user's unique situation.",
"Ensure Factual Accuracy and Contextual Nuance: Ground responses in precise details while avoiding oversimplification."
],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.chat"
},
{
name: "classification_listwise_reward",
class_name: "ClassificationListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates classification tasks that assign predefined categories or labels to text based on its content.",
scenario: "Text classification and categorization tasks (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.classification"
},
{
name: "closed_qa_listwise_reward",
class_name: "ClosedQAListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates closed QA tasks where answers are found in given context or options, focusing on accuracy within constraints.",
scenario: "Closed-domain question answering with given context (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.closed_qa"
},
{
name: "code_listwise_reward",
class_name: "CodeListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates code generation, understanding, and modification tasks within text.",
scenario: "Programming code generation and comprehension (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.code"
},
{
name: "generation_listwise_reward",
class_name: "GenerationListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates creative text generation from articles to stories, emphasizing originality and creativity.",
scenario: "Creative content generation tasks (from RMB Bench)",
rubrics: ["Demonstrate originality", "Show creativity in content", "Maintain coherent narrative"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.generation"
},
{
name: "open_qa_listwise_reward",
class_name: "OpenQAListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates open-domain question answering across wide text sources, requiring processing of large information and complex questions.",
scenario: "Open-domain question answering without given context (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.open_qa"
},
{
name: "reasoning_listwise_reward",
class_name: "ReasoningListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates reasoning tasks involving text analysis to draw inferences, make predictions, or solve problems.",
scenario: "Logical reasoning and inference tasks (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.reasoning"
},
{
name: "rewrite_listwise_reward",
class_name: "RewriteListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates text rewriting that modifies style while preserving original information and intent.",
scenario: "Text rewriting and paraphrasing tasks (from RMB Bench)",
rubrics: null,
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.rewrite"
},
{
name: "role_playing_listwise_reward",
class_name: "RolePlayingListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates role-playing scenarios where AI adopts specific characters or personas in text-based interactions.",
scenario: "Character role-playing and persona adoption (from RMB Bench)",
rubrics: ["Maintain character consistency", "Engage authentically in role", "Reflect assigned persona accurately"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.role_playing"
},
{
name: "summarization_listwise_reward",
class_name: "SummarizationListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates text summarization that compresses content into short form while retaining main information.",
scenario: "Text summarization and compression tasks (from RMB Bench)",
rubrics: ["Retain key information", "Maintain coherence", "Achieve appropriate compression"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.summarization"
},
{
name: "translation_listwise_reward",
class_name: "TranslationListwiseReward",
category: "alignment-helpfulness",
reward_type: "ListWise",
description: "Evaluates translation quality for converting text from one language to another.",
scenario: "Language translation tasks (from RMB Bench)",
rubrics: ["Preserve original meaning", "Maintain natural language flow", "Consider cultural context"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.translation"
},
{
name: "focus_pointwise_reward",
class_name: "FocusPointwiseReward",
category: "alignment-helpfulness",
reward_type: "PointWise",
description: "Detects high-quality, on-topic answers to general user queries with strong focus on the question.",
scenario: "Evaluating response relevance and focus (from RMB Bench)",
rubrics: ["Stay on topic", "Address the question directly", "Avoid tangential information"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.focus"
},
{
name: "math_pointwise_reward",
class_name: "MathPointwiseReward",
category: "alignment-helpfulness",
reward_type: "PointWise",
description: "Evaluates mathematical problem-solving from middle school to college level, including physics, geometry, calculus, and more.",
scenario: "Mathematical problem solving across difficulty levels (from RewardBench2)",
rubrics: ["Show clear reasoning", "Apply correct formulas", "Verify calculations"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.math"
},
{
name: "precise_if_pointwise_reward",
class_name: "PreciseIFPointwiseReward",
category: "alignment-helpfulness",
reward_type: "PointWise",
description: "Evaluates precise instruction following with specific constraints like 'Answer without the letter u'.",
scenario: "Precise instruction following with explicit constraints (from RewardBench2)",
rubrics: ["Follow all specified constraints", "Maintain response quality", "Demonstrate attention to detail"],
module_path: "rm_gallery.gallery.rm.alignment.helpfulness.precise_if"
},

// ============= Alignment - Honesty =============
{
name: "factuality_pointwise_reward",
class_name: "FactualityPointwiseReward",
category: "alignment-honesty",
reward_type: "PointWise",
description: "Detects hallucinations and basic errors in completions, ensuring factual accuracy.",
scenario: "Factuality verification and hallucination detection (from RewardBench2)",
rubrics: ["Verify factual claims", "Identify hallucinations", "Ensure accuracy"],
module_path: "rm_gallery.gallery.rm.alignment.honesty.factuality"
},

// ============= Math Evaluation =============
{
name: "math_verify_reward",
class_name: "MathVerifyReward",
category: "math",
reward_type: "PointWise",
description: "Verifies mathematical expressions using the math_verify library, supporting both LaTeX and plain expressions.",
scenario: "Mathematical expression verification and validation",
rubrics: null,
module_path: "rm_gallery.gallery.rm.math.math"
},

// ============= Code Quality =============
{
name: "code_syntax_check",
class_name: "SyntaxCheckReward",
category: "code",
reward_type: "PointWise",
description: "Checks code syntax using Abstract Syntax Tree (AST) to validate Python code blocks for syntax errors.",
scenario: "Python code syntax validation",
rubrics: null,
module_path: "rm_gallery.gallery.rm.code.code"
},
{
name: "code_style",
class_name: "CodeStyleReward",
category: "code",
reward_type: "PointWise",
description: "Performs basic code style checking including indentation consistency and naming convention validation.",
scenario: "Python code style and formatting assessment",
rubrics: null,
module_path: "rm_gallery.gallery.rm.code.code"
},
{
name: "code_patch_similarity",
class_name: "PatchSimilarityReward",
category: "code",
reward_type: "PointWise",
description: "Calculates similarity between generated patch and oracle patch using difflib.SequenceMatcher.",
scenario: "Code patch comparison and similarity measurement",
rubrics: null,
module_path: "rm_gallery.gallery.rm.code.code"
},
{
name: "code_execution",
class_name: "CodeExecutionReward",
category: "code",
reward_type: "PointWise",
description: "Executes code against test cases and evaluates correctness based on test results.",
scenario: "Functional correctness testing for generated code",
rubrics: null,
module_path: "rm_gallery.gallery.rm.code.code"
},

// ============= General Evaluation =============
{
name: "accuracy",
class_name: "AccuracyReward",
category: "general",
reward_type: "PointWise",
description: "Calculates accuracy (exact match rate) between generated content and reference answer.",
scenario: "Exact match evaluation for classification and QA tasks",
rubrics: null,
module_path: "rm_gallery.gallery.rm.general.general"
},
{
name: "f1_score",
class_name: "F1ScoreReward",
category: "general",
reward_type: "PointWise",
description: "Calculates F1 score between generated content and reference answer at word level with configurable tokenizer.",
scenario: "Token-level evaluation for text generation quality",
rubrics: null,
module_path: "rm_gallery.gallery.rm.general.general"
},
{
name: "rouge",
class_name: "RougeReward",
category: "general",
reward_type: "PointWise",
description: "ROUGE-L similarity evaluation using longest common subsequence for text overlap measurement.",
scenario: "Summarization and text generation overlap evaluation",
rubrics: null,
module_path: "rm_gallery.gallery.rm.general.general"
},
{
name: "number_accuracy",
class_name: "NumberAccuracyReward",
category: "general",
reward_type: "PointWise",
description: "Checks numerical calculation accuracy by comparing numbers in generated content versus reference.",
scenario: "Numerical accuracy verification in mathematical and quantitative tasks",
rubrics: null,
module_path: "rm_gallery.gallery.rm.general.general"
},

// ============= Format & Style =============
{
name: "reasoning_format",
class_name: "ReasoningFormatReward",
category: "format",
reward_type: "PointWise",
description: "Checks that responses use the expected thinking and answer formats, with properly structured tags.",
scenario: "Structured reasoning output format validation",
rubrics: null,
module_path: "rm_gallery.gallery.rm.format.format"
},
{
name: "reasoning_tool_call_format",
class_name: "ReasoningToolCallFormatReward",
category: "format",
reward_type: "PointWise",
description: "Checks tool call format including think, answer and tool_call tags with JSON validation.",
scenario: "Tool-using agent response format validation",
rubrics: null,
module_path: "rm_gallery.gallery.rm.format.format"
},
{
name: "length_penalty",
class_name: "LengthPenaltyReward",
category: "format",
reward_type: "PointWise",
description: "Text length-based penalty for content that is too short or too long relative to expectations.",
scenario: "Response length control and optimization",
rubrics: null,
module_path: "rm_gallery.gallery.rm.format.format"
},
{
name: "ngram_repetition_penalty",
class_name: "NgramRepetitionPenaltyReward",
category: "format",
reward_type: "PointWise",
description: "Calculates N-gram repetition penalty supporting Chinese processing and multiple penalty strategies.",
scenario: "Repetitive content detection and penalization",
rubrics: null,
module_path: "rm_gallery.gallery.rm.format.format"
},
{
name: "privacy_leakage",
class_name: "PrivacyLeakageReward",
category: "format",
reward_type: "PointWise",
description: "Privacy information leakage detection for emails, phone numbers, ID cards, credit cards, and IP addresses.",
scenario: "Privacy protection and PII detection",
rubrics: null,
module_path: "rm_gallery.gallery.rm.format.format"
}
];

// —— Utils
function show(el){ el.hidden = false; }
function hide(el){ el.hidden = true; }
function setLoading(on){
if(on){
show(elLoading);
[elError, elCategories, elModels, elEmpty, elStats, elCrumb].forEach(hide);
}else{
hide(elLoading);
}
}
function setError(on){
if(on){ show(elError); hide(elLoading); }
else hide(elError);
}
function clampTxt(s, n){ if(!s) return ""; return s.length<=n? s : s.slice(0,n)+"…"; }
function debounce(fn, ms=250){ let t; return (...a)=>{ clearTimeout(t); t=setTimeout(()=>fn(...a), ms); }; }

// —— Data Loading
async function loadAll(){
setLoading(true); setError(false);
try{
// In real implementation, this would fetch from an API or JSON file
ALL_RMS = MOCK_RMS;
if(!ALL_RMS.length) throw new Error("no data");

GROUPED_RMS = ALL_RMS.reduce((acc, rm)=>{
(acc[rm.category] ||= []).push(rm);
return acc;
}, {});
renderCategories();
}catch(e){
setError(true);
}finally{
setLoading(false);
}
}

// —— Render Categories
function renderCategories(){
VIEW = "categories"; CURR_CATEGORY = null;
hide(elModels); hide(elEmpty); show(elCategories);
hide(elCrumb);
elCrumbTitle.textContent = "RM Categories";
elType.textContent = "reward models";

const availableCategories = Object.keys(GROUPED_RMS);

const sections = Object.entries(CATEGORY_MAP).map(([categoryName, prefixes])=>{
const categories = prefixes.filter(p => availableCategories.includes(p));
const allRMs = categories.flatMap(cat => GROUPED_RMS[cat] || []);

if (!allRMs.length) return "";

const itemsHtml = allRMs.map((rm)=>{
const rmIdx = GROUPED_RMS[rm.category].indexOf(rm);
return `
<div class="ml-card-item rm-model-card" data-rm-idx="${rmIdx}" data-category="${rm.category}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${rm.category}">${CATEGORY_CHIP_NAMES[rm.category] || rm.category.toUpperCase()}</div>
<div class="ml-chip ${rm.reward_type === 'ListWise' ? 'success' : 'warning'}">${rm.reward_type.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${rm.name}</div>
<div class="ml-card-class">${rm.class_name}</div>
<div class="ml-card-sample">${clampTxt(rm.description, 135)}</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`;
}).join("");

return `
<section class="ml-section">
<h3>
<span class="ml-section-icon">${getCategoryIcon(categoryName)}</span>
${categoryName}
<span class="ml-section-count">${allRMs.length} models</span>
</h3>
<div class="ml-grid">
${itemsHtml}
</div>
</section>
`;
}).join("");

elCategories.innerHTML = sections;
bindModelClicks();

show(elStats);
const totalRMs = ALL_RMS.length;
elCount.textContent = totalRMs;
elTotal.textContent = totalRMs;
}

// Get icon for category
function getCategoryIcon(categoryName) {
const icons = {
"Alignment - Helpfulness": "💡",
"Alignment - Harmlessness": "🛡️",
"Alignment - Honesty": "✓",
"Alignment - Base": "⚡",
"Code Quality": "💻",
"Math Evaluation": "🔢",
"Format & Style": "✨",
"General Evaluation": "📊"
};
return icons[categoryName] || "📌";
}

// —— Render Models
function renderModels(rmList){
VIEW = "models";
hide(elCategories); hide(elEmpty); show(elModels);
show(elCrumb);
elType.textContent = "reward models";
elCrumbTitle.textContent = `Exploring ${CURR_CATEGORY}`;

if(!rmList.length){
hide(elModels); show(elEmpty); hide(elStats); return;
}

elModels.innerHTML = rmList.map((rm, idx)=>`
<div class="ml-card-item" data-idx="${idx}">
<div class="ml-card-head">
<div>
<div class="ml-card-title">${rm.name}</div>
<div class="ml-card-sub">${rm.class_name}</div>
</div>
<div class="ml-chip ${rm.reward_type === 'ListWise' ? 'success' : 'warning'}">${rm.reward_type.toUpperCase()}</div>
</div>
<div class="ml-card-sample">${clampTxt(rm.description, 120)}</div>
<div class="ml-card-foot">
<span>🏷️ ${rm.category}</span>
<span>Details →</span>
</div>
</div>
`).join("");

// Modal binding
[...elModels.querySelectorAll(".ml-card-item")].forEach(card=>{
card.addEventListener("click", ()=>{
const idx = Number(card.getAttribute("data-idx"));
const rm = rmList[idx];
showRMModal(rm);
});
});

show(elStats);
elCount.textContent = rmList.length;
elTotal.textContent = rmList.length;
}

function showRMModal(rm) {
mCategory.textContent = rm.category;
mCategory.className = `ml-chip ${rm.category}`;
mType.textContent = rm.reward_type;
mDescription.textContent = rm.description;
mScenario.textContent = rm.scenario;

// Handle rubrics
if (rm.rubrics && rm.rubrics.length > 0) {
const rubricsList = rm.rubrics.map((rubric, idx) =>
`<div class="rubric-item"><span class="rubric-number">${idx + 1}.</span>${rubric}</div>`
).join("");
mRubrics.innerHTML = `<div class="rubric-list">${rubricsList}</div>`;
show(rubricsSection);
} else {
hide(rubricsSection);
}

// Usage example
const usageExample = `from rm_gallery.core.reward.registry import RewardRegistry

# Initialize the reward model
rm = RewardRegistry.get("${rm.name}")

# Use the reward model
result = rm.evaluate(sample)
print(result)`;
mUsage.textContent = usageExample;

// Registry info
mRegistry.textContent = rm.name;
mClass.textContent = rm.class_name;
mModule.textContent = rm.module_path;
mRewardType.textContent = rm.reward_type;

dlg.showModal();
}

function bindModelClicks(){
[...elCategories.querySelectorAll(".rm-model-card")].forEach(card=>{
card.addEventListener("click", ()=>{
const category = card.getAttribute("data-category");
const rmIdx = Number(card.getAttribute("data-rm-idx"));
const categoryRMs = GROUPED_RMS[category];
if (categoryRMs && categoryRMs[rmIdx]) {
showRMModal(categoryRMs[rmIdx]);
}
});
});
}

// —— Search
function handleSearch(){
const q = elSearch.value.trim().toLowerCase();
if(!q){
if(VIEW==="categories") renderCategories();
else renderModels(GROUPED_RMS[CURR_CATEGORY]);
return;
}

if(VIEW==="categories"){
// Filter categories based on search
const filteredRMs = ALL_RMS.filter(rm =>
rm.name.toLowerCase().includes(q) ||
rm.description.toLowerCase().includes(q) ||
rm.category.toLowerCase().includes(q) ||
rm.class_name.toLowerCase().includes(q)
);
// Group filtered results
const filteredGrouped = filteredRMs.reduce((acc, rm)=>{
(acc[rm.category] ||= []).push(rm);
return acc;
}, {});
// Temporarily swap in the filtered grouping so renderCategories()
// renders only the matching models, then restore the full index.
const backup = {...GROUPED_RMS};
GROUPED_RMS = filteredGrouped;
renderCategories();
GROUPED_RMS = backup;
}else{
const filtered = (GROUPED_RMS[CURR_CATEGORY] || []).filter(rm =>
rm.name.toLowerCase().includes(q) ||
rm.description.toLowerCase().includes(q) ||
rm.class_name.toLowerCase().includes(q)
);
renderModels(filtered);
}
}

// —— Events
function initEvents() {
elRetry?.addEventListener("click", loadAll);
elBack?.addEventListener("click", ()=> renderCategories());
elSearch?.addEventListener("input", debounce(handleSearch, 250));
elClear?.addEventListener("click", ()=>{
elSearch.value = ""; handleSearch();
});

// Close modal when clicking outside
dlg?.addEventListener("click", (e)=> {
const rect = dlg.querySelector('.ml-modal-card')?.getBoundingClientRect();
if (rect && (e.clientX < rect.left || e.clientX > rect.right ||
e.clientY < rect.top || e.clientY > rect.bottom)) {
dlg.close();
}
});
}

// —— Init
document.addEventListener("DOMContentLoaded", ()=> {
initEvents();
loadAll();
});
})();
</script>
Contributor

medium

This interactive library page is a fantastic addition. However, embedding the large MOCK_RMS JavaScript array directly within the Markdown file makes it difficult to maintain and update the list of reward models. Consider separating the data from the presentation logic by moving the MOCK_RMS data into a separate JSON file (e.g., docs/assets/data/rm_library.json) and fetching it dynamically in the script. This would make both the Markdown file and the data easier to manage as the library grows.
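A minimal sketch of that refactor, assuming a hypothetical `docs/assets/data/rm_library.json` location (not part of this PR) and lifting the grouping logic from `loadAll` into a pure helper:

```javascript
// Hypothetical path; adjust to wherever the exported JSON is published.
const DATA_URL = "assets/data/rm_library.json";

// Pure helper mirroring the reduce in loadAll: bucket reward models by category.
function groupByCategory(rms) {
  return rms.reduce((acc, rm) => {
    (acc[rm.category] ||= []).push(rm);
    return acc;
  }, {});
}

// Would replace the `ALL_RMS = MOCK_RMS` assignment in loadAll:
// fetch the JSON instead of reading an inlined array.
async function fetchAllRMs() {
  const res = await fetch(DATA_URL);
  if (!res.ok) throw new Error(`failed to load ${DATA_URL}: ${res.status}`);
  return res.json();
}
```

With this split, `loadAll` becomes `ALL_RMS = await fetchAllRMs(); GROUPED_RMS = groupByCategory(ALL_RMS);`, and the model list can be updated without touching the Markdown page.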

Comment on lines +692 to +1723
(() => {
// —— State
let ALL_RUBRICS = [];
let GROUPED_RUBRICS = {};
let VIEW = "categories"; // "categories" | "domains" | "subdomains" | "rubrics"
let CURR_CATEGORY = null; // "Query-Agnostic Rubrics" | "Query-Specific Rubrics"
let CURR_DOMAIN = null; // "general" | "code" | "math" | "stem"
let CURR_SUBDOMAIN = null; // "python" | "java" | etc.

// —— DOM
const $ = (id) => document.getElementById(id);
const elLoading = $("rubric-loading");
const elError = $("rubric-error");
const elRetry = $("rubric-retry");
const elCategories = $("rubric-categories");
const elRubrics = $("rubric-items");
const elEmpty = $("rubric-empty");
const elSearch = $("rubric-search");
const elClear = $("rubric-clear");
const elStats = $("rubric-stats");
const elCount = $("rubric-count");
const elTotal = $("rubric-total");
const elType = $("rubric-type");
const elCrumb = $("rubric-crumb");
const elBack = $("rubric-back");
const elCrumbTitle = $("rubric-crumb-title");
const dlg = $("rubric-modal");

// Modal elements
const mCategory = $("rubric-modal-category");
const mDomain = $("rubric-modal-domain");
const mQuerySection = $("rubric-modal-query-section");
const mQuery = $("rubric-modal-query");
const mDescriptionSection = $("rubric-modal-description-section");
const mDescription = $("rubric-modal-description");
const mScenarioSection = $("rubric-modal-scenario-section");
const mScenario = $("rubric-modal-scenario");
const mrubrics = $("rubric-modal-rubrics");
const mUsage = $("rubric-modal-usage");
const mId = $("rubric-modal-id");
const mDomainInfo = $("rubric-modal-domain-info");
const mLanguage = $("rubric-modal-language");
const mSource = $("rubric-modal-source");
const mrubricCount = $("rubric-modal-rubric-count");
const mComplexity = $("rubric-modal-complexity");

// —— Categories Configuration
const CATEGORY_MAP = {
"Query-Agnostic Rubrics": {
"general": {},
"code": {
"python": {},
"java": {},
"sql": {},
"others": {}
},
"math": {
"algebra": {},
"calculus": {},
"statistics": {}
},
"science": {
"physics": {},
"chemistry": {},
"biology": {}
},
"technology": {
"ai_ml": {},
"data_science": {},
"cybersecurity": {}
},
"engineering": {
"software": {},
"systems": {},
"design": {}
}
},
"Query-Specific Rubrics": {
"general": {},
"code": {
"python": {},
"java": {},
"sql": {},
"others": {}
},
"math": {
"algebra": {},
"calculus": {},
"statistics": {}
},
"science": {
"physics": {},
"chemistry": {},
"biology": {}
},
"technology": {
"ai_ml": {},
"data_science": {},
"cybersecurity": {}
},
"engineering": {
"software": {},
"systems": {},
"design": {}
}
}
};

// —— Mock Rubric Data
const MOCK_RUBRICS = [
// Query-Agnostic General Rubrics
{
id: "helpsteer_general_rubric",
name: "HelpSteer3 General Rubrics",
queryRelated: false,
domain: "general",
subdomain: null,
language: "english",
source: "helpsteer3",
description: "Comprehensive evaluation rubric generated by HelpSteer focusing on factual accuracy, prompt adherence, clarity, comprehensiveness, and narrative consistency.",
scenario: "General content evaluation with emphasis on accuracy, structure compliance, and narrative coherence",
rubrics: [
"Theme: Ensure factual accuracy, canonical consistency, and avoid fabrication or hallucination in responses.\n- Tip 1: For queries about *Undertale*, ensure all character motivations and gameplay mechanics align with established lore, avoiding speculative or contradictory claims.\n- Tip 2: When discussing historical milestones like early synchronized sound cartoons, correctly attribute \"Steamboat Willie\" instead of \"My Old Kentucky Home\" to maintain reliability.\n- Tip 3: In responses involving *Hogwarts* students, include only canonically portrayed students with academically accurate achievements, excluding professors or non-student figures.\n- Tip 4: Avoid inventing Sumerian texts or fabricated survey links; instead, acknowledge missing context and request clarification when necessary, especially for niche cultural references.",
"Theme: Maintain strict adherence to prompt structure, formatting, and explicit user requirements.\n- Tip 1: When asked for a single word, provide exactly one word without redundancy or additional suggestions, as in responses requiring minimal output.\n- Tip 2: For prompts specifying 100 items, deliver a complete list even if the topic is broad, proactively selecting a relevant subject to fulfill the quantitative requirement.\n- Tip 3: In tagline creation, directly incorporate core technology benefits like \"distance at impact\" and avoid vague or redundant phrasing that dilutes product relevance.\n- Tip 4: When the prompt requires the word \"scenery\" followed by a colon and a one-word term, follow this exact syntactic structure without deviation.",
"Theme: Prioritize clarity, conciseness, and structured organization to enhance readability and directness.\n- Tip 1: For a \"Thank you\" prompt, respond with a concise acknowledgment and an open invitation for further questions, avoiding assumptions about the user being a student or lawyer.\n- Tip 2: When summarizing steps for building a dropshipping agent business, use bullet points or numbered lists to present key points logically and avoid hallucinated information.\n- Tip 3: In audit findings related to deposit insurance boards, structure responses with precise, actionable items and conclude with a concise summary emphasizing implications.\n- Tip 4: Avoid excessive formatting like bold text or unnecessary punctuation when explaining grammatical correctness, maintaining a straightforward and professional tone.",
"Theme: Deliver comprehensive, detailed, and thematically coherent narratives or analyses that fully address all prompt elements.\n- Tip 1: For a CFA Institute Investment Foundations® Certificate explanation, include curriculum, eligibility, exam format, preparation resources, benefits, and continuing education with specific examples.\n- Tip 2: In a fantasy story response, incorporate rich narrative detail, distinct character development, and immersive world-building such as vivid settings and dynamic interactions.\n- Tip 3: When addressing a tax-proportional legislature, outline mechanics, implications, data collection, representation quotas, equity concerns, and constitutional considerations comprehensively.\n- Tip 4: For a horror anime scene, use INT./EXT. designations, emphasize atmospheric tension, and describe creature details like a rhombus tail and chameleon-like head to align with anime style.",
"Theme: Ensure narrative and contextual fidelity by preserving character dynamics, tone, and worldbuilding consistency.\n- Tip 1: In responses involving Jade's character, maintain her authoritative yet professional tone, avoiding hostile shifts that contradict established behavior.\n- Tip 2: For stories featuring Emily from KikoRiki, preserve her role as a mischievous prankster and integrate the whimsical tone when describing her failed morph into Rosa and the orange rear end mishap.\n- Tip 3: When continuing a narrative about diaper use over potty training, maintain a playful, child-friendly tone and avoid contradictions with the original theme.\n- Tip 4: In therapeutic role-play scenarios, prioritize immersive engagement with the patient's imaginative world through dialogue and validation, rather than clinical checklists."
],
complexity: "Medium"
},
{
id: "ultrafeedback_general_rubric",
name: "UltraFeedback General Rubrics",
queryRelated: false,
domain: "general",
subdomain: null,
language: "english",
source: "ultrafeedback",
description: "Systematic evaluation framework generated by UltraFeedback emphasizing factual accuracy, requirement adherence, clarity, depth, and ethical responsibility.",
scenario: "Comprehensive content evaluation focusing on accuracy, compliance, organization, richness, and ethical considerations",
rubrics: [
"Theme: The answer must be factually accurate and grounded in correct domain-specific knowledge, avoiding misconceptions, logical errors, or speculative assumptions.\n- Tip 1: Correctly apply scientific, technical, or mathematical principles (e.g., gravity, regex syntax, Pig Latin rules) with precision.\n- Tip 2: Avoid perpetuating false premises (e.g., birds producing seeds) and instead clarify biological or conceptual inaccuracies.\n- Tip 3: Use verified data, proper citations, and accurate terminology (e.g., Azure workflows, MLA formatting, product design details).\n- Tip 4: When faced with ambiguity, seek clarification rather than making unfounded assumptions.\n- Tip 5: Preserve original information in translations without adding, omitting, or distorting meaning.",
"Theme: The answer must directly fulfill the user's explicit requirements in structure, content, and format, adhering strictly to all stated constraints.\n- Tip 1: Follow prescribed structural elements (e.g., opening phrases, question framing, section order).\n- Tip 2: Respect formatting rules (e.g., LaTeX, APA, SQL schema limits, phone number patterns).\n- Tip 3: Address every component of multi-part queries (e.g., examples, explanations, code, citations).\n- Tip 4: Use only valid functions, libraries, or commands within the correct technical context (e.g., Streamlit, PL/pgSQL).\n- Tip 5: Extract or generate responses using only permitted sources (e.g., exact text spans, background passages).",
"Theme: The answer must provide clarity, coherence, and completeness through well-structured, concise, and logically organized reasoning.\n- Tip 1: Offer step-by-step explanations that make reasoning transparent and verifiable.\n- Tip 2: Maintain grammatical correctness and preserve original language or formatting conventions.\n- Tip 3: Avoid unnecessary elaboration, redundancy, or irrelevant details that distract from the core task.\n- Tip 4: Ensure responses are self-contained and understandable without external context.\n- Tip 5: Use precise connectors and descriptive language to maintain fidelity in translation or interpretation.",
"Theme: The answer must demonstrate depth and richness by integrating specific examples, actionable strategies, and contextual relevance.\n- Tip 1: Include concrete, scenario-specific illustrations (e.g., AR gameplay mechanics, cultural program metrics).\n- Tip 2: Provide practical implementation guidance with technical detail (e.g., iOS frameworks, OpenGL code).\n- Tip 3: Link abstract concepts to real-world applications (e.g., symbolism in literature, ESG factors in market entry).\n- Tip 4: Show progression or transformation (e.g., habit formation plans, historical scientific impact).\n- Tip 5: Balance breadth and depth by covering multiple dimensions while offering nuanced analysis.",
"Theme: The answer must prioritize ethical responsibility, user alignment, and functional utility in its approach and tone.\n- Tip 1: Reframe potentially offensive or harmful terms proactively to maintain respectful communication.\n- Tip 2: Focus on actionable solutions rather than dismissive or overly theoretical responses.\n- Tip 3: Tailor advice to the user's role, goals, or identity (e.g., UK lawyer, developer, educator).\n- Tip 4: Encourage engagement through clear invitations or follow-up prompts when interaction is intended.\n- Tip 5: Enhance transparency with confidence indicators or explicit justifications for conclusions."
],
complexity: "Medium"
},

// Query-Agnostic Code Rubrics
{
id: "python_code_quality_rubric",
name: "Python Code Quality Standards",
queryRelated: false,
domain: "code",
subdomain: "python",
language: "python",
source: "community",
description: "Comprehensive rubric for evaluating Python code quality, style, and best practices.",
scenario: "Python code review, educational assessment, and automated code evaluation",
rubrics: [
"PEP 8 Compliance: Ensure code follows Python Enhancement Proposal 8 style guidelines.",
"Pythonic Idioms: Use Python-specific constructs and idioms effectively.",
"Error Handling: Implement proper exception handling and error management.",
"Documentation: Include clear docstrings and comments for maintainability."
],
complexity: "Medium"
},
{
id: "java_code_standards_rubric",
name: "Java Code Standards",
queryRelated: false,
domain: "code",
subdomain: "java",
language: "java",
source: "oracle",
description: "Enterprise-grade Java code evaluation focusing on Oracle coding standards and best practices.",
scenario: "Java enterprise application development and code review processes",
rubrics: [
"Naming Conventions: Follow Java naming conventions for classes, methods, and variables.",
"Object-Oriented Design: Proper use of inheritance, encapsulation, and polymorphism.",
"Memory Management: Efficient resource usage and garbage collection considerations.",
"Thread Safety: Proper handling of concurrent programming constructs."
],
complexity: "High"
},
{
id: "sql_query_optimization_rubric",
name: "SQL Query Optimization",
queryRelated: false,
domain: "code",
subdomain: "sql",
language: "sql",
source: "database_community",
description: "Comprehensive evaluation of SQL query performance, structure, and optimization techniques.",
scenario: "Database development, query optimization, and data analysis tasks",
rubrics: [
"Query Efficiency: Evaluate execution plans and performance characteristics.",
"Index Usage: Proper utilization of database indexes for optimal performance.",
"Join Optimization: Efficient use of different join types and strategies.",
"SQL Standards: Adherence to ANSI SQL standards and best practices."
],
complexity: "High"
},
{
id: "general_code_review_rubric",
name: "General Code Review Standards",
queryRelated: false,
domain: "code",
subdomain: "others",
language: "english",
source: "industry_standard",
description: "Universal code review criteria applicable across programming languages and frameworks.",
scenario: "Multi-language codebases, general software development, and code quality assessment",
rubrics: [
"Readability: Code should be clear, well-formatted, and easy to understand.",
"Maintainability: Structure code for easy modification and extension.",
"Security: Identify and address potential security vulnerabilities.",
"Testing: Ensure adequate test coverage and quality."
],
complexity: "Medium"
},

// Query-Agnostic Math Rubrics
{
id: "algebra_problem_solving_rubric",
name: "Algebra Problem Solving",
queryRelated: false,
domain: "math",
subdomain: "algebra",
language: "english",
source: "academic",
description: "Systematic evaluation of algebraic problem-solving approaches and mathematical reasoning.",
scenario: "Educational assessment, tutoring systems, and mathematical content evaluation",
rubrics: [
"Problem Identification: Correctly identify the type of algebraic problem and required approach.",
"Step-by-Step Solution: Show clear, logical progression through solution steps.",
"Mathematical Notation: Use proper mathematical symbols and formatting.",
"Solution Verification: Check answers and validate results through substitution or alternative methods."
],
complexity: "Medium"
},

// Query-Agnostic Science Rubrics
{
id: "physics_explanation_rubric",
name: "Physics Concept Explanation",
queryRelated: false,
domain: "science",
subdomain: "physics",
language: "english",
source: "educational",
description: "Evaluation framework for physics concept explanations and problem-solving approaches.",
scenario: "Physics education, scientific content review, and conceptual understanding assessment",
rubrics: [
"Conceptual Accuracy: Ensure explanations align with established physics principles.",
"Mathematical Integration: Properly incorporate relevant equations and calculations.",
"Real-World Applications: Connect abstract concepts to practical examples.",
"Visual Representations: Use diagrams, graphs, or illustrations to enhance understanding."
],
complexity: "High"
},
{
id: "chemistry_lab_safety_rubric",
name: "Chemistry Lab Safety Assessment",
queryRelated: false,
domain: "science",
subdomain: "chemistry",
language: "english",
source: "academic",
description: "Comprehensive evaluation of chemistry laboratory safety protocols and procedures.",
scenario: "Laboratory instruction, safety training, and chemical handling assessment",
rubrics: [
"Safety Protocol Adherence: Ensure proper safety procedures are followed.",
"Chemical Handling: Proper storage, usage, and disposal of chemical substances.",
"Equipment Usage: Correct operation and maintenance of laboratory equipment.",
"Emergency Procedures: Knowledge and application of emergency response protocols."
],
complexity: "High"
},

// Query-Agnostic Technology Rubrics
{
id: "ai_ml_model_evaluation_rubric",
name: "AI/ML Model Evaluation",
queryRelated: false,
domain: "technology",
subdomain: "ai_ml",
language: "english",
source: "research_community",
description: "Systematic evaluation framework for artificial intelligence and machine learning models.",
scenario: "Model development, research validation, and AI system assessment",
rubrics: [
"Model Performance: Evaluate accuracy, precision, recall, and other relevant metrics.",
"Data Quality: Assess training data quality, bias, and representativeness.",
"Interpretability: Ensure model decisions can be explained and understood.",
"Ethical Considerations: Address fairness, privacy, and societal impact concerns."
],
complexity: "Very High"
},
{
id: "cybersecurity_assessment_rubric",
name: "Cybersecurity Risk Assessment",
queryRelated: false,
domain: "technology",
subdomain: "cybersecurity",
language: "english",
source: "security_standards",
description: "Comprehensive framework for evaluating cybersecurity measures and risk management.",
scenario: "Security audits, risk assessment, and cybersecurity policy evaluation",
rubrics: [
"Threat Identification: Systematically identify potential security threats and vulnerabilities.",
"Risk Quantification: Assess and quantify the impact and likelihood of security risks.",
"Control Effectiveness: Evaluate the effectiveness of existing security controls.",
"Compliance Standards: Ensure adherence to relevant cybersecurity frameworks and regulations."
],
complexity: "Very High"
},

// Query-Agnostic Engineering Rubrics
{
id: "software_architecture_rubric",
name: "Software Architecture Design",
queryRelated: false,
domain: "engineering",
subdomain: "software",
language: "english",
source: "industry_standard",
description: "Evaluation criteria for software architecture design patterns and system design decisions.",
scenario: "System design reviews, architecture assessments, and software engineering evaluation",
rubrics: [
"Scalability Design: Ensure architecture can handle growth in users and data.",
"Maintainability: Design for easy modification, debugging, and enhancement.",
"Performance Optimization: Consider system performance and resource efficiency.",
"Security by Design: Integrate security considerations into architectural decisions."
],
complexity: "Very High"
},
{
id: "systems_reliability_rubric",
name: "Systems Reliability Engineering",
queryRelated: false,
domain: "engineering",
subdomain: "systems",
language: "english",
source: "sre_community",
description: "Framework for evaluating system reliability, availability, and operational excellence.",
scenario: "Site reliability engineering, system monitoring, and operational assessment",
rubrics: [
"Availability Targets: Define and measure system availability and uptime requirements.",
"Monitoring Coverage: Implement comprehensive monitoring and alerting systems.",
"Incident Response: Establish effective incident management and recovery procedures.",
"Capacity Planning: Plan for system capacity and performance under various load conditions."
],
complexity: "High"
},

// Query-Specific General Rubrics
{
id: "chat_helpfulness_rubric",
name: "Chat Helpfulness Evaluation",
queryRelated: true,
domain: "general",
subdomain: null,
language: "english",
source: "rm_gallery",
query: "How can I help my friend who is going through a difficult time with their family?",
description: "Comprehensive rubric for evaluating helpfulness in chat interactions, focusing on addressing user intent and providing actionable guidance.",
scenario: "Chat conversations requiring helpful, accurate, and contextually appropriate responses",
rubrics: [
"Address Core Argument/Intent Directly: Prioritize engaging with the user's central claim, perspective, or question explicitly, ensuring responses align with their stated goals or concerns rather than diverging into tangential topics."
],
complexity: "Medium"
},
{
id: "safety_evaluation_rubric",
name: "Safety & Harmlessness Assessment",
queryRelated: true,
domain: "general",
subdomain: null,
language: "english",
source: "anthropic",
query: "What are some effective ways to deal with someone who constantly annoys me at work?",
description: "Rubric for assessing content safety and harmlessness, identifying potential risks and harmful content patterns.",
scenario: "Content moderation and safety evaluation across various domains",
rubrics: [
"Identify Direct Harm: Detect content that directly promotes violence, self-harm, or illegal activities."
],
complexity: "High"
},

// Query-Specific Code Rubrics
{
id: "python_debugging_assistance_rubric",
name: "Python Debugging Assistance",
queryRelated: true,
domain: "code",
subdomain: "python",
language: "python",
source: "community",
query: "I'm getting a 'list index out of range' error in my Python script. Can you help me fix it?",
description: "Evaluation criteria for providing effective Python debugging help and error resolution guidance.",
scenario: "Interactive debugging sessions, error analysis, and troubleshooting assistance",
rubrics: [
"Error Analysis: Accurately identify and explain the root cause of Python errors."
],
complexity: "Medium"
},
{
id: "sql_query_assistance_rubric",
name: "SQL Query Writing Assistance",
queryRelated: true,
domain: "code",
subdomain: "sql",
language: "sql",
source: "database_community",
query: "How can I write a SQL query to find the top 10 customers by total purchase amount in the last 30 days?",
description: "Evaluation framework for providing effective SQL query writing help and optimization guidance.",
scenario: "Database query assistance, performance troubleshooting, and SQL learning support",
rubrics: [
"Query Logic Understanding: Accurately interpret user requirements and translate to SQL logic.",
],
complexity: "Medium"
},

// Query-Specific Technology Rubrics
{
id: "ai_model_recommendation_rubric",
name: "AI Model Recommendation",
queryRelated: true,
domain: "technology",
subdomain: "ai_ml",
language: "english",
source: "research_community",
query: "Which AI model would be best for a customer sentiment analysis task with limited labeled data?",
description: "Framework for evaluating AI model recommendations based on specific use cases and requirements.",
scenario: "AI consulting, model selection guidance, and machine learning project planning",
rubrics: [
"Use Case Alignment: Recommend models that match the specific problem requirements."
],
complexity: "High"
},

// Query-Specific Science Rubrics
{
id: "physics_problem_solving_rubric",
name: "Physics Problem Solving Assistance",
queryRelated: true,
domain: "science",
subdomain: "physics",
language: "english",
source: "educational",
query: "A ball is thrown upward with an initial velocity of 20 m/s. How high will it go and how long will it take to return to the ground?",
description: "Framework for evaluating physics problem-solving help and conceptual explanations.",
scenario: "Physics tutoring, homework assistance, and concept clarification sessions",
rubrics: [
"Problem Analysis: Break down complex physics problems into manageable components."
],
complexity: "High"
},
{
id: "chemistry_experiment_guidance_rubric",
name: "Chemistry Experiment Guidance",
queryRelated: true,
domain: "science",
subdomain: "chemistry",
language: "english",
source: "academic",
query: "What safety precautions should I take when performing a titration experiment with sulfuric acid?",
description: "Evaluation criteria for providing chemistry experiment guidance and safety instruction.",
scenario: "Laboratory assistance, experiment planning, and chemistry education support",
rubrics: [
"Safety First: Prioritize laboratory safety and proper handling procedures."
],
complexity: "High"
},

// Query-Specific Engineering Rubrics
{
id: "system_design_consultation_rubric",
name: "System Design Consultation",
queryRelated: true,
domain: "engineering",
subdomain: "software",
language: "english",
source: "industry_standard",
query: "How would you design a URL shortening service like bit.ly that can handle millions of requests per day?",
description: "Evaluation criteria for providing system design advice and architectural guidance.",
scenario: "System design interviews, architecture consulting, and technical decision support",
rubrics: [
"Requirements Analysis: Thoroughly understand and clarify system requirements and constraints.",
],
complexity: "Very High"
},

// Query-Specific Math Rubrics
{
id: "calculus_tutoring_rubric",
name: "Calculus Tutoring Effectiveness",
queryRelated: true,
domain: "math",
subdomain: "calculus",
language: "english",
source: "educational",
query: "I'm struggling to understand the concept of limits. Can you explain what lim(x→0) sin(x)/x equals and why?",
description: "Specialized rubric for evaluating calculus tutoring interactions and problem-solving guidance.",
scenario: "One-on-one tutoring sessions, homework help, and calculus concept explanation",
rubrics: [
"Adaptive Explanation: Adjust explanation complexity based on student's demonstrated understanding level.",
"Conceptual Foundation: Build understanding from fundamental principles rather than just procedural steps."
],
complexity: "High"
}
];

// —— Utils
function show(el){ el.hidden = false; }
function hide(el){ el.hidden = true; }
function setLoading(on){
on ? (show(elLoading), [elError, elCategories, elRubrics, elEmpty, elStats, elCrumb].forEach(hide))
: hide(elLoading);
}
function setError(on){ on ? (show(elError), [elLoading].forEach(hide)) : hide(elError); }
function clampTxt(s, n){ if(!s) return ""; return s.length<=n? s : s.slice(0,n)+"…"; }
function debounce(fn, ms=250){ let t; return (...a)=>{ clearTimeout(t); t=setTimeout(()=>fn(...a), ms); }; }

// —— Data Loading
async function loadAll(){
setLoading(true); setError(false);
try{
ALL_RUBRICS = MOCK_RUBRICS;
if(!ALL_RUBRICS.length) throw new Error("no data");

// Group rubrics by category -> domain -> subdomain
GROUPED_RUBRICS = ALL_RUBRICS.reduce((acc, rubric)=>{
const categoryKey = rubric.queryRelated ? "Query-Specific Rubrics" : "Query-Agnostic Rubrics";
const domainKey = rubric.domain;
const subdomainKey = rubric.subdomain || "general";

if (!acc[categoryKey]) acc[categoryKey] = {};
if (!acc[categoryKey][domainKey]) acc[categoryKey][domainKey] = {};
if (!acc[categoryKey][domainKey][subdomainKey]) acc[categoryKey][domainKey][subdomainKey] = [];

acc[categoryKey][domainKey][subdomainKey].push(rubric);
return acc;
}, {});
renderCategories();
}catch(e){
setError(true);
}finally{
setLoading(false);
}
}

// —— Render Categories (Top Level)
function renderCategories(){
VIEW = "categories";
CURR_CATEGORY = null; CURR_DOMAIN = null; CURR_SUBDOMAIN = null;
hide(elRubrics); hide(elEmpty); show(elCategories);
hide(elCrumb);
elCrumbTitle.textContent = "Rubric Library";
elType.textContent = "domains";

const sections = Object.entries(GROUPED_RUBRICS).map(([categoryName, domains])=>{
const domainCards = Object.entries(domains).map(([domainName, subdomains])=>{
const subdomainCount = Object.keys(subdomains).length;

return `
<div class="ml-card-item" data-category="${categoryName}" data-domain="${domainName}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${domainName}">${domainName.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${domainName.charAt(0).toUpperCase() + domainName.slice(1)} Domain</div>
<div class="ml-card-sub">${subdomainCount} ${subdomainCount > 1 ? 'subdomains' : 'subdomain'}</div>
<div class="ml-card-sample">Specialized evaluation rubrics for ${domainName} domain tasks and content</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`;
}).join("");

return `
<section class="ml-section">
<h3>${categoryName}</h3>
<div class="ml-grid">
${domainCards}
</div>
</section>
`;
}).join("");

elCategories.innerHTML = sections;
bindDomainClicks();

show(elStats);
const totalDomains = Object.values(GROUPED_RUBRICS).reduce((sum, domains) => sum + Object.keys(domains).length, 0);
elCount.textContent = totalDomains;
elTotal.textContent = totalDomains;
}

// —— Render Domains (Second Level)
function renderDomains(categoryName){
VIEW = "domains";
CURR_CATEGORY = categoryName; CURR_DOMAIN = null; CURR_SUBDOMAIN = null;
hide(elRubrics); hide(elEmpty); show(elCategories);
show(elCrumb);
elCrumbTitle.textContent = categoryName;
elType.textContent = "domains";

const domains = GROUPED_RUBRICS[categoryName] || {};

const sections = Object.entries(domains).map(([domainName, subdomains])=>{
const subdomainCount = Object.keys(subdomains).length;

return `
<div class="ml-card-item" data-domain="${domainName}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${domainName}">${domainName.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${domainName.charAt(0).toUpperCase() + domainName.slice(1)} Domain</div>
<div class="ml-card-sub">${subdomainCount} ${subdomainCount > 1 ? 'subdomains' : 'subdomain'}</div>
<div class="ml-card-sample">Specialized evaluation rubrics for ${domainName} domain tasks and content</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`;
}).join("");

elCategories.innerHTML = `
<section class="ml-section">
<h3>${categoryName} - Domains</h3>
<div class="ml-grid">
${sections}
</div>
</section>
`;
bindDomainClicks();

show(elStats);
elCount.textContent = Object.keys(domains).length;
elTotal.textContent = Object.keys(domains).length;
}

// —— Render Subdomains (Third Level)
function renderSubdomains(categoryName, domainName){
VIEW = "subdomains";
CURR_CATEGORY = categoryName; CURR_DOMAIN = domainName; CURR_SUBDOMAIN = null;
show(elCrumb);
elCrumbTitle.textContent = `${categoryName} > ${domainName}`;

const subdomains = GROUPED_RUBRICS[categoryName][domainName] || {};

// If only one subdomain (general), go directly to rubrics
if (Object.keys(subdomains).length === 1 && subdomains.general) {
renderRubrics(categoryName, domainName, "general");
return;
}

// For Query-Agnostic: show subdomains as cards that lead to rubric lists
if (categoryName === "Query-Agnostic Rubrics") {
hide(elRubrics); hide(elEmpty); show(elCategories);
elType.textContent = "subdomains";

const sections = Object.entries(subdomains).map(([subdomainName, rubrics])=>{
const rubricCount = Array.isArray(rubrics) ? rubrics.length : 0;

return `
<div class="ml-card-item" data-category="${categoryName}" data-domain="${domainName}" data-subdomain="${subdomainName}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${subdomainName}">${subdomainName.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${subdomainName.charAt(0).toUpperCase() + subdomainName.slice(1)}</div>
<div class="ml-card-sub">${rubricCount} evaluation rubrics</div>
<div class="ml-card-sample">Evaluation rubrics specialized for ${subdomainName} development and assessment</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`;
}).join("");

elCategories.innerHTML = `
<section class="ml-section">
<h3>${domainName.charAt(0).toUpperCase() + domainName.slice(1)} Subdomains</h3>
<div class="ml-grid">
${sections}
</div>
</section>
`;
bindSubdomainClicks();

show(elStats);
elCount.textContent = Object.keys(subdomains).length;
elTotal.textContent = Object.keys(subdomains).length;
}
// For Query-Specific: show all rubrics in grid layout
else {
hide(elCategories); hide(elEmpty); show(elRubrics);
elType.textContent = "rubrics";

// Flatten all rubrics from all subdomains
const allRubrics = Object.entries(subdomains).flatMap(([subdomainName, rubrics]) =>
Array.isArray(rubrics) ? rubrics.map(r => ({...r, displaySubdomain: subdomainName})) : []
);

if(!allRubrics.length){
hide(elRubrics); show(elEmpty); hide(elStats); return;
}

// Grid layout like general domain
elRubrics.innerHTML = allRubrics.map((rubric, idx)=>`
<div class="ml-card-item" data-idx="${idx}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip query-specific">QUERY-SPECIFIC</div>
<div class="ml-chip ${getComplexityClass(rubric.complexity)}">${rubric.complexity.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${rubric.name}</div>
<div class="ml-card-sub">${rubric.domain}${rubric.displaySubdomain ? ` > ${rubric.displaySubdomain}` : ''}</div>
<div class="ml-card-sample">${clampTxt(rubric.description, 120)}</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`).join("");

// Modal binding
[...elRubrics.querySelectorAll(".ml-card-item")].forEach(card=>{
card.addEventListener("click", ()=>{
const idx = Number(card.getAttribute("data-idx"));
const rubric = allRubrics[idx];
showRubricModal(rubric);
});
});

show(elStats);
elCount.textContent = allRubrics.length;
elTotal.textContent = allRubrics.length;
}
}

// —— Render Rubrics (Final Level)
function renderRubrics(categoryName, domainName, subdomainName){
VIEW = "rubrics";
CURR_CATEGORY = categoryName; CURR_DOMAIN = domainName; CURR_SUBDOMAIN = subdomainName;
hide(elCategories); hide(elEmpty); show(elRubrics);
show(elCrumb);
elType.textContent = "rubrics";
// Avoid showing duplicate names in breadcrumb (e.g., general > general)
const breadcrumb = domainName === subdomainName
? `${categoryName} > ${domainName}`
: `${categoryName} > ${domainName} > ${subdomainName}`;
elCrumbTitle.textContent = breadcrumb;

const rubricList = GROUPED_RUBRICS[categoryName]?.[domainName]?.[subdomainName] || [];

if(!rubricList.length){
hide(elRubrics); show(elEmpty); hide(elStats); return;
}

elRubrics.innerHTML = rubricList.map((rubric, idx)=>`
<div class="ml-card-item" data-idx="${idx}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${rubric.queryRelated ? 'query-specific' : 'query-agnostic'}">${rubric.queryRelated ? 'QUERY-SPECIFIC' : 'QUERY-AGNOSTIC'}</div>
<div class="ml-chip ${getComplexityClass(rubric.complexity)}">${rubric.complexity.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${rubric.name}</div>
<div class="ml-card-sub">${rubric.domain}${rubric.subdomain ? ` > ${rubric.subdomain}` : ''}</div>
<div class="ml-card-sample">${clampTxt(rubric.description, 120)}</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`).join("");

// Modal binding
[...elRubrics.querySelectorAll(".ml-card-item")].forEach(card=>{
card.addEventListener("click", ()=>{
const idx = Number(card.getAttribute("data-idx"));
const rubric = rubricList[idx];
showRubricModal(rubric);
});
});

show(elStats);
elCount.textContent = rubricList.length;
elTotal.textContent = rubricList.length;
}

function getComplexityClass(complexity) {
switch(complexity) {
case 'Low': return 'success';
case 'Medium': return 'warning';
case 'High': case 'Very High': return 'danger';
default: return 'success';
}
}

function showRubricModal(rubric) {
mCategory.textContent = rubric.queryRelated ? "Query-Specific" : "Query-Agnostic";
mCategory.className = `ml-chip ${rubric.queryRelated ? 'query-specific' : 'query-agnostic'}`;
mDomain.textContent = `${rubric.domain}${rubric.subdomain ? ` > ${rubric.subdomain}` : ''}`;

// Show query for Query-Specific rubrics
if (rubric.queryRelated && rubric.query) {
mQuery.textContent = rubric.query;
mQuerySection.hidden = false;
} else {
mQuerySection.hidden = true;
}

// Hide description and scenario for Query-Specific rubrics
if (rubric.queryRelated) {
mDescriptionSection.hidden = true;
mScenarioSection.hidden = true;
} else {
mDescriptionSection.hidden = false;
mScenarioSection.hidden = false;
mDescription.textContent = rubric.description;
mScenario.textContent = rubric.scenario;
}

// Handle rubrics
if (rubric.rubrics && rubric.rubrics.length > 0) {
const rubricsList = rubric.rubrics.map((item, idx) =>
`<div class="rubric-item">
<span class="rubric-number">P${idx + 1}</span>
<div class="rubric-content">${item}</div>
</div>`
).join("");
mrubrics.innerHTML = `<div class="rubric-list">${rubricsList}</div>`;
} else {
mrubrics.innerHTML = '<div class="ml-muted">No specific rubrics defined</div>';
}

// Usage example
const usageExample = `from rm_gallery.core.reward import BaseListWiserubricReward
from rm_gallery.core.model.openai_llm import OpenaiLLM

# Create reward model with this rubric
llm = OpenaiLLM(model="qwen3-8b", enable_thinking=True)
reward = BaseListWiserubricReward(
name="${rubric.id}",
desc="${rubric.description}",
scenario="${rubric.scenario}",
rubrics=${JSON.stringify(rubric.rubrics || [])},
llm=llm
)

# Use the reward model
result = reward.evaluate(sample)`;
mUsage.textContent = usageExample;

// Rubric info
mId.textContent = rubric.id;
mDomainInfo.textContent = `${rubric.domain}${rubric.subdomain ? ` > ${rubric.subdomain}` : ''}`;
mLanguage.textContent = rubric.language;
mSource.textContent = rubric.source;
mrubricCount.textContent = rubric.rubrics ? rubric.rubrics.length : 0;
mComplexity.textContent = rubric.complexity;

dlg.showModal();
}

function bindCategoryClicks(){
[...elCategories.querySelectorAll(".ml-card-item[data-category]")].forEach(card=>{
card.addEventListener("click", ()=>{
const categoryName = card.getAttribute("data-category");
renderDomains(categoryName);
});
});
}

function bindDomainClicks(){
[...elCategories.querySelectorAll(".ml-card-item[data-domain]")].forEach(card=>{
card.addEventListener("click", ()=>{
const categoryName = card.getAttribute("data-category");
const domainName = card.getAttribute("data-domain");

// For Query-Agnostic: check if domain has multiple subdomains
if (categoryName === "Query-Agnostic Rubrics") {
const subdomains = GROUPED_RUBRICS[categoryName][domainName] || {};
const subdomainKeys = Object.keys(subdomains);

// If only general subdomain or single subdomain, go directly to rubrics
if (subdomainKeys.length === 1) {
renderRubrics(categoryName, domainName, subdomainKeys[0]);
} else {
// Multiple subdomains, show subdomain selection
renderSubdomains(categoryName, domainName);
}
} else {
// For Query-Specific: always show the flat rubric grid
renderSubdomains(categoryName, domainName);
}
});
});
}

function bindSubdomainClicks(){
[...elCategories.querySelectorAll(".ml-card-item[data-subdomain]")].forEach(card=>{
card.addEventListener("click", ()=>{
const categoryName = card.getAttribute("data-category");
const domainName = card.getAttribute("data-domain");
const subdomainName = card.getAttribute("data-subdomain");
renderRubrics(categoryName, domainName, subdomainName);
});
});
}

// —— Search
function handleSearch(){
const q = elSearch.value.trim().toLowerCase();
if(!q){
// Clear search: return to categories view
renderCategories();
return;
}

// Global search across all rubrics
const filteredRubrics = ALL_RUBRICS.filter(rubric =>
rubric.name.toLowerCase().includes(q) ||
rubric.description.toLowerCase().includes(q) ||
rubric.domain.toLowerCase().includes(q) ||
(rubric.subdomain && rubric.subdomain.toLowerCase().includes(q)) ||
rubric.language.toLowerCase().includes(q) ||
rubric.source.toLowerCase().includes(q) ||
(rubric.rubrics && rubric.rubrics.some(p => p.toLowerCase().includes(q)))
);

// Show search results as a flat rubric list

VIEW = "search";
CURR_CATEGORY = null; CURR_DOMAIN = null; CURR_SUBDOMAIN = null;
hide(elCategories); hide(elEmpty); show(elRubrics);
show(elCrumb);
elType.textContent = "search results";
elCrumbTitle.textContent = `Search: "${q}"`;

if(!filteredRubrics.length){
hide(elRubrics); show(elEmpty); hide(elStats); return;
}

elRubrics.innerHTML = filteredRubrics.map((rubric, idx)=>`
<div class="ml-card-item" data-idx="${idx}">
<div class="ml-card-head">
<div class="ml-card-left">
<div class="ml-chip ${rubric.queryRelated ? 'query-specific' : 'query-agnostic'}">${rubric.queryRelated ? 'QUERY-SPECIFIC' : 'QUERY-AGNOSTIC'}</div>
<div class="ml-chip ${getComplexityClass(rubric.complexity)}">${rubric.complexity.toUpperCase()}</div>
</div>
</div>
<div class="ml-card-title-main">${rubric.name}</div>
<div class="ml-card-sub">${rubric.domain}${rubric.subdomain ? ` > ${rubric.subdomain}` : ''}</div>
<div class="ml-card-sample">${clampTxt(rubric.description, 120)}</div>
<div class="ml-card-foot">
<span style="opacity: 0.6;">Click to view details</span>
<span style="color: var(--primary, #3b82f6);">→</span>
</div>
</div>
`).join("");

// Modal binding for search results
[...elRubrics.querySelectorAll(".ml-card-item")].forEach(card=>{
card.addEventListener("click", ()=>{
const idx = Number(card.getAttribute("data-idx"));
const rubric = filteredRubrics[idx];
showRubricModal(rubric);
});
});

show(elStats);
elCount.textContent = filteredRubrics.length;
elTotal.textContent = filteredRubrics.length;
}

// —— Events
function initEvents() {
elRetry?.addEventListener("click", loadAll);
elBack?.addEventListener("click", ()=> renderCategories());
elSearch?.addEventListener("input", debounce(handleSearch, 250));
elClear?.addEventListener("click", ()=>{
elSearch.value = ""; handleSearch();
});

// Close modal when clicking outside
dlg?.addEventListener("click", (e)=> {
const rect = dlg.querySelector('.ml-modal-card')?.getBoundingClientRect();
if (rect && (e.clientX < rect.left || e.clientX > rect.right ||
e.clientY < rect.top || e.clientY > rect.bottom)) {
dlg.close();
}
});
}

// —— Init
document.addEventListener("DOMContentLoaded", ()=> {
initEvents();
loadAll();
});
})();
</script>
Contributor

medium

Similar to the rm_library.md page, embedding the large MOCK_RUBRICS array directly in this Markdown file could pose maintainability challenges. To improve this, I suggest moving the rubric data to an external JSON file (e.g., docs/assets/data/rubric_library.json) and loading it with a fetch call in your script. This separation of data and logic will make future updates to the rubrics much simpler.
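The suggestion above can be sketched roughly as follows. The JSON path and the function names `loadRubrics`/`groupRubrics` are illustrative, not part of the current repo; the grouping logic mirrors the `reduce` already in `loadAll()`, extracted as a pure function so it can be tested without a network.

```javascript
// Hypothetical fetch-based loader replacing the inline MOCK_RUBRICS array.
// The JSON file path is an assumption for illustration.
async function loadRubrics(url = "../assets/data/rubric_library.json") {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to load rubrics: ${res.status}`);
  return res.json(); // expects an array shaped like MOCK_RUBRICS
}

// Grouping step from loadAll(), extracted as a pure function:
// category -> domain -> subdomain -> [rubrics]
function groupRubrics(rubrics) {
  return rubrics.reduce((acc, rubric) => {
    const categoryKey = rubric.queryRelated ? "Query-Specific Rubrics" : "Query-Agnostic Rubrics";
    const domainKey = rubric.domain;
    const subdomainKey = rubric.subdomain || "general";
    if (!acc[categoryKey]) acc[categoryKey] = {};
    if (!acc[categoryKey][domainKey]) acc[categoryKey][domainKey] = {};
    if (!acc[categoryKey][domainKey][subdomainKey]) acc[categoryKey][domainKey][subdomainKey] = [];
    acc[categoryKey][domainKey][subdomainKey].push(rubric);
    return acc;
  }, {});
}
```

With this split, `loadAll()` would reduce to `ALL_RUBRICS = await loadRubrics(); GROUPED_RUBRICS = groupRubrics(ALL_RUBRICS);`, and rubric updates would touch only the JSON file.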

@helloml0326 helloml0326 merged commit f581eda into main Oct 28, 2025
1 of 2 checks passed
XiaoBoAI pushed a commit that referenced this pull request Nov 10, 2025
Link: https://code.alibaba-inc.com/OpenRepo/RM-Gallery/codereview/24203053
* [New] V2

* [update] documents

* [update] data sample schema

* Initial commit

* [update readme and docs]

* Merge pull request #1 from modelscope/doc_dev

[update] readme and docs

* [update] update readme (#2)

* [update] update readme

* [update] update readme

* [fix] reward register (#3)

fix import

* [update] update git remote url (#4)

* [update] update readme

* [update] update readme

* [update] update git remote url

* fix: resolve import errors in dataset module

* Merge pull request #6 from modelscope/pairwise

fix: resolve import errors in dataset module

* [feat] add async evaluation support for reward modules with semaphore-based concurrency control (#8)

* [feat] add async evaluation support for reward modules with semaphore-based concurrency control

* [fix] Fixed the bug in reward post-processing in asynchronous reward calculation

* [new] add bradley-terry and sft scripts (#13)

* [new] add bradley-terry and sft scripts

* [new] add bradley-terry and sft scripts

* [fix] fix dependencies (#12)

* [fix] fix docs (#11)

* [update] bt train scripts (#14)

* [new] add bradley-terry and sft scripts

* [new] add bradley-terry and sft scripts

* [update] bt train scripts

* [update] bt train scripts

* [delete] old bt scripts

* [update] sft_rm.md

* [delete] duplicate sft folders (#16)

* update llm bench

* add

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Principles for rewardbench2

* [fixbug] pointwise dataset  (#18)

* [delete] duplicate sft folders

* [fixbug] pointwise dataset

* refactor: cleanup evaluation modules and update documentation

- Remove deprecated evaluation modules (conflict_detector, judgebench, rmb, rmbench)
- Update rewardbench2 evaluation module and documentation
- Clean up template modules
- Update load.ipynb tutorial
- Fix linting issues in rewardbench2.py

* Merge pull request #19 from modelscope/llm_bench

Llm bench

* feat: upgrade documentation theme to mkdocs-shadcn

- Migrate from material theme to mkdocs-shadcn for modern UI
- Enhance homepage with gradient logo design and Inter font
- Standardize badge styles and layout structure
- Add GitHub Actions workflow for automated deployment
- Improve visual consistency and user experience
- Configure markdown extensions for rich content support

* feat: implement coy theme for code highlighting

- Add Prism.js coy theme for modern code block styling
- Configure enhanced syntax highlighting with line numbers
- Create custom CSS enhancements for better visual appeal
- Support multiple programming languages with autoloader
- Add responsive design for mobile devices
- Implement hover effects and improved readability

* feat: enhance code block styling with copy functionality

- Add code copy button feature for better UX
- Implement One Dark Pro syntax highlighting theme
- Include JetBrains Mono font for better code readability
- Add custom CSS for enhanced code block appearance
- Configure pymdownx.highlight with line numbers and anchors
- Add responsive design for code blocks on mobile devices

* feat: add interactive code copy button functionality

- Implement custom JavaScript for code block copy functionality
- Add hover-triggered copy button with smooth animations
- Include visual feedback with check icon on successful copy
- Style copy button with modern design and transitions
- Support both custom and theme-native copy button styles
- Ensure cross-browser clipboard API compatibility

* feat: optimize table rendering with text wrapping

- Add comprehensive table styling with proper text wrapping
- Enable word-break and overflow-wrap for all table cells
- Implement responsive table design for mobile devices
- Add hover effects and striped rows for better readability
- Include gradient header background for visual appeal
- Configure tables markdown extension for proper rendering
- Add smooth scrolling for wide tables on small screens

* fix: restore complete documentation content from main branch

- Recover all original documentation sections and content
- Preserve installation guide, walkthrough, and examples
- Maintain documentation table and citation information
- Keep all code examples and detailed explanations
- Apply modern styling only to header section without content loss

* docs: Add new documentation sections and update mkdocs configuration

- Add rm_library.md and rubric_library.md in library section
- Add navigation.md for improved site navigation
- Add boosting_strategy.md in using_rm section
- Add reference section with .gitkeep
- Update mkdocs.yml configuration

* feat: create interactive RM and Rubric libraries

- Transform static markdown pages into dynamic interactive libraries
- Add search and filter functionality similar to ReMe project design
- RM Library: categorized display of reward models with detailed info
- Rubric Library: comprehensive evaluation rubrics with principles
- Modern responsive UI with modal details and real-time stats
- Consistent with navigation.md planning structure

* [new] auto-rubric

* [rename] rubric

* [new] auto-rubric (#21)

* [new] auto-rubric

* [rename] rubric

* feat: add LLM Judge evaluation module and RL training examples

- Add llm_judge module with pointwise/pairwise/listwise evaluators
- Add alignment reward functions for LLM judge
- Add RL training examples with alignment reward integration
- Add reward manager and alignment RL dataset
- Add GRPO training script and documentation
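The pointwise/pairwise/listwise distinction referenced above can be summarized in a few lines. This is a schematic sketch, not the module's API: `stub_judge` is a hypothetical stand-in (a real evaluator would call an LLM), and only the input/output shapes of the three modes are meant to be accurate.

```python
def stub_judge(response: str) -> float:
    # Hypothetical judge: scores by length; a real one would query an LLM.
    return float(len(response))

def pointwise(response):
    # One response in, one scalar score out.
    return stub_judge(response)

def pairwise(a, b):
    # Two responses in, index of the preferred one out.
    return 0 if stub_judge(a) >= stub_judge(b) else 1

def listwise(responses):
    # N responses in, indices ranked best-first out.
    return sorted(range(len(responses)), key=lambda i: -stub_judge(responses[i]))

print(pairwise("short", "a longer answer"))  # 1
print(listwise(["bb", "a", "cccc"]))         # [2, 0, 1]
```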

* refactor: improve RL training dataset and reward function

- Add base dataset class for RL training
- Refactor alignment dataset with DataKeys configuration
- Improve code formatting and structure
- Update reward function documentation

* update

* fix: improve base dataset import with fallback mechanism

- Add robust import fallback for base_dataset module
- Update README and reward manager
- Improve error handling for module imports
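The fallback mechanism described above usually looks like a chain of `try`/`except ImportError` blocks, so the same file works both as part of the package and when run directly as a script. Module paths below are illustrative, and the final stand-in class exists only so this sketch runs anywhere:

```python
try:
    # Preferred: package-relative layout (path is illustrative).
    from examples.rl_training.base_dataset import BaseDataset
except ImportError:
    try:
        # Fallback: same-directory import when run as a plain script.
        from base_dataset import BaseDataset
    except ImportError:
        # Last resort for this sketch only: a minimal stand-in.
        class BaseDataset:
            def __len__(self):
                return 0

dataset = BaseDataset()
print(len(dataset))
```

The tradeoff: the fallback hides genuine packaging mistakes, which is why the commit also improves the error handling around it.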

* fix

* Update RL training and LLM judge evaluation modules

* [update] rubric src

* Merge branch 'main' into autorubric_gt

* Add evaluation tools and documentation

- Add conflict_detector evaluation tool
- Add judgebench, rmb, rmbench evaluation modules
- Add documentation for evaluation methods
- Add llm_judge reward modules
- Update rewardbench2 implementation
- Add RL training examples
- Fix linting issues (unused imports, f-string formatting)
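One way a conflict detector of the kind added above can flag inconsistent LLM judgments is by finding non-transitive cycles (A beats B, B beats C, C beats A) in the pairwise-preference graph. This is an illustrative sketch only; the repository's `conflict_detector` may use different and far more extensive logic:

```python
def find_preference_conflicts(prefs):
    """prefs: list of (winner, loser) pairs. Returns detected 3-cycles."""
    beats = set(prefs)
    items = {x for pair in prefs for x in pair}
    conflicts = []
    for a, b in prefs:
        for c in items:
            # a>b, b>c, c>a closes a non-transitive cycle.
            if (b, c) in beats and (c, a) in beats:
                conflicts.append((a, b, c))
    return conflicts

# A cycle is reported once per starting edge (three rotations here).
print(find_preference_conflicts([("A", "B"), ("B", "C"), ("C", "A")]))
# A transitive set of preferences yields no conflicts.
print(find_preference_conflicts([("A", "B"), ("B", "C"), ("A", "C")]))  # []
```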

* Merge llm_bench into feature/upgrade-docs-theme

* Merge origin/boyin_dgr into feature/upgrade-docs-theme

- Add LLM judge framework with adapters, evaluators, and templates
- Add reward manager and RL training examples
- Add base dataset for RL training
- Resolve conflict in alignment_rl_dataset.py

* docs: convert tutorial notebooks to markdown and update documentation

* feat: improve rubric library UI - optimize chip display and layout

* [update] rubric_library

* feat: convert all Jupyter notebooks to Markdown format

- Convert 7 .ipynb files to .md format for better version control
- Update mkdocs.yml to reference .md files instead of .ipynb
- Optimize RM Library card styles (simplified tags, improved layout)
- Update Building RM navigation structure

Files converted:
- tutorial/data: annotation, load, pipeline, process
- tutorial/rm_application: best_of_n, data_refinement, post_training

Benefits:
- Faster build times (no Jupyter conversion needed)
- Better git diffs and version control
- Easier editing and maintenance
- Simplified dependencies

* fix(judgebench): fix evaluation results not being stored in batch processing

- Override _async_parallel method in JudgeBenchReward to use BaseListWiseReward implementation
- Fixes issue where BaseLLMReward._async_parallel was storing results in wrong location due to MRO
- Results now correctly stored in sample.input[-1].additional_kwargs for compute_accuracy
- Tested with qwen2.5-32b-instruct via DashScope API, accuracy calculation now works correctly
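The MRO pitfall behind this fix can be shown with a minimal diamond: when two base classes both define a method, Python's method resolution order picks whichever base is listed first, which may not be the implementation you want, and an explicit override pins the intended one. Class and method names below are simplified stand-ins for the real classes:

```python
class BaseLLMReward:
    def _async_parallel(self):
        return "llm-path"

class BaseListWiseReward:
    def _async_parallel(self):
        return "listwise-path"

class JudgeBenchBroken(BaseLLMReward, BaseListWiseReward):
    pass  # MRO resolves _async_parallel to BaseLLMReward

class JudgeBenchFixed(BaseLLMReward, BaseListWiseReward):
    def _async_parallel(self):
        # Explicit override delegates to the intended implementation.
        return BaseListWiseReward._async_parallel(self)

print(JudgeBenchBroken()._async_parallel())  # llm-path
print(JudgeBenchFixed()._async_parallel())   # listwise-path
```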

* docs(judgebench): add custom API endpoint configuration example

- Add example showing how to configure base_url for custom API endpoints
- Demonstrates usage with Alibaba Cloud DashScope API
- Helps users who need to use OpenAI-compatible third-party APIs

* feat: upgrade docs theme and add conflict detector improvements

- Update mkdocs.yml with new theme configuration
- Enhance documentation pages (index, rm_library, rubric_library, boosting_strategy)
- Add search-fix.js for improved search functionality
- Improve conflict_detector.py with new features
- Add template.py for evaluation
- Add comprehensive test files for conflict detector

* Merge branch 'main' into feature/upgrade-docs-theme

- Resolved conflicts in docs/index.md, mkdocs.yml, and autorubric.md
- Updated all .ipynb references to .md files
- Removed .ipynb files that were converted to .md
- Integrated rubric-related updates from main branch

* refactor: unify terminology from 'principle' to 'rubric' across codebase

- Updated code files: rmb.py, rmbench.py
  - Changed PrincipleListWiseTemplate to RubricListWiseTemplate
  - Updated class inheritance and type annotations

- Updated documentation files:
  - Renamed autoprinciple.md to autorubric.md
  - Updated overview.md: AutoPrinciple → AutoRubric, generator variables
  - Updated custom_reward.md: BasePrincipleReward → BaseRubricReward
  - Updated evaluation/overview.md, best_of_n.md, post_training.md, boosting_strategy.md
  - Updated rm_library.md: CSS, JS, HTML elements, and RM configurations

- All terminology now consistently uses 'rubric' instead of 'principle'
- This change improves clarity and consistency in the reward modeling framework

* Update conflict detector and add test files

* docs: update documentation and add examples

- Update main README and documentation index
- Add FAQ and quickstart guides
- Add tutorial documentation and end-to-end guide
- Add example notebooks (quickstart, custom-rm, evaluation)
- Add README files for rm modules
- Remove outdated POINTWISE_CONFLICT_ANALYSIS.md

* docs: update documentation files and add sitemap

- Update FAQ, quickstart, and tutorial documentation
- Update docs index and mkdocs configuration
- Add sitemap.txt for documentation

* docs: restructure navigation following Diataxis framework

- Reorganize navigation from nested 3-level to flat 2-level structure for better compatibility with shadcn theme
- Replace 'How-to Guides' with topic-based sections: Building RM, Training RM, Evaluating RM, Data Processing, RM Applications
- Improve navigation clarity and user experience
- Keep Tutorials section focused on end-to-end learning
- All documentation files remain accessible with clearer categorization

* docs: improve documentation and jupyter notebooks

- Add code zoom functionality for better code viewing
- Enhance CSS styles for better readability
- Add new jupyter-simple.css for notebook styling
- Update README.md
- Update example notebooks (custom-rm, evaluation, quickstart)
- Add CHANGELOG.md

* docs: add evaluation frameworks comparison analysis and LLM judge research survey

* docs: optimize documentation structure and content

Key improvements:
- Restructure homepage: trim content from 489 lines to 236 (52%↓)
- Optimize navigation: move the Reference section forward; remove the non-existent API Documentation and Changelog entries
- Clean up Jupyter Notebook references: remove all .ipynb file references, fixing 16+ broken links
- Simplify Learning Paths: remove redundant sub-item descriptions so the paths read more clearly
- Fix Installation tabs: standardize on pymdownx.tabbed syntax and remove conflicting extensions
- Trim Tutorial README: from 242 lines to 175 (28%↓)
- Unify terminology: change 'notebook' to 'guide' for consistency

Affected files:
- Core docs: index.md, quickstart.md, mkdocs.yml
- Tutorial docs: tutorial/README.md and several sub-tutorials
- Config: sitemap.txt

These changes make the documentation more concise, accurate, and easier to navigate.

* docs: Add navigation scroll fix and update rubric library with dataset link

* Merge pull request #23 from modelscope/docs_diataxis

Docs diataxis

* feat: add GitHub Pages auto-deployment configuration and remove Jupyter Notebook references

- Add a GitHub Actions workflow to deploy the docs to GitHub Pages automatically
- Create docs/requirements.txt to manage documentation dependencies
- Remove the mkdocs-jupyter plugin from mkdocs.yml
- Remove the docs/examples symlink to avoid bundling .ipynb files
- Add a deployment summary document

* Revert "feat: add GitHub Pages auto-deployment configuration and remove Jupyter Notebook references"

This reverts commit ab9ecf0.

* feat: add GitHub Pages deployment workflow

- Add GitHub Actions workflow for automatic deployment
- Create docs/requirements.txt for documentation dependencies
- Remove mkdocs-jupyter plugin from mkdocs.yml
- Remove docs/examples symlink to exclude .ipynb files from docs
- Use latest action versions (v4, v5) to avoid deprecation warnings

* Revert "feat: add GitHub Pages deployment workflow"

This reverts commit ba4740f.

* chore: configure documentation deployment and update mkdocs settings

- Add GitHub Actions workflow for automated docs deployment
- Add docs/requirements.txt for documentation dependencies
- Remove mkdocs-jupyter plugin from mkdocs.yml
- Update sft_rm.md documentation
- Remove docs/examples symlink
- Add .env to .gitignore

* Merge pull request #22 from modelscope/autorubric_gt

[update] autorubric src

* fix: center modal dialog in RM Library

- Fix modal positioning to display in center of viewport
- Use fixed positioning with top/left 50% and transform translate
- Remove ineffective flex/inset properties that were preventing centering
- Set margin to 0 to avoid offset interference

* chore: remove obsolete files

Remove unused research notes, changelog, and test files:
- 2025_LLM_as_Judge_Agent_Research_Survey.md
- CHANGELOG.md
- CONFLICT_DETECTOR_SUMMARY.md
- test_10_samples.py
- test_conflict_detector_comprehensive.py

* [update] mapper

* [update] fix bugs

* update

* Remove agentscope submodule

* update agentscope

* update schema

* add test

* update readme

* add optimizer

* fix bug

* fix bugs

* update template

* update grader

* fix bugs

* fix bugs

* update voting

* remove old files
XiaoBoAI pushed a commit that referenced this pull request Dec 5, 2025
XiaoBoAI pushed a commit that referenced this pull request Dec 29, 2025
Link: https://code.alibaba-inc.com/OpenRepo/RM-Gallery/codereview/24203053
* [New] V2

* [update] documents

* [update] data sample schema

* Initial commit

* [update readme and docs]

* Merge pull request #1 from modelscope/doc_dev

[update] readme and docs

* [update] update readme (#2)

* [update] update readme

* [update] update readme

* [fix] reward register (#3)

fix import

* [update] update git remote url (#4)

* [update] update readme

* [update] update readme

* [update] update git remote url

* fix: resolve import errors in dataset module

* Merge pull request #6 from modelscope/pairwise

fix: resolve import errors in dataset module

* [feat] add async evaluation support for reward modules with semaphore-based concurrency control (#8)

* [feat] add async evaluation support for reward modules with semaphore-based concurrency control

* [fix] Fixed the bug in reward post-processing in asynchronous reward calculation

* [new] add bradley-terry and sft scripts (#13)

* [new] add bradley-terry and sft scripts

* [new] add bradley-terry and sft scripts

* [fix] fix dependencies (#12)

* [fix] fix docs (#11)

* [update] bt train scripts (#14)

* [new] add bradley-terry and sft scripts

* [new] add bradley-terry and sft scripts

* [update] bt train scripts

* [update] bt train scripts

* [delete] old bt scripts

* [update] sft_rm.md

* [delete] duplicate sft folders (#16)

* update llm bench

* add

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Ties subset support and parallel processing to RewardBench2

* feat: Add Principles for rewardbench2

* [fixbug] pointwise dataset  (#18)

* [delete] duplicate sft folders

* [fixbug] pointwise dataset

* refactor: cleanup evaluation modules and update documentation

- Remove deprecated evaluation modules (conflict_detector, judgebench, rmb, rmbench)
- Update rewardbench2 evaluation module and documentation
- Clean up template modules
- Update load.ipynb tutorial
- Fix linting issues in rewardbench2.py

* Merge pull request #19 from modelscope/llm_bench

Llm bench

* feat: upgrade documentation theme to mkdocs-shadcn

- Migrate from material theme to mkdocs-shadcn for modern UI
- Enhance homepage with gradient logo design and Inter font
- Standardize badge styles and layout structure
- Add GitHub Actions workflow for automated deployment
- Improve visual consistency and user experience
- Configure markdown extensions for rich content support

* feat: implement coy theme for code highlighting

- Add Prism.js coy theme for modern code block styling
- Configure enhanced syntax highlighting with line numbers
- Create custom CSS enhancements for better visual appeal
- Support multiple programming languages with autoloader
- Add responsive design for mobile devices
- Implement hover effects and improved readability

* feat: enhance code block styling with copy functionality

- Add code copy button feature for better UX
- Implement One Dark Pro syntax highlighting theme
- Include JetBrains Mono font for better code readability
- Add custom CSS for enhanced code block appearance
- Configure pymdownx.highlight with line numbers and anchors
- Add responsive design for code blocks on mobile devices

* feat: add interactive code copy button functionality

- Implement custom JavaScript for code block copy functionality
- Add hover-triggered copy button with smooth animations
- Include visual feedback with check icon on successful copy
- Style copy button with modern design and transitions
- Support both custom and theme-native copy button styles
- Ensure cross-browser clipboard API compatibility

* feat: optimize table rendering with text wrapping

- Add comprehensive table styling with proper text wrapping
- Enable word-break and overflow-wrap for all table cells
- Implement responsive table design for mobile devices
- Add hover effects and striped rows for better readability
- Include gradient header background for visual appeal
- Configure tables markdown extension for proper rendering
- Add smooth scrolling for wide tables on small screens

* fix: restore complete documentation content from main branch

- Recover all original documentation sections and content
- Preserve installation guide, walkthrough, and examples
- Maintain documentation table and citation information
- Keep all code examples and detailed explanations
- Apply modern styling only to header section without content loss

* docs: Add new documentation sections and update mkdocs configuration

- Add rm_library.md and rubric_library.md in library section
- Add navigation.md for improved site navigation
- Add boosting_strategy.md in using_rm section
- Add reference section with .gitkeep
- Update mkdocs.yml configuration

* feat: create interactive RM and Rubric libraries

- Transform static markdown pages into dynamic interactive libraries
- Add search and filter functionality similar to ReMe project design
- RM Library: categorized display of reward models with detailed info
- Rubric Library: comprehensive evaluation rubrics with principles
- Modern responsive UI with modal details and real-time stats
- Consistent with navigation.md planning structure

* [new] auto-rubric

* [rename] rubric

* [new] auto-rubric (#21)

* [new] auto-rubric

* [rename] rubric

* feat: add LLM Judge evaluation module and RL training examples

- Add llm_judge module with pointwise/pairwise/listwise evaluators
- Add alignment reward functions for LLM judge
- Add RL training examples with alignment reward integration
- Add reward manager and alignment RL dataset
- Add GRPO training script and documentation

* refactor: improve RL training dataset and reward function

- Add base dataset class for RL training
- Refactor alignment dataset with DataKeys configuration
- Improve code formatting and structure
- Update reward function documentation

* update

* fix: improve base dataset import with fallback mechanism

- Add robust import fallback for base_dataset module
- Update README and reward manager
- Improve error handling for module imports
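The fallback pattern described above can be sketched as follows — a minimal, hypothetical illustration (module and class names are placeholders, not the project's actual paths):

```python
# Minimal sketch of an import-fallback pattern (hypothetical names):
# prefer the package-relative import, fall back to a flat import when the
# file is run as a standalone script, and finally to a local stub.
try:
    from .base_dataset import BaseDataset  # normal package-relative import
except ImportError:
    try:
        from base_dataset import BaseDataset  # script-style flat import
    except ImportError:
        class BaseDataset:  # last-resort stub so this sketch stays runnable
            def __init__(self, samples=None):
                self.samples = list(samples or [])
```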

* fix

* Update RL training and LLM judge evaluation modules

* [update] rubric src

* Merge branch 'main' into autorubric_gt

* Add evaluation tools and documentation

- Add conflict_detector evaluation tool
- Add judgebench, rmb, rmbench evaluation modules
- Add documentation for evaluation methods
- Add llm_judge reward modules
- Update rewardbench2 implementation
- Add RL training examples
- Fix linting issues (unused imports, f-string formatting)

* Merge llm_bench into feature/upgrade-docs-theme

* Merge origin/boyin_dgr into feature/upgrade-docs-theme

- Add LLM judge framework with adapters, evaluators, and templates
- Add reward manager and RL training examples
- Add base dataset for RL training
- Resolve conflict in alignment_rl_dataset.py

* docs: convert tutorial notebooks to markdown and update documentation

* feat: improve rubric library UI - optimize chip display and layout

* [update] rubric_library

* feat: convert all Jupyter notebooks to Markdown format

- Convert 7 .ipynb files to .md format for better version control
- Update mkdocs.yml to reference .md files instead of .ipynb
- Optimize RM Library card styles (simplified tags, improved layout)
- Update Building RM navigation structure

Files converted:
- tutorial/data: annotation, load, pipeline, process
- tutorial/rm_application: best_of_n, data_refinement, post_training

Benefits:
- Faster build times (no Jupyter conversion needed)
- Better git diffs and version control
- Easier editing and maintenance
- Simplified dependencies

* fix(judgebench): fix evaluation results not being stored in batch processing

- Override _async_parallel method in JudgeBenchReward to use BaseListWiseReward implementation
- Fixes an issue where BaseLLMReward._async_parallel stored results in the wrong location due to the MRO
- Results now correctly stored in sample.input[-1].additional_kwargs for compute_accuracy
- Tested with qwen2.5-32b-instruct via DashScope API, accuracy calculation now works correctly
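The MRO pitfall and the override fix can be illustrated with a toy example (the class names mirror the commit message, but the bodies are stand-ins, not the real implementations):

```python
# Toy illustration of the MRO issue: with multiple inheritance, Python
# resolves _async_parallel on the first base class in the MRO, which here
# is the "wrong" implementation.
class BaseLLMReward:
    def _async_parallel(self):
        return "wrong location"

class BaseListWiseReward:
    def _async_parallel(self):
        return "sample.input[-1].additional_kwargs"

class BrokenReward(BaseLLMReward, BaseListWiseReward):
    pass  # MRO picks BaseLLMReward._async_parallel

class JudgeBenchReward(BaseLLMReward, BaseListWiseReward):
    # Explicit override pins the list-wise implementation regardless of MRO.
    def _async_parallel(self):
        return BaseListWiseReward._async_parallel(self)
```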

* docs(judgebench): add custom API endpoint configuration example

- Add example showing how to configure base_url for custom API endpoints
- Demonstrates usage with Alibaba Cloud DashScope API
- Helps users who need to use OpenAI-compatible third-party APIs
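A hedged sketch of what such a configuration might look like (the field names are illustrative — check the JudgeBench docs for the exact parameters; the URL is DashScope's OpenAI-compatible endpoint):

```python
# Hypothetical config sketch for pointing an OpenAI-compatible client at
# a third-party endpoint; field names are illustrative, not the project's API.
judge_config = {
    "model": "qwen2.5-32b-instruct",
    "api_key": "sk-...",  # your DashScope key, e.g. read from an env var
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}
```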

* feat: upgrade docs theme and add conflict detector improvements

- Update mkdocs.yml with new theme configuration
- Enhance documentation pages (index, rm_library, rubric_library, boosting_strategy)
- Add search-fix.js for improved search functionality
- Improve conflict_detector.py with new features
- Add template.py for evaluation
- Add comprehensive test files for conflict detector

* Merge branch 'main' into feature/upgrade-docs-theme

- Resolved conflicts in docs/index.md, mkdocs.yml, and autorubric.md
- Updated all .ipynb references to .md files
- Removed .ipynb files that were converted to .md
- Integrated rubric-related updates from main branch

* refactor: unify terminology from 'principle' to 'rubric' across codebase

- Updated code files: rmb.py, rmbench.py
  - Changed PrincipleListWiseTemplate to RubricListWiseTemplate
  - Updated class inheritance and type annotations

- Updated documentation files:
  - Renamed autoprinciple.md to autorubric.md
  - Updated overview.md: AutoPrinciple → AutoRubric, generator variables
  - Updated custom_reward.md: BasePrincipleReward → BaseRubricReward
  - Updated evaluation/overview.md, best_of_n.md, post_training.md, boosting_strategy.md
  - Updated rm_library.md: CSS, JS, HTML elements, and RM configurations

- All terminology now consistently uses 'rubric' instead of 'principle'
- This change improves clarity and consistency in the reward modeling framework
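A bulk rename like this is commonly done with grep and sed; a hedged sketch (the paths are illustrative — always review the resulting diff before committing):

```shell
# Rename every Principle* identifier to Rubric* under the given paths
# (GNU sed; on macOS use `sed -i ''`). Review with `git diff` afterwards.
grep -rl 'Principle' docs/ rm_gallery/ | xargs sed -i 's/Principle/Rubric/g'
```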

* Update conflict detector and add test files

* docs: update documentation and add examples

- Update main README and documentation index
- Add FAQ and quickstart guides
- Add tutorial documentation and end-to-end guide
- Add example notebooks (quickstart, custom-rm, evaluation)
- Add README files for rm modules
- Remove outdated POINTWISE_CONFLICT_ANALYSIS.md

* docs: update documentation files and add sitemap

- Update FAQ, quickstart, and tutorial documentation
- Update docs index and mkdocs configuration
- Add sitemap.txt for documentation

* docs: restructure navigation following Diataxis framework

- Reorganize navigation from nested 3-level to flat 2-level structure for better compatibility with shadcn theme
- Replace 'How-to Guides' with topic-based sections: Building RM, Training RM, Evaluating RM, Data Processing, RM Applications
- Improve navigation clarity and user experience
- Keep Tutorials section focused on end-to-end learning
- All documentation files remain accessible with clearer categorization
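In mkdocs.yml terms, the flat two-level structure described above might look like this sketch (section names come from the commit; the page paths are hypothetical):

```yaml
# Hypothetical mkdocs.yml nav sketch: one level of sections, one level of pages
nav:
  - Home: index.md
  - Tutorials:
      - End-to-End Guide: tutorial/README.md
  - Building RM:
      - Overview: building_rm/overview.md
  - Evaluating RM:
      - Overview: evaluation/overview.md
```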

* docs: improve documentation and jupyter notebooks

- Add code zoom functionality for better code viewing
- Enhance CSS styles for better readability
- Add new jupyter-simple.css for notebook styling
- Update README.md
- Update example notebooks (custom-rm, evaluation, quickstart)
- Add CHANGELOG.md

* docs: add evaluation frameworks comparison analysis and LLM judge research survey

* docs: optimize documentation structure and content

Key improvements:
- Restructure the homepage: trimmed from 489 lines to 236 (52%↓)
- Improve navigation: move the Reference section forward; remove nonexistent API Documentation and Changelog entries
- Clean up Jupyter Notebook references: remove all .ipynb file references and fix 16+ broken links
- Simplify Learning Paths: remove redundant sub-item descriptions for clearer paths
- Fix Installation tabs: standardize on pymdownx.tabbed syntax and remove extension conflicts
- Trim the Tutorial README: from 242 lines to 175 (28%↓)
- Unify terminology: change 'notebook' to 'guide' for consistency

Affected files:
- Core docs: index.md, quickstart.md, mkdocs.yml
- Tutorial docs: tutorial/README.md and several sub-tutorials
- Configuration: sitemap.txt

These changes make the documentation more concise, accurate, and easier to navigate.
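The Installation tabs mentioned above are standardized on pymdownx.tabbed, which renders tabs from `===` markers (the extension must be enabled under `markdown_extensions` in mkdocs.yml); a minimal sketch, with the package name and repo URL as placeholders:

````markdown
=== "pip"

    ```bash
    pip install <package-name>
    ```

=== "From source"

    ```bash
    git clone <repo-url> && pip install -e .
    ```
````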

* docs: Add navigation scroll fix and update rubric library with dataset link

* Merge pull request #23 from modelscope/docs_diataxis

Docs diataxis

* feat: add GitHub Pages auto-deployment configuration and remove Jupyter Notebook references

- Add a GitHub Actions workflow to automatically deploy docs to GitHub Pages
- Create docs/requirements.txt to manage documentation dependencies
- Remove the mkdocs-jupyter plugin from mkdocs.yml
- Delete the docs/examples symlink to avoid bundling .ipynb files
- Add a deployment summary document

* Revert "feat: add GitHub Pages auto-deployment configuration and remove Jupyter Notebook references"

This reverts commit 1afbf9f.

* feat: add GitHub Pages deployment workflow

- Add GitHub Actions workflow for automatic deployment
- Create docs/requirements.txt for documentation dependencies
- Remove mkdocs-jupyter plugin from mkdocs.yml
- Remove docs/examples symlink to exclude .ipynb files from docs
- Use latest action versions (v4, v5) to avoid deprecation warnings

* Revert "feat: add GitHub Pages deployment workflow"

This reverts commit 2d65939.

* chore: configure documentation deployment and update mkdocs settings

- Add GitHub Actions workflow for automated docs deployment
- Add docs/requirements.txt for documentation dependencies
- Remove mkdocs-jupyter plugin from mkdocs.yml
- Update sft_rm.md documentation
- Remove docs/examples symlink
- Add .env to .gitignore
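A minimal sketch of such a workflow, assuming MkDocs' built-in `gh-deploy` command (action versions match those mentioned earlier in this PR; this is not the repo's exact workflow file):

```yaml
# .github/workflows/docs.yml — hedged sketch of an automated docs deploy
name: docs
on:
  push:
    branches: [main]
permissions:
  contents: write  # gh-deploy pushes to the gh-pages branch
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install -r docs/requirements.txt
      - run: mkdocs gh-deploy --force
```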

* Merge pull request #22 from modelscope/autorubric_gt

[update] autorubric src

* fix: center modal dialog in RM Library

- Fix modal positioning to display in center of viewport
- Use fixed positioning with top/left 50% and transform translate
- Remove ineffective flex/inset properties that were preventing centering
- Set margin to 0 to avoid offset interference
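The centering approach described in this commit is the classic fixed-position transform pattern (the selector name is illustrative):

```css
/* Center a modal in the viewport regardless of page scroll */
.modal {
  position: fixed;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
  margin: 0; /* avoid offsets that would skew the centering */
}
```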

* chore: remove obsolete files

Remove unused research notes, changelog, and test files:
- 2025_LLM_as_Judge_Agent_Research_Survey.md
- CHANGELOG.md
- CONFLICT_DETECTOR_SUMMARY.md
- test_10_samples.py
- test_conflict_detector_comprehensive.py

* [update] mapper

* [update] fix bugs

* update

* Remove agentscope submodule

* update agentscope

* update schema

* add test

* update readme

* add optimizer

* fix bug

* fix bugs

* update template

* update grader

* fix bugs

* fix bugs

* update voting

* remove old files