2 changes: 1 addition & 1 deletion README.md
@@ -98,7 +98,7 @@ Access **50+ production-ready graders** featuring a comprehensive taxonomy, rigo
### 🛠️ Flexible Grader Building Methods
Choose the build method that fits your requirements:
* **Customization:** Easily extend or modify pre-defined graders to fit your specific needs. 👉 [Custom Grader Development Guide](https://modelscope.github.io/OpenJudge/building_graders/create_custom_graders/)
* **Data-Driven Rubrics:** Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (Rubrics) based on your data.👉 [Automatic Rubric Generation Tutorial](https://modelscope.github.io/OpenJudge/building_graders/generate_graders_from_data/)
* **Generate Rubrics:** Need evaluation criteria but don't want to write them manually? Use **Simple Rubric** (from task description) or **Iterative Rubric** (from labeled data) to automatically generate white-box evaluation rubrics. 👉 [Generate Rubrics as Graders](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/)
* **Training Judge Models (Coming Soon 🚀):** For high-scale and specialized scenarios, we are developing the capability to train dedicated Judge models. Support for SFT, Bradley-Terry models, and Reinforcement Learning workflows is on the way to help you build high-performance, domain-specific graders.


2 changes: 1 addition & 1 deletion README_zh.md
@@ -98,7 +98,7 @@ OpenJudge unifies evaluation metrics and reward signals into a standardized **Grader** interface
### 🛠️ Flexible Grader Building Methods
Choose the build method that fits your requirements:
* **Customization:** Easily extend or modify pre-defined graders to fit your specific needs. 👉 [Custom Grader Development Guide](https://modelscope.github.io/OpenJudge/building_graders/create_custom_graders/)
* **Data-Driven Rubrics:** Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (Rubrics) based on your data. 👉 [Automatic Rubric Generation Tutorial](https://modelscope.github.io/OpenJudge/building_graders/generate_graders_from_data/)
* **Generate Rubrics:** Need evaluation criteria but don't want to write them manually? Use **Simple Rubric** (from a task description) or **Iterative Rubric** (from labeled data) to automatically generate white-box evaluation rubrics. 👉 [Generate Rubrics as Graders](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/)
* **Training Judge Models (Coming Soon 🚀):** For high-scale and specialized scenarios, we are developing the capability to train dedicated Judge models. Support for SFT, Bradley-Terry models, and Reinforcement Learning workflows is on the way to help you build high-performance, domain-specific graders.


2 changes: 1 addition & 1 deletion docs/building_graders/create_custom_graders.md
@@ -301,7 +301,7 @@ When running graders, focus on configuring data mappers to connect your dataset

## Next Steps

- [Generate Graders from Data](generate_graders_from_data.md) — Automate grader creation from labeled examples
- [Generate Rubrics as Graders](generate_rubrics_as_graders.md) — Automatically generate graders from task description or labeled data
- [Run Grading Tasks](../running_graders/run_tasks.md) — Evaluate your models at scale
- [Grader Analysis](../running_graders/grader_analysis.md) — Validate and analyze grader results

docs/building_graders/generate_graders_from_data.md → generate_rubrics_as_graders.md
@@ -1,10 +1,15 @@
# Generate Graders from Data
# Generate Rubrics as Graders

Automatically create evaluation graders from labeled data instead of manually designing criteria. The system learns evaluation rubrics by analyzing what makes responses good or bad in your dataset.
Automatically create evaluation graders instead of manually designing criteria. OpenJudge provides two approaches:

| Approach | Module | Data Required | Best For |
|----------|--------|---------------|----------|
| **Simple Rubric** | `simple_rubric` | Task description only | Quick prototyping, when you have no labeled data |
| **Iterative Rubric** | `iterative_rubric` | Labeled preference data | Production quality, when you have training examples |

!!! tip "Key Benefits"
    - **Save time** — Eliminate manual rubric design
    - **Data-driven** — Learn criteria from actual examples
    - **Intelligent** — Learn criteria from labeled data (Iterative) or a task description (Simple)
    - **Consistent** — Produce reproducible evaluation standards
    - **Scalable** — Quickly prototype graders for new domains

@@ -29,42 +34,157 @@ Theme: Completeness
- With rubrics, evaluations become reproducible and explainable
- The challenge: manually writing good rubrics is time-consuming and requires domain expertise

**The solution:** Auto-Rubric automatically extracts these criteria from your labeled data.

**The solution:** Automatically extract these criteria from your task description (Simple Rubric) or labeled data (Iterative Rubric).

## How It Works

Auto-Rubric extracts evaluation rubrics from preference data without training. Based on [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314).
## When to Use Each Approach

**Two-stage approach:**
### Simple Rubric (Zero-Shot)

1. **Infer query-specific rubrics** — For each labeled example, the system proposes criteria that explain why one response is better than another
2. **Generalize to core set** — Similar rubrics are merged and organized into a compact, non-redundant "Theme-Tips" structure
Use when you have a clear task description but **no labeled data**.

**Data efficiency:** Using just 70 preference pairs, this method enables smaller models to match or outperform fully-trained reward models.

<figure markdown="span">
![Auto-Rubric Pipeline Overview](../images/auto_rubric_overview.png){ width="100%" }
<figcaption>Auto-Rubric Pipeline: From preference data to evaluation rubrics</figcaption>
</figure>
!!! tip "Use Simple Rubric When"
- You need to quickly prototype a grader
- You have no labeled preference or scored data
- Your task is well-defined and you can describe it clearly
- You want to get started immediately without data collection

!!! warning "Limitations"
- Quality depends on task description clarity
- May not capture domain-specific nuances
- Less accurate than data-driven approaches

## When to Use This Approach
### Iterative Rubric (Data-Driven)

Suppose you have a dataset of query-response pairs with quality labels (scores or rankings), and you want to create a grader that can evaluate new responses using the same criteria.
Use when you have **labeled preference data** and want production-quality graders.

!!! tip "Use Data-Driven Generation When"
!!! tip "Use Iterative Rubric When"
- You have labeled evaluation data (preference pairs or scored responses)
- Manual rubric design is too time-consuming or subjective
- Your evaluation criteria are implicit and hard to articulate
- You need high accuracy for production use

!!! warning "Don't Use When"
- You have no labeled data
- You have no labeled data (use Simple Rubric instead)
- Your criteria are already well-defined and documented
- Simple Code-Based evaluation is sufficient

## Simple Rubric: Zero-Shot Generation

Generate evaluation rubrics from task descriptions without any labeled data. The system uses an LLM to create relevant evaluation criteria based on your task context.

### How It Works

1. **Provide task description** — Describe what your system does
2. **Add context** — Optionally provide usage scenario and sample queries
3. **Generate rubrics** — LLM creates evaluation criteria automatically
4. **Create grader** — Rubrics are injected into an LLMGrader

### Quick Example

```python
import asyncio
from openjudge.generator.simple_rubric import (
    SimpleRubricsGenerator,
    SimpleRubricsGeneratorConfig,
)
from openjudge.models import OpenAIChatModel
from openjudge.graders.schema import GraderMode

async def main():
    config = SimpleRubricsGeneratorConfig(
        grader_name="translation_quality_grader",
        model=OpenAIChatModel(model="qwen3-32b"),
        grader_mode=GraderMode.POINTWISE,
        task_description="English to Chinese translation assistant for technical documents. Generate rubrics in English.",
        scenario="Users need accurate, fluent translations of technical content. Please respond in English.",
        min_score=0,
        max_score=5,
    )

    generator = SimpleRubricsGenerator(config)
    grader = await generator.generate(
        dataset=[],
        sample_queries=[
            "Translate: 'Machine learning is a subset of AI.'",
            "Translate: 'The API endpoint returned an error.'",
        ],
    )

    return grader

grader = asyncio.run(main())
```

### Inspect Generated Rubrics

```python
print(grader.kwargs.get("rubrics"))
```

## Choose Your Evaluation Mode
**Output (Example):**

```
1. Accuracy: Whether the translation correctly conveys the technical meaning of the original English text
2. Fluency: Whether the translated Chinese is grammatically correct and natural-sounding
3. Technical Appropriateness: Whether the terminology used in the translation is appropriate for a technical context
4. Consistency: Whether similar terms or phrases are consistently translated throughout the response
```

### Evaluate Responses

```python
# Run inside an async context (e.g. within main() above).
result = await grader.aevaluate(
    query="Translate: 'The database query returned an error.'",
    response="数据库查询返回了一个错误。",
)
print(result)
```

**Output:**

```python
GraderScore(
    name='translation_quality_grader',
    reason="The translation is accurate and correctly conveys the technical meaning of the original English text. The Chinese sentence is grammatically correct and natural-sounding, making it fluent. The terminology used ('数据库查询' for 'database query', '返回了一个错误' for 'returned an error') is appropriate for a technical context. Additionally, the terms are consistently translated throughout the response.",
    score=5.0
)
```

### Simple Rubric Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `grader_name` | `str` | required | Name for the generated grader |
| `model` | `BaseChatModel` | required | LLM for generation and evaluation |
| `grader_mode` | `GraderMode` | `POINTWISE` | `POINTWISE` or `LISTWISE` |
| `task_description` | `str` | `""` | Description of the task |
| `scenario` | `str` | `None` | Optional usage context |
| `language` | `LanguageEnum` | `EN` | Language for prompts (`EN` or `ZH`) |
| `min_score` | `int` | `0` | Minimum score (pointwise only) |
| `max_score` | `int` | `1` | Maximum score (pointwise only) |
| `default_rubrics` | `List[str]` | `[]` | Fallback rubrics if generation fails |
| `max_retries` | `int` | `3` | Retry attempts for LLM calls |
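
For reference, here is a configuration sketch that exercises the optional parameters above. The `LanguageEnum` import path and the support-reply task are illustrative assumptions rather than confirmed API details:

```python
from openjudge.generator.simple_rubric import (
    SimpleRubricsGeneratorConfig,
)
from openjudge.models import OpenAIChatModel
from openjudge.graders.schema import GraderMode
# Import path for LanguageEnum is an assumption; adjust to your install.
from openjudge.generator.simple_rubric import LanguageEnum

config = SimpleRubricsGeneratorConfig(
    grader_name="support_reply_grader",   # hypothetical task
    model=OpenAIChatModel(model="qwen3-32b"),
    grader_mode=GraderMode.POINTWISE,
    task_description="Customer-support reply assistant for a SaaS product.",
    scenario="Agents need polite, accurate, actionable replies.",
    language=LanguageEnum.EN,             # prompts in English
    min_score=0,
    max_score=5,
    default_rubrics=[                     # fallback if generation fails
        "Accuracy: the reply addresses the user's actual question",
        "Tone: the reply is professional and empathetic",
    ],
    max_retries=3,
)
```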

## Iterative Rubric: Data-Driven Generation

Learn evaluation rubrics from labeled preference data. Based on [Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling](https://arxiv.org/abs/2510.17314).

### How It Works

**Two-stage approach:**

1. **Infer query-specific rubrics** — For each labeled example, the system proposes criteria that explain why one response is better than another
2. **Generalize to core set** — Similar rubrics are merged and organized into a compact, non-redundant "Theme-Tips" structure

**Data efficiency:** Using just 70 preference pairs, this method enables smaller models to match or outperform fully-trained reward models.

<figure markdown="span">
![Auto-Rubric Pipeline Overview](../images/auto_rubric_overview.png){ width="100%" }
<figcaption>Auto-Rubric Pipeline: From preference data to evaluation rubrics</figcaption>
</figure>
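
As a mental model, the two stages can be sketched in plain Python. This is illustrative pseudocode of the Auto-Rubric idea, not the OpenJudge API:

```python
from collections import defaultdict

def stage1_infer(propose_rubrics, labeled_examples):
    """Stage 1: for each preference pair, ask an LLM (wrapped by
    propose_rubrics) why the chosen response beats the rejected one."""
    candidates = []
    for ex in labeled_examples:
        candidates.extend(
            propose_rubrics(ex["query"], ex["chosen"], ex["rejected"])
        )
    return candidates

def stage2_generalize(candidates, theme_of):
    """Stage 2: merge similar rubrics into a compact, non-redundant
    Theme-Tips structure (theme_of could use embedding clustering)."""
    themes = defaultdict(set)
    for rubric in candidates:
        themes[theme_of(rubric)].add(rubric)
    return {theme: sorted(tips) for theme, tips in themes.items()}
```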

### Choose Your Evaluation Mode

| Mode | Config Class | Use Case | Data Format | Output |
|------|--------------|----------|-------------|--------|
@@ -76,7 +196,7 @@ Suppose you have a dataset of query-response pairs with quality labels (scores o
Pairwise is a special case of Listwise with exactly 2 responses. Use the same `IterativeListwiseRubricsGeneratorConfig` for both.
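
Before the full walkthroughs below, a minimal sketch of that pairwise-as-listwise flow might look like this. It assumes the `iterative_rubric` module mirrors the `simple_rubric` API shown earlier; the generator class name and the dataset field names (`query`, `responses`, `ranking`) are assumptions to verify against the complete examples:

```python
import asyncio
# Module path and generator class are assumptions mirroring the
# simple_rubric API; only the config class name is documented above.
from openjudge.generator.iterative_rubric import (
    IterativeListwiseRubricsGenerator,
    IterativeListwiseRubricsGeneratorConfig,
)
from openjudge.models import OpenAIChatModel

async def main():
    config = IterativeListwiseRubricsGeneratorConfig(
        grader_name="code_solution_comparator",
        model=OpenAIChatModel(model="qwen3-32b"),
    )

    # Pairwise preference data expressed as 2-response listwise items.
    # Field names are illustrative; see the complete examples for the
    # exact dataset schema.
    dataset = [
        {
            "query": "Deduplicate a list while preserving order.",
            "responses": [
                "def dedup(xs): return list(dict.fromkeys(xs))",
                "def dedup(xs): return list(set(xs))",  # loses order
            ],
            "ranking": [1, 2],  # first response preferred
        },
        # ... more labeled pairs (~70 can already be effective)
    ]

    generator = IterativeListwiseRubricsGenerator(config)
    grader = await generator.generate(dataset=dataset)
    return grader

grader = asyncio.run(main())
```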


## Complete Example: Build a Code Review Grader (Pointwise)
### Complete Example: Build a Code Review Grader (Pointwise)

Let's walk through a complete example: building a grader that evaluates code explanation quality.

@@ -218,7 +338,7 @@ GraderScore(
)
```

## Complete Example: Build a Code Solution Comparator (Pairwise)
### Complete Example: Build a Code Solution Comparator (Pairwise)

Let's build a grader that compares two code implementations and determines which solution is better. This is useful for code review, interview assessment, or selecting the best implementation from multiple candidates.

@@ -394,7 +514,7 @@ GraderRank(
```


## Configuration Reference
## Iterative Rubric Configuration Reference

### Core Parameters

@@ -427,9 +547,29 @@ GraderRank(
- `LISTWISE_EVALUATION_TEMPLATE` — for ranking


---

## Choosing Between Simple and Iterative Rubric

| Scenario | Recommended Approach |
|----------|---------------------|
| Quick prototype, no data | **Simple Rubric** |
| Production grader with labeled data | **Iterative Rubric** |
| Well-defined task, need fast setup | **Simple Rubric** |
| Complex domain, implicit criteria | **Iterative Rubric** |
| < 50 labeled examples | **Simple Rubric** (or collect more data) |
| 50-100+ labeled examples | **Iterative Rubric** |

!!! tip "Workflow Recommendation"
1. Start with **Simple Rubric** for quick prototyping
2. Collect preference data during initial deployment
3. Upgrade to **Iterative Rubric** when you have 50+ labeled examples

---

## Tips

### Data Quality
### Data Quality (Iterative Rubric)

!!! tip "Good Practices"
- Clear preference signals (good vs. bad is obvious)
@@ -440,7 +580,20 @@ GraderRank(
    - Ambiguous cases where labels are debatable
    - Noisy or contradictory labels

### Parameter Tuning
### Task Description Quality (Simple Rubric)

!!! tip "Good Practices"
- Be specific about what your system does
- Include the target audience or use case
- Mention key quality dimensions you care about
- Provide representative sample queries

!!! warning "Avoid"
- Vague descriptions like "chatbot" or "assistant"
- Missing context about the domain
- No sample queries (the LLM needs examples)

### Parameter Tuning (Iterative Rubric)

| Goal | Recommended Settings |
|------|---------------------|
8 changes: 4 additions & 4 deletions docs/building_graders/overview.md
@@ -68,11 +68,11 @@ Define evaluation logic using LLM judges or code-based functions with no trainin
**Learn more:** [Create Custom Graders →](create_custom_graders.md) | [Built-in Graders →](../built_in_graders/overview.md)


### Approach 2: Generate Graders from Data
### Approach 2: Generate Rubrics as Graders

Automatically analyze evaluation data to create structured scoring rubrics. Provide 50-500 labeled examples, and the generator extracts patterns to build interpretable criteria. Generated graders produce explicit rubrics that explain scoring decisions, ideal for scenarios requiring transparency and rapid refinement.
Automatically generate evaluation rubrics and create graders. Two approaches are available: **Simple Rubric** generates rubrics from task descriptions (zero-shot, no data required), while **Iterative Rubric** learns from 50-500 labeled examples to extract patterns. Both produce explicit rubrics that explain scoring decisions, ideal for scenarios requiring transparency and rapid refinement.

**Learn more:** [Generate Graders from Data →](generate_graders_from_data.md)
**Learn more:** [Generate Rubrics as Graders →](generate_rubrics_as_graders.md)


### Approach 3: Train Reward Models
@@ -86,7 +86,7 @@ Train neural networks on preference data to learn evaluation criteria automatica
## Next Steps

- [Create Custom Graders](create_custom_graders.md) — Build graders using LLM or code-based logic
- [Generate Graders from Data](generate_graders_from_data.md) — Auto-generate rubrics from labeled data
- [Generate Rubrics as Graders](generate_rubrics_as_graders.md) — Automatically generate graders from task description or labeled data
- [Train with GRPO](training_grpo.md) — Train generative judge models with reinforcement learning
- [Built-in Graders](../built_in_graders/overview.md) — Explore pre-built graders to customize
- [Run Grading Tasks](../running_graders/run_tasks.md) — Deploy graders at scale with batch workflows
4 changes: 2 additions & 2 deletions docs/index.md
@@ -20,7 +20,7 @@ OpenJudge unifies evaluation metrics and reward signals into a single, standardi

+ **Flexible Grader Building**: Choose the build method that fits your requirements:
- **Customization:** Easily extend or modify pre-defined graders to fit your specific needs. <a href="building_graders/create_custom_graders/" class="feature-link">Custom Grader Development Guide <span class="link-arrow">→</span></a>
- **Data-Driven Rubrics:** Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (Rubrics) based on your data. <a href="building_graders/generate_graders_from_data/" class="feature-link">Automatic Rubric Generation Tutorial <span class="link-arrow">→</span></a>
- **Generate Rubrics:** Need evaluation criteria but don't want to write them manually? Use **Simple Rubric** (from task description) or **Iterative Rubric** (from labeled data) to automatically generate white-box evaluation rubrics. <a href="building_graders/generate_rubrics_as_graders/" class="feature-link">Generate Rubrics as Graders <span class="link-arrow">→</span></a>
- **Training Judge Models:** For high-scale and specialized scenarios, we are developing the capability to train dedicated Judge models. Support for SFT, Bradley-Terry models, and Reinforcement Learning workflows is on the way to help you build high-performance, domain-specific graders. <span class="badge-wip">🚧 Coming Soon</span>

+ **Easy Integration**: We're actively building seamless connectors for mainstream observability platforms and training frameworks. Stay tuned! <span class="badge-wip">🚧 Coming Soon</span>
@@ -139,7 +139,7 @@ OpenJudge unifies evaluation metrics and reward signals into a single, standardi
</p>
</a>

<a href="building_graders/generate_graders_from_data/" class="feature-card-sm">
<a href="building_graders/generate_rubrics_as_graders/" class="feature-card-sm">
<div class="card-header">
<img src="https://unpkg.com/lucide-static@latest/icons/database.svg" class="card-icon card-icon-data">
<h3>Data-Driven Rubrics</h3>
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -34,7 +34,7 @@ nav:
- Building Graders:
  - Overview: building_graders/overview.md
  - Create Custom Graders: building_graders/create_custom_graders.md
  - Generate Graders from Data: building_graders/generate_graders_from_data.md
  - Generate Rubrics as Graders: building_graders/generate_rubrics_as_graders.md
  # - Train Reward Models: building_graders/training/overview.md # Coming soon

- Running Graders: