- 🌐 **Multi-Scenario Coverage:** Extensive support for diverse domains including Agent, text, code, math, and multimodal tasks. 👉 [Explore Supported Scenarios](https://agentscope-ai.github.io/OpenJudge/built_in_graders/overview/)
- 🔄 **Holistic Agent Evaluation:** Beyond final outcomes, we assess the entire lifecycle—including trajectories, Memory, Reflection, and Tool Use. 👉 [Agent Lifecycle Evaluation](https://agentscope-ai.github.io/OpenJudge/built_in_graders/agent_graders/)
- ✅ **Quality Assurance:** Every grader comes with benchmark datasets and pytest integration for validation. 👉 [View Benchmark Datasets](https://huggingface.co/datasets/agentscope-ai/OpenJudge)
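
  To make the pytest angle concrete, here is a rough sketch of what such a validation test can look like. The inline dataset and the toy exact-match grader below are stand-ins invented for illustration, not OpenJudge's actual benchmark format or grader API (see the linked datasets and graders for the real thing).

  ```python
  # Illustrative only: a stand-in grader validated against a tiny inline "benchmark".
  # OpenJudge's real graders and benchmark datasets are linked above.
  import pytest

  # (response, reference, expected_score) triples acting as a miniature benchmark.
  BENCHMARK = [
      ("Paris", "Paris", 1.0),
      ("  paris ", "Paris", 1.0),   # normalization should still count as a match
      ("London", "Paris", 0.0),
  ]

  def exact_match_grader(response: str, reference: str) -> float:
      """Toy grader: 1.0 if the normalized response equals the reference, else 0.0."""
      return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0

  @pytest.mark.parametrize("response,reference,expected", BENCHMARK)
  def test_grader_agrees_with_labels(response: str, reference: str, expected: float) -> None:
      assert exact_match_grader(response, reference) == expected
  ```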
### 🛠️ Flexible Grader Building Methods
Choose the build method that fits your requirements:
- **Customization:** Clear requirements, but no existing grader? If you have explicit rules or logic, use our Python interfaces or Prompt templates to quickly define your own grader (see the sketch after this list). 👉 [Custom Grader Development Guide](https://agentscope-ai.github.io/OpenJudge/building_graders/create_custom_graders/)
- **Zero-shot Rubrics Generation:** Not sure what criteria to use, and no labeled data yet? Just provide a task description and optional sample queries—the LLM will automatically generate evaluation rubrics for you. Ideal for rapid prototyping when you want to get started immediately. 👉 [Zero-shot Rubrics Generation Guide](https://agentscope-ai.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#simple-rubric-zero-shot-generation)
- **Data-driven Rubrics Generation:** Ambiguous requirements, but have a few examples? Use the GraderGenerator to automatically summarize evaluation rubrics from your annotated data and generate an LLM-based grader. 👉 [Data-driven Rubrics Generation Guide](https://agentscope-ai.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#iterative-rubric-data-driven-generation)
- **Training Judge Models:** Massive data and need peak performance? Use our training pipeline to train a dedicated Judge Model. This is ideal for complex scenarios where prompt-based grading falls short. 👉 [Train Judge Models](https://agentscope-ai.github.io/OpenJudge/building_graders/training_judge_models/)
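
To give a rough feel for the Customization route, here is a rule-based grader sketched in plain Python. The class and method names are hypothetical and do not reflect OpenJudge's actual base classes or interfaces; the Custom Grader Development Guide linked above documents the real ones.

```python
# Hypothetical rule-based grader; OpenJudge's real interfaces may differ.
import json
from dataclasses import dataclass

@dataclass
class GradeResult:
    score: float   # 0.0 (fail) to 1.0 (pass)
    reason: str    # human-readable explanation for the score

class JsonKeysGrader:
    """Checks that a response is valid JSON and contains the required keys."""

    def __init__(self, required_keys: list[str]) -> None:
        self.required_keys = required_keys

    def grade(self, response: str) -> GradeResult:
        try:
            payload = json.loads(response)
        except json.JSONDecodeError:
            return GradeResult(0.0, "response is not valid JSON")
        missing = [k for k in self.required_keys if k not in payload]
        if missing:
            return GradeResult(0.5, f"missing keys: {missing}")
        return GradeResult(1.0, "all required keys present")

if __name__ == "__main__":
    grader = JsonKeysGrader(required_keys=["answer", "confidence"])
    print(grader.grade('{"answer": 4, "confidence": 0.9}'))
```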
### 🔌 Easy Integration
Using mainstream observability platforms like **LangSmith** or **Langfuse**? We provide ready-made integrations for both (see the Integrations table below).

## 📦 Installation
```bash
pip install py-openjudge
```
> 💡 More installation methods can be found in the [Quickstart Guide](https://agentscope-ai.github.io/OpenJudge/get_started/quickstart/#installation).
---
```python
if __name__ == "__main__":
    asyncio.run(main())
```
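
For context around the `asyncio.run(main())` entry point above, here is a self-contained sketch of the async usage pattern. The `ExampleGrader` class and its `grade` method are placeholders made up for illustration, not OpenJudge's actual API; the real imports, grader names, and call signatures are in the Quickstart Guide linked below.

```python
# Placeholder sketch of the async grading pattern; not OpenJudge's real API.
import asyncio

class ExampleGrader:
    """Stand-in for a built-in grader with an async grading method."""

    async def grade(self, query: str, response: str) -> float:
        # A real grader would apply rules or call an LLM judge here.
        return 1.0 if response.strip() else 0.0

async def main() -> None:
    grader = ExampleGrader()
    score = await grader.grade(query="What is 2 + 2?", response="4")
    print(f"score = {score}")

if __name__ == "__main__":
    asyncio.run(main())
```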
> 📚 Complete Quickstart can be found in the [Quickstart Guide](https://agentscope-ai.github.io/OpenJudge/get_started/quickstart/).
---
Seamlessly connect OpenJudge with mainstream observability and training platforms:

| Category | Platform | Status | Documentation |
|:---------|:---------|:------:|:--------------|
|**Observability**|[LangSmith](https://smith.langchain.com/)| ✅ Available | 👉 [LangSmith Integration Guide](https://agentscope-ai.github.io/OpenJudge/integrations/langsmith/)|
||[Langfuse](https://langfuse.com/)| ✅ Available | 👉 [Langfuse Integration Guide](https://agentscope-ai.github.io/OpenJudge/integrations/langfuse/)|
|| Other frameworks | 🔵 Planned | — |
|**Training**|[verl](https://github.com/volcengine/verl)| 🟡 In Progress | — |

> 💬 Have a framework you'd like us to prioritize? [Open an Issue](https://github.com/agentscope-ai/OpenJudge/issues)!
---
We love your input! We want to make contributing to OpenJudge as easy and transparent as possible.
> **🎨 Adding New Graders** — Have domain-specific evaluation logic? Share it with the community!
> **🐛 Reporting Bugs** — Found a glitch? Help us fix it by [opening an issue](https://github.com/agentscope-ai/OpenJudge/issues)
> **📝 Improving Docs** — Clearer explanations or better examples are always welcome
> **💡 Proposing Features** — Have ideas for new integrations? Let's discuss!

📖 See the full [Contributing Guidelines](https://agentscope-ai.github.io/OpenJudge/community/contributing/) for coding standards and the PR process.

---
If you are currently using v0.1.x, choose one of the following paths:

- **Stay on v0.1.x**: keep installing the legacy package:

```bash
pip install rm-gallery
```
We preserved the source code of **v0.1.7 (the latest v0.1.x release)** in the [`v0.1.7-legacy` branch](https://github.com/agentscope-ai/OpenJudge/tree/v0.1.7-legacy).
- **Migrate to v0.2.0 (recommended)**: follow the **[Installation](#-installation)** section above, then walk through **[Quickstart](#-quickstart)** (or the full [Quickstart Guide](https://agentscope-ai.github.io/OpenJudge/get_started/quickstart/)) to update your imports / usage.

If you run into migration issues, please [open an issue](https://github.com/agentscope-ai/OpenJudge/issues) with a minimal repro and your current version.

---
If you use OpenJudge in your research, please cite:

```bibtex
@software{
  title = {OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards},
```