Commit 372bf85

Merge branch 'docs/sample-reports' into feature/zero-shot-evaluation
2 parents bd0eed4 + b961fe0 commit 372bf85

63 files changed: +4908 -420 lines changed
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+---
+name: Bug Report
+about: Create a report to help us improve
+title: '[Bug]:'
+labels: 'bug'
+assignees: ''
+
+---
+
+**<u>OpenJudge is an open-source project. To involve a broader community, we recommend asking your questions in English.</u>**
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+
+1. Code snippet that causes the bug:
+2. Steps to execute the code:
+3. Error message or unexpected behavior observed:
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Error messages**
+Detailed error messages.
+
+**Environment (please complete the following information):**
+
+- OpenJudge Version: [e.g. 1.0.0 via `print(openjudge.__version__)`]
+- Python Version: [e.g. 3.10]
+- OS: [e.g. macos, windows]
+
+**Additional context**
+Add any other context about the problem here.
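
Reviewer note: the Environment section of this template asks for three values that can be collected in one step. The sketch below is only an illustration; the helper name `collect_environment` is hypothetical and not part of OpenJudge, and it assumes that `openjudge` exposes `__version__` as the template itself suggests.

```python
import platform


def collect_environment() -> dict:
    """Gather the three fields listed under "Environment" in the template."""
    try:
        # Assumption: the installed package exposes `__version__`,
        # as the template's own example implies.
        import openjudge

        oj_version = getattr(openjudge, "__version__", "unknown")
    except ImportError:
        oj_version = "not installed"
    return {
        "OpenJudge Version": oj_version,
        "Python Version": platform.python_version(),
        "OS": platform.system(),
    }


if __name__ == "__main__":
    # Print the values in the same bullet format the template uses.
    for field, value in collect_environment().items():
        print(f"- {field}: {value}")
```

The output can be pasted directly into the bug report in place of the bracketed placeholders.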

.github/ISSUE_TEMPLATE/custom.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+---
+name: Custom issue template
+about: Describe this issue template's purpose here.
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**<u>OpenJudge is an open-source project. To involve a broader community, we recommend asking your questions in English.</u>**
+
+**Please describe your issue or question in detail below:**

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+name: Feature Request
+about: Suggest an idea for this project
+title: '[Feature]: '
+labels: 'enhancement'
+assignees: ''
+
+---
+
+**<u>OpenJudge is an open-source project. To involve a broader community, we recommend asking your questions in English.</u>**
+
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+## OpenJudge Version
+
+[The version of OpenJudge you are working on, e.g. `import openjudge; print(openjudge.__version__)`]
+
+## Description
+
+[Please describe the background, purpose, changes made, and how to test this PR]
+
+## Checklist
+
+Please check the following items before code is ready to be reviewed.
+
+- [ ] Code has been formatted with `pre-commit run --all-files` command
+- [ ] All tests are passing
+- [ ] Docstrings are in Google style
+- [ ] Related documentation has been updated (e.g. links, examples, etc.)
+- [ ] Code is ready for review

README.md

Lines changed: 10 additions & 1 deletion
@@ -30,6 +30,7 @@
 - [Quickstart](#-quickstart)
 - [Integrations](#-integrations)
 - [Contributing](#-contributing)
+- [Community](#-community)
 - [Citation](#-citation)
 
 OpenJudge is a unified framework designed to drive **LLM and Agent application excellence** through **Holistic Evaluation** and **Quality Rewards**.
@@ -188,9 +189,17 @@ We love your input! We want to make contributing to OpenJudge as easy and transp
 
 ---
 
+## 💬 Community
 
+Join our DingTalk group to connect with the community:
 
-### Migration Guide (v0.1.x → v0.2.0)
+<div align="center">
+<img src="./docs/images/dingtalk_qr_code.png" alt="DingTalk QR Code" width="200">
+</div>
+
+---
+
+## Migration Guide (v0.1.x → v0.2.0)
 > OpenJudge was previously distributed as the legacy package `rm-gallery` (v0.1.x). Starting from v0.2.0, it is published as `py-openjudge` and the Python import namespace is `openjudge`.
 
 **OpenJudge v0.2.0 is NOT backward compatible with v0.1.x.**
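
Reviewer note: because the migration guide describes a change of distribution name (`rm-gallery` to `py-openjudge`), a quick check of which distribution is installed may help readers of this section. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `migration_status` are hypothetical, and the status messages are illustrative wording rather than anything OpenJudge prints.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def migration_status(legacy: Optional[str], current: Optional[str]) -> str:
    """Summarize the install state from the two version lookups."""
    if legacy and current:
        return "both installed: consider removing rm-gallery"
    if current:
        return "migrated: py-openjudge only"
    if legacy:
        return "legacy: rm-gallery only"
    return "neither installed"


if __name__ == "__main__":
    # Distribution names are taken from the migration guide text above.
    print(
        migration_status(
            installed_version("rm-gallery"),
            installed_version("py-openjudge"),
        )
    )
```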

README_zh.md

Lines changed: 12 additions & 1 deletion
@@ -30,6 +30,7 @@
 - [Quickstart](#-快速开始)
 - [Integrations](#-集成)
 - [Contributing](#-贡献)
+- [Community](#-社区)
 - [Citation](#-引用)
 
 OpenJudge is a unified framework designed to improve **LLM and Agent application performance** through **Holistic Evaluation** and **Quality Rewards**.
@@ -188,7 +189,17 @@ if __name__ == "__main__":
 
 ---
 
-### Migration Guide (v0.1.x → v0.2.0)
+## 💬 Community
+
+Join the OpenJudge DingTalk group to discuss with us:
+
+<div align="center">
+<img src="./docs/images/dingtalk_qr_code.png" alt="DingTalk group QR code" width="200">
+</div>
+
+---
+
+## Migration Guide (v0.1.x → v0.2.0)
 > OpenJudge was previously published under the legacy package name `rm-gallery` (v0.1.x). Starting from v0.2.0, it is published as `py-openjudge`, and the Python import namespace is `openjudge`.
 
 **OpenJudge v0.2.0 is not backward compatible with v0.1.x.**

cookbooks/grader_validation/accuracy.py

Lines changed: 1 addition & 2 deletions
@@ -8,8 +8,7 @@
 
 from typing import Dict, List
 
-from tutorials.grader_validation.base import GraderValidator
-
+from cookbooks.grader_validation.base import GraderValidator
 from openjudge.analyzer.base_analyzer import AnalysisResult
 from openjudge.analyzer.validation.accuracy_analyzer import AccuracyAnalyzer
 from openjudge.graders.base_grader import BaseGrader

cookbooks/grader_validation/rewardbench2.py

Lines changed: 8 additions & 4 deletions
@@ -307,8 +307,10 @@ async def _evaluate_four_way(
         GraderScore: Result with score=1.0 if predicted best answer matches ground truth
     """
     # Handle None case for mutable arguments
-    answers = answers if answers is not None else []
-    chosen_indices = chosen_indices if chosen_indices is not None else []
+    if not answers:
+        answers = []
+    if not chosen_indices:
+        chosen_indices = []
 
     # Ensure we have exactly 4 answers
     if len(answers) < 4:
@@ -402,8 +404,10 @@ async def _evaluate_ties(
         GraderScore: Result with score=1.0 if any top-rated answer is in chosen_indices
     """
     # Handle None case for mutable arguments
-    answers = answers if answers is not None else []
-    chosen_indices = chosen_indices if chosen_indices is not None else []
+    if not answers:
+        answers = []
+    if not chosen_indices:
+        chosen_indices = []
 
     correct_indices = set(chosen_indices)
 