Commit 3a83299

Add model_args for judge LLM (#1241)
* Add model_args for judge LLM
* Fix type hint for model_args in LLMJudge
* update doc
---------
Co-authored-by: Yunnglin <mao.looper@qq.com>
1 parent 6e43428 commit 3a83299

3 files changed: +7 -1 lines changed

docs/en/get_started/parameters.md

Lines changed: 1 addition & 0 deletions
@@ -174,6 +174,7 @@ LLM-as-a-Judge evaluation parameters using a judge model to determine correctnes
 | `system_prompt` | `str` | System prompt | - |
 | `prompt_template` | `str` | Prompt template | Auto-selected based on `score_type` |
 | `generation_config` | `dict` | Generation parameters (same as `--generation-config`) | - |
+| `model_args` | `dict` | Judge model loading parameters (same as `--model-args`), e.g. `{"default_headers": {"X-API-KEY": "your-api-key"}}` | `{}` |
 | `score_type` | `str` | Scoring method<br>• `pattern`: Judge if answer matches reference<br>• `numeric`: Score without reference (0-1) | `pattern` |
 | `score_pattern` | `str` | Regex to parse output | `pattern` mode: `(A\|B)`<br>`numeric` mode: `\[\[(\d+(?:\.\d+)?)\]\]` |
 | `score_mapping` | `dict` | Score mapping for `pattern` mode | `{'A': 1.0, 'B': 0.0}` |
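
The `model_args` row added in this hunk can be exercised by building the judge configuration as a plain dictionary and serializing it to JSON for `--judge-model-args`. The following is a minimal sketch, not evalscope code: `build_judge_model_args` is a hypothetical helper, the model name, URL, and header value are placeholders, and the `model_id` and `api_url` keys are assumed from the `LLMJudge` constructor arguments shown in this commit.

```python
import json
from typing import Any, Dict, Optional


def build_judge_model_args(
    model_id: str,
    api_url: str,
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a judge-model-args dictionary."""
    judge_args: Dict[str, Any] = {
        'model_id': model_id,  # placeholder judge model name (assumed key)
        'api_url': api_url,    # placeholder endpoint URL (assumed key)
        'generation_config': {'temperature': 0.0, 'max_tokens': 4096},
    }
    if headers:
        # `model_args` is the key added in this commit; `default_headers`
        # follows the example given in the parameter table.
        judge_args['model_args'] = {'default_headers': dict(headers)}
    return judge_args


judge_args = build_judge_model_args(
    model_id='my-judge-model',
    api_url='https://example.com/v1',
    headers={'X-API-KEY': 'your-api-key'},
)

# JSON string form, suitable for passing on the command line.
judge_args_json = json.dumps(judge_args)
print(judge_args_json)
```

When no headers are supplied the sketch omits `model_args` entirely, which matches the documented default of `{}`.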

docs/zh/get_started/parameters.md

Lines changed: 2 additions & 1 deletion
@@ -157,7 +157,7 @@ LLM-as-a-Judge评测参数,使用裁判模型判断正误:
 |------|------|------|--------|
 | `--judge-strategy` | `str` | 裁判模型策略<br>• `auto`: 根据数据集自动决定<br>• `llm`: 总是使用裁判模型<br>• `rule`: 只使用规则判断<br>• `llm_recall`: 规则失败后使用裁判模型 | `auto` |
 | `--judge-worker-num` | `int` | 裁判模型并发数 | `1` |
-| `--judge-model-args` | `str` | 裁判模型配置(JSON字符串),详见下表 | - |
+| `--judge-model-args` | `dict` | 裁判模型配置(JSON字符串),详见下表 | - |
 | `--analysis-report` | `bool` | 是否生成分析报告(自动判断语言) | `false` |
 
 ### judge-model-args 配置项
@@ -170,6 +170,7 @@ LLM-as-a-Judge评测参数,使用裁判模型判断正误:
 | `system_prompt` | `str` | 系统prompt | - |
 | `prompt_template` | `str` | Prompt模板 | 根据`score_type`自动选择 |
 | `generation_config` | `dict` | 生成参数(同`--generation-config`| - |
+| `model_args` | `dict` | 裁判模型加载参数(同`--model-args`),例如`{"default_headers": {"X-API-KEY": "your-api-key"}}` | `{}` |
 | `score_type` | `str` | 打分方式<br>• `pattern`: 判断与参考答案是否相同<br>• `numeric`: 无参考答案打分(0-1) | `pattern` |
 | `score_pattern` | `str` | 解析输出的正则表达式 | `pattern`模式:`(A\|B)`<br>`numeric`模式:`\[\[(\d+(?:\.\d+)?)\]\]` |
 | `score_mapping` | `dict` | `pattern`模式的分数映射 | `{'A': 1.0, 'B': 0.0}` |

evalscope/metrics/llm_judge.py

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,7 @@ def __init__(
         api_url: Optional[str] = None,
         model_id: Optional[str] = None,
         eval_type: Optional[str] = None,
+        model_args: Optional[Dict[str, Any]] = None,
         system_prompt: Optional[str] = None,
         prompt_template: Optional[str] = None,
         generation_config: Optional[Dict[str, Any]] = None,
@@ -70,6 +71,7 @@ def __init__(
             api_base (str, optional): API base URL
             model_id (str, optional): Model ID for LLM
             eval_type (str, optional): Evaluation LLM type for the judge
+            model_args (dict, optional): Additional model arguments for the judge
             system_prompt (str, optional): System prompt for the judge
             prompt_template (str, optional): Prompt template for the judge
             generation_config (dict, optional): Generation configuration for the judge
@@ -85,6 +87,7 @@ def __init__(
         self.eval_type = eval_type or EvalType.OPENAI_API
         self.system_prompt = system_prompt or os.environ.get('JUDGE_SYSTEM_PROMPT', None)
         self.generation_config = generation_config or {'temperature': 0.0, 'max_tokens': 4096}
+        self.model_args = model_args or {}
 
         # Default score mapping for A/B pattern
         self.score_type = score_type
@@ -112,6 +115,7 @@ def _init_server_adapter(self):
             base_url=self.api_url,
             api_key=self.api_key,
             config=GenerateConfig(**self.generation_config),
+            model_args=self.model_args,
         )
 
     def judge(
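
The pattern in this file, defaulting `model_args` to an empty dict in the constructor and forwarding it unchanged when the adapter is built, can be sketched without evalscope installed. `JudgeSketch` and `fake_get_model` below are hypothetical stand-ins introduced only for illustration; the real adapter factory and what it does with `model_args` are not shown in this diff.

```python
from typing import Any, Dict, Optional


def fake_get_model(model: str, base_url: Optional[str],
                   model_args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the adapter factory: records what it received."""
    return {'model': model, 'base_url': base_url, 'model_args': model_args}


class JudgeSketch:
    """Hypothetical stand-in mirroring the constructor change in this commit."""

    def __init__(self, model_id: str, api_url: Optional[str] = None,
                 model_args: Optional[Dict[str, Any]] = None) -> None:
        self.model_id = model_id
        self.api_url = api_url
        # A `None` default plus `or {}` gives each instance its own dict,
        # avoiding a shared mutable default argument.
        self.model_args = model_args or {}

    def init_adapter(self) -> Dict[str, Any]:
        # Forward model_args unchanged, as the diff does.
        return fake_get_model(model=self.model_id, base_url=self.api_url,
                              model_args=self.model_args)


plain = JudgeSketch('my-judge-model')
custom = JudgeSketch(
    'my-judge-model',
    model_args={'default_headers': {'X-API-KEY': 'your-api-key'}},
)
print(plain.init_adapter()['model_args'])   # {}
print(custom.init_adapter()['model_args'])
```

Because the default is resolved per instance, callers that never pass `model_args` still hand the adapter a dict rather than `None`.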
