Error when using DeepSeek as a judge in alpaca_eval

I can't use the ChatGPT API, so I tried to use DeepSeek as a judge with alpaca_eval. Since the library does not provide built-in support for it, I followed the existing code patterns and wrote my own config under evaluators_configs. With this setup, the evaluation ran successfully and generated the expected JSON output.

However, the metrics computation step always fails with an error, even though the evaluation itself runs fine.

Would it be possible to add support for DeepSeek as a judge? What should I do?

Error:
```
  File "D:\project\Py\LLM-eval\.venv\Lib\site-packages\alpaca_eval\metrics\glm_winrate.py", line 89, in get_length_controlled_winrate
    assert len(df["generator_2"].unique()) == 1
```
However, there is no `generator_2` field in the generated JSON.

Example from the JSON file:
<img width="420" height="267" alt="Image" src="https://github.com/user-attachments/assets/10b683d2-3c40-4ea4-a4fb-f690bbe1b00e" />
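
One possible cause, which I have not confirmed, is that `generator_2` is derived from a `generator` key in the model outputs file, so records without that key would produce annotations with no `generator_2` column. Below is a minimal sketch of a check and workaround under that assumption. The helper names (`generator_names`, `add_generator_field`, `fix_file`) are hypothetical and not part of alpaca_eval.

```python
import json
from pathlib import Path


def generator_names(records):
    """Return the set of distinct "generator" values found in the records."""
    return {rec["generator"] for rec in records if "generator" in rec}


def add_generator_field(records, generator_name):
    """Return new records where every entry has a "generator" key.

    Entries that already carry a "generator" value keep it; the input
    list and its dicts are not modified.
    """
    return [
        {**rec, "generator": rec.get("generator", generator_name)}
        for rec in records
    ]


def fix_file(path, generator_name):
    """Rewrite a model outputs JSON file (a list of dicts) in place.

    Assumption: the file holds a JSON list of records with at least
    "instruction" and "output" keys.
    """
    path = Path(path)
    records = json.loads(path.read_text(encoding="utf-8"))
    fixed = add_generator_field(records, generator_name)
    path.write_text(
        json.dumps(fixed, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return generator_names(fixed)
```

If `generator_names` returns more than one name after the fix, the assertion in the traceback would presumably still fail, since it expects exactly one unique `generator_2` value. The evaluation would also need to be re-run so the annotations are regenerated from the updated file.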
