Error when using DeepSeek as a judge in alpaca_eval #455

@AqualCross

I can't use the ChatGPT API, so I tried to use DeepSeek as the judge in alpaca_eval. Since the library does not provide built-in support for DeepSeek, I followed the existing code patterns and wrote my own evaluators_configs. With this setup, I was able to run the evaluation successfully and generate the expected JSON output.

However, when I proceed to the metrics computation step, I always encounter an error. The evaluation itself runs fine, but calculating the metrics fails.

Would it be possible to add support for DeepSeek as a judge? What should I do?

Error:

  File "D:\project\Py\LLM-eval\.venv\Lib\site-packages\alpaca_eval\metrics\glm_winrate.py", line 89, in get_length_controlled_winrate
    assert len(df["generator_2"].unique()) == 1

However, there is no generator_2 field in my output.
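A minimal pre-check sketch, assuming the annotations are a JSON list of records loaded into Python. It mirrors the failing assertion (each generator column must exist and hold exactly one model name) so the problem shows up before the metrics step. Only generator_2 is confirmed by the traceback; checking generator_1 as well is an assumption based on the pairwise naming, and check_generators is a hypothetical helper, not part of alpaca_eval.

```python
import pandas as pd


def check_generators(records, columns=("generator_1", "generator_2")):
    """Hypothetical helper: list problems that would trip the generator_2 assertion.

    `records` is assumed to be the list of annotation dicts from the judge's JSON output.
    """
    df = pd.DataFrame(records)
    problems = []
    for col in columns:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        else:
            # Same count as len(df[col].unique()): NaN counts as its own value.
            n_unique = df[col].nunique(dropna=False)
            if n_unique != 1:
                problems.append(f"{col} has {n_unique} distinct values")
    return problems


# Example record shaped like a pairwise annotation without generator_2 (made-up values).
records = [{"instruction": "Say hi", "generator_1": "baseline_model", "preference": 2.0}]
print(check_generators(records))  # ['missing column: generator_2']
```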

Example from the JSON file: [screenshot]
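One possible cause, which is an assumption and not confirmed in this issue: if pairwise annotations are built by merging the baseline outputs and the model outputs on shared keys with the suffixes _1 and _2, then generator_2 only appears when the model-outputs records carry their own generator field. The sketch below shows that pandas merge behavior and a hypothetical helper, add_generator, that fills the field before re-running the evaluation. The model names are placeholders.

```python
import pandas as pd


def add_generator(records, name):
    """Hypothetical helper: copy model-output records, filling 'generator' only where missing."""
    return [{**r, "generator": r.get("generator", name)} for r in records]


baseline = pd.DataFrame([{"instruction": "Say hi", "output": "Hi!", "generator": "baseline_model"}])
model = [{"instruction": "Say hi", "output": "Hello!"}]  # no 'generator' field

# Suffixes are applied only to overlapping columns, so 'generator' is not split
# into generator_1/generator_2 unless both sides have it (assumed merge scheme).
before = baseline.merge(pd.DataFrame(model), on="instruction", suffixes=("_1", "_2"))
after = baseline.merge(
    pd.DataFrame(add_generator(model, "my_model")), on="instruction", suffixes=("_1", "_2")
)

print("generator_2" in before.columns)  # False
print("generator_2" in after.columns)   # True
```

If this is the cause, adding a generator field to the model-outputs file (rather than editing the annotations by hand) keeps the names consistent across every record.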