I can't use the ChatGPT API, so I tried to use DeepSeek as a judge with alpaca_eval. Since the library does not provide built-in support, I followed the existing code patterns and wrote my own evaluators_configs. With this setup, I was able to run the evaluation and generate the expected JSON output.
However, the metrics computation step always fails with an error, even though the evaluation itself runs fine.
Would it be possible to add support for DeepSeek as a judge? What should I do?
Error:
File "D:\project\Py\LLM-eval\.venv\Lib\site-packages\alpaca_eval\metrics\glm_winrate.py", line 89, in get_length_controlled_winrate
assert len(df["generator_2"].unique()) == 1
but no generator_2 is found.
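To narrow this down, it may help to check the generated JSON directly for the condition that the assertion tests. The following is only a minimal sketch: it assumes the evaluation output is a JSON list of records (one dict per annotated example), and the function name `diagnose_generator_2` and the file path passed on the command line are hypothetical, not part of alpaca_eval.

```python
import json
import sys


def diagnose_generator_2(annotations):
    """Mimic the failing check: exactly one unique `generator_2` value.

    `annotations` is assumed to be a list of dicts loaded from the
    evaluation's JSON output (an assumption about the file layout).
    """
    if not annotations:
        return "no records"
    # Count the records that lack the key entirely.
    missing = [row for row in annotations if "generator_2" not in row]
    if len(missing) == len(annotations):
        return "column missing"
    # Records without the key contribute None, much like NaN in a DataFrame.
    values = {row.get("generator_2") for row in annotations}
    if len(values) != 1:
        return f"expected 1 unique value, found {len(values)}"
    return "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_annotations.py <path to the generated JSON>
    with open(sys.argv[1], encoding="utf-8") as f:
        print(diagnose_generator_2(json.load(f)))
```

If this reports "column missing", the custom judge setup is probably producing records without a `generator_2` field, which would point to the input data or config rather than to DeepSeek itself.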
