Hello!
Why does the `show_result.py::compute_ratings` function anchor the Elo ratings of all models to the hard-coded `BASELINE_MODEL_NAME = "gpt-3.5-turbo-0125"` rather than to the baseline model specified in `judge_config.yaml`? If I want to use a different baseline model, the current Elo calculation seems to be wrong, because the ratings are still rescaled relative to `gpt-3.5-turbo-0125`.
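To illustrate what I mean by "anchoring": a common way to report Elo/Bradley-Terry ratings is to shift every model's score by a constant so the chosen baseline lands on a fixed anchor value (e.g. 1000). This is only a hypothetical sketch of that step, not the repo's actual code; `anchor_ratings` and the model names are made up for the example:

```python
ANCHOR = 1000.0  # conventional anchor score for the baseline model

def anchor_ratings(ratings: dict, baseline: str, anchor: float = ANCHOR) -> dict:
    """Shift all ratings by a constant so ratings[baseline] == anchor."""
    offset = anchor - ratings[baseline]
    return {model: r + offset for model, r in ratings.items()}

# Raw (unanchored) ratings, e.g. from a Bradley-Terry fit:
raw = {"gpt-3.5-turbo-0125": 0.0, "model-a": 120.0, "model-b": -40.0}

# Anchored to the hard-coded baseline:
print(anchor_ratings(raw, "gpt-3.5-turbo-0125"))
# {'gpt-3.5-turbo-0125': 1000.0, 'model-a': 1120.0, 'model-b': 960.0}

# Anchored instead to the baseline one might set in judge_config.yaml:
print(anchor_ratings(raw, "model-a"))
# {'gpt-3.5-turbo-0125': 880.0, 'model-a': 1000.0, 'model-b': 840.0}
```

Note that changing the anchor only shifts every rating by the same constant, so rating *differences* are preserved; the question is about which baseline the absolute numbers are reported against.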