
About calculating ELO ratings #5

@ArtemBiliksin


Hello!

Why does the `compute_ratings` function in `show_result.py` anchor the Elo ratings of all models to `BASELINE_MODEL_NAME = "gpt-3.5-turbo-0125"` rather than to the baseline model specified in `judge_config.yaml`? If I want to use a different baseline model, the current Elo calculation seems wrong, because the ratings are still re-anchored relative to gpt-3.5-turbo-0125.
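For context, here is a minimal sketch of the anchoring step I am asking about. This is not the repository's actual code: the `battles` DataFrame layout, the `compute_anchored_ratings` name, and the Bradley-Terry-style logistic-regression fit are my assumptions about how the ratings are produced and then shifted to a hard-coded anchor model.

```python
# Hypothetical sketch, not the repo's implementation.
# Assumes `battles` has columns "model_a", "model_b", "winner",
# ratings are fit with a Bradley-Terry style logistic regression,
# and the result is shifted so one anchor model lands at a fixed score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

ANCHOR_MODEL = "gpt-3.5-turbo-0125"   # the hard-coded anchor this issue refers to
ANCHOR_RATING = 1000.0
SCALE = 400.0 / np.log(10)            # maps logistic coefficients onto the Elo scale


def compute_anchored_ratings(battles: pd.DataFrame) -> pd.Series:
    models = pd.unique(battles[["model_a", "model_b"]].values.ravel())
    idx = {m: i for i, m in enumerate(models)}

    # One row per battle: +1 for model_a, -1 for model_b; label 1 if model_a won.
    X = np.zeros((len(battles), len(models)))
    y = np.zeros(len(battles))
    for r, (_, row) in enumerate(battles.iterrows()):
        X[r, idx[row["model_a"]]] = 1.0
        X[r, idx[row["model_b"]]] = -1.0
        y[r] = 1.0 if row["winner"] == "model_a" else 0.0

    lr = LogisticRegression(fit_intercept=False, C=1e6)
    lr.fit(X, y)
    ratings = pd.Series(SCALE * lr.coef_[0], index=models)

    # The step in question: every rating is shifted so the hard-coded
    # ANCHOR_MODEL ends up at ANCHOR_RATING, regardless of which baseline
    # model is configured in judge_config.yaml.
    if ANCHOR_MODEL in ratings:
        ratings += ANCHOR_RATING - ratings[ANCHOR_MODEL]
    return ratings.sort_values(ascending=False)
```

If the anchoring really works like this, swapping the baseline in `judge_config.yaml` changes which model the win rates are judged against, but the displayed ratings would still be offset to put gpt-3.5-turbo-0125 at the anchor value. Is that intended, or should the anchor follow the configured baseline?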
