You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add IMO-Bench evaluation scripts for answers and proofs
Introduces two new scripts: eval_imobench_answer.py for evaluating short-answer mathematical problems from the AnswerBench dataset, and eval_imobench_proof.py for evaluating rigorous proof problems from the ProofBench dataset. Both scripts support model evaluation, result saving, and detailed performance analysis.
0 commit comments