Commit ab60b30

Add IMO-Bench evaluation scripts for answers and proofs
Introduces two new scripts: eval_imobench_answer.py for evaluating short-answer mathematical problems from the AnswerBench dataset, and eval_imobench_proof.py for evaluating rigorous proof problems from the ProofBench dataset. Both scripts support model evaluation, result saving, and detailed performance analysis.
1 parent dba3950 commit ab60b30

2 files changed: +1007 -0 lines changed


0 commit comments
