OpenCompass v0.5.2 Release Notes
Highlights
- Extensive New Benchmark Support: Added comprehensive support for scientific and general benchmarks, including SciReasoner, Biology Instructions, Mol Instructions, CMPhysBench, IFBench, and LCB-pro.
- New Model & API Support: Added evaluation examples for Intern-S1-Pro and the TeleChat API.
- Infrastructure & Enhancements: Fixed bugs, improved evaluation pipelines, and updated CI.
New Features
- Introduced support for HMMT2025 (#2305), AMO-Bench (#2305), IMO-Bench (#2305), ATLAS (#2297), OpenSWI (#2312), CMPhysBench (#2313), Biology Instructions (#2314), Mol Instructions (#2326), ARC_AGI_2 (#2330), IFBench (#2354), SciReasoner (#2360), PI-LLM (#2283), ProcessBench (#2274), and LCB_pro (#2361).
- Added monitoring of multi-dimensional evaluation metrics, including output length, logprobs, and finish reasons (#2351).
- Added support for Intern-S1-Pro evaluation examples (#2394).
- Added support for TeleChat API inference (#2371).
- Added an LLM-judge-based config for C-Eval (#2398).
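To illustrate the kind of statistics the metric-monitoring item (#2351) refers to, here is a minimal standalone sketch that aggregates output length, logprobs, and finish reasons over a batch of predictions. It does not use the OpenCompass API: the `summarize_outputs` helper and the record fields (`text`, `logprobs`, `finish_reason`) are hypothetical names chosen for this example, and the actual implementation may compute these metrics differently.

```python
from collections import Counter
from statistics import mean


def summarize_outputs(records):
    """Aggregate per-sample generation statistics.

    `records` is a hypothetical list of dicts with the keys 'text',
    'logprobs' (list of token logprobs, may be empty) and 'finish_reason'.
    This format is an assumption for illustration only.
    """
    # Output length measured in whitespace-separated words (an assumption;
    # a real pipeline would more likely count tokens).
    lengths = [len(r["text"].split()) for r in records]
    # Mean token logprob per sample, skipping samples without logprobs.
    sample_logprobs = [mean(r["logprobs"]) for r in records if r["logprobs"]]
    # How often each finish reason (e.g. 'stop', 'length') occurred.
    finish_reasons = Counter(r["finish_reason"] for r in records)
    return {
        "avg_output_len": mean(lengths) if lengths else 0.0,
        "avg_logprob": mean(sample_logprobs) if sample_logprobs else None,
        "finish_reasons": dict(finish_reasons),
    }


records = [
    {"text": "The answer is 42", "logprobs": [-0.1, -0.3], "finish_reason": "stop"},
    {"text": "Let me think step by step", "logprobs": [-0.5, -0.5], "finish_reason": "length"},
]
summary = summarize_outputs(records)
print(summary)
```

A high share of `length` finish reasons usually suggests that outputs are being truncated by the generation limit, which is one reason to track these metrics alongside accuracy.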
Bug Fixes
- Fixed output completeness and related issues in OpenAISDKStreaming (#2367, #2389, #2399).
- Removed Pyext from the runtime requirements (#2306).
- Fixed pattern matching in Smolinstruct (#2384).
- Fixed a buffer-related error in the LiveCodeBench evaluation (#2393).
Enhancements and Refactors
Infrastructure Refactors:
- Updated LCBench (#2166).
- Added headers as input param in BigCodeBench (#2302).
- Updated rjob with metadata name (#2316).
- Made the timeout in OpenAISDK configurable (#2352).
- Added meta logger in OpenICLInferTask (#2383).
CI/CD Improvements:
- Refactored dailytest (#2308).
- Added CI for new datasets (#2358, #2369).
- Changed the GitHub runner (#2373).
- Added unit tests (#2390).
Welcome New Contributors
A warm welcome and special thanks to our newest contributors who made this release possible:
- @zhuangziGiantfish made their first contribution in #2283.
- @xgao922 made their first contribution in #2307.
- @Jensen246 made their first contribution in #2310.
- @ccx06 made their first contribution in #2371.
Full Changelog: 0.5.1.post1...0.5.2
Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations!