0.5.2

@Myhs-phz Myhs-phz released this 14 Feb 03:46
9741792

OpenCompass v0.5.2 Release Notes

🌟 Highlights

✨ 🧪 Extensive New Benchmarks Support: We have introduced comprehensive support for Scientific and General Benchmarks, including SciReasoner, Biology Instructions, Mol Instructions, CMPhysBench, IFBench, LCB-pro, etc.
✨ 🤖 New Model & API Support: Added support for Intern-S1-Pro and TeleChat API evaluation examples.
✨ 🛠️ Infrastructure & Enhancements: Fixed bugs, improved evaluation pipelines and updated CI.


🚀 New Features

🔧 Introduced support for HMMT2025 (#2305), AMO-Bench (#2305), IMO-Bench (#2305), ATLAS (#2297), OpenSWI (#2312), CMPhysBench (#2313), Biology Instructions (#2314), Mol Instructions (#2326), ARC_AGI_2 (#2330), IFBench (#2354), SciReasoner (#2360), PI-LLM (#2283), ProcessBench (#2274), and LCB_pro (#2361).
🔧 Supported monitoring of multi-dimensional evaluation metrics, including output length, logprobs, and finish reasons (#2351).
🔧 Added support for Intern-S1-Pro evaluation examples (#2394).
🔧 Added support for TeleChat API inference (#2371).
🔧 Added LLM-judge-based config for C-Eval (#2398).
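
To illustrate the kind of per-sample statistics that the metrics monitoring in #2351 refers to (output length, logprobs, finish reasons), here is a minimal standalone sketch. It does not use the OpenCompass API: the function `summarize_generations` and the record keys `output_tokens`, `finish_reason`, and `logprobs` are hypothetical names chosen for this example, and the `"length"` finish reason follows the OpenAI-style convention for truncated outputs, which is an assumption about the backend.

```python
from collections import Counter
from statistics import mean


def summarize_generations(records):
    """Aggregate generation statistics over a list of per-sample records.

    Each record is a hypothetical dict (not an OpenCompass structure) with:
      'output_tokens': int, number of generated tokens
      'finish_reason': str, e.g. 'stop' or 'length' (OpenAI-style, assumed)
      'logprobs':      list of per-token log-probabilities, or None
    """
    lengths = [r["output_tokens"] for r in records]
    reasons = Counter(r["finish_reason"] for r in records)
    # Mean per-token logprob, computed only over samples that report logprobs.
    per_sample = [mean(r["logprobs"]) for r in records if r.get("logprobs")]
    return {
        "num_samples": len(records),
        "mean_output_tokens": mean(lengths) if lengths else 0.0,
        "max_output_tokens": max(lengths, default=0),
        "finish_reasons": dict(reasons),
        # Share of samples cut off by the token limit.
        "truncated_ratio": reasons.get("length", 0) / len(records) if records else 0.0,
        "mean_token_logprob": mean(per_sample) if per_sample else None,
    }


if __name__ == "__main__":
    demo = [
        {"output_tokens": 10, "finish_reason": "stop", "logprobs": [-0.5, -1.5]},
        {"output_tokens": 30, "finish_reason": "length", "logprobs": None},
    ]
    print(summarize_generations(demo))
```

A high `truncated_ratio` is a common signal that the generation length limit is too small for the benchmark, which is one reason to track finish reasons alongside accuracy.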


🐛 Bug Fixes

🔧 Fixed output completeness and related issues in OpenAISDKStreaming (#2367, #2389, #2399).
🔧 Removed Pyext from the runtime requirements (#2306).
🔧 Fixed pattern matching in Smolinstruct (#2384).
🔧 Fixed a buffer-related error in the LiveCodeBench evaluation (#2393).


⚙ Enhancements and Refactors

⚙ Infrastructure Refactors:

  • Updated LCBench (#2166).
  • Added headers as an input parameter in BigCodeBench (#2302).
  • Updated rjob with metadata name (#2316).
  • Parametrized timeout in OpenAISDK (#2352).
  • Added meta logger in OpenICLInferTask (#2383).

⚙ CI/CD Improvements:

  • Refactored dailytest (#2308).
  • Added CI for new datasets (#2358, #2369).
  • Changed the GitHub runner (#2373).
  • Added unit tests (#2390).

🎉 Welcome New Contributors

A warm welcome and special thanks to our newest contributors who made this release possible:


Full Changelog: 0.5.1.post1...0.5.2

Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations! 🌟