Releases: llm-jp/llm-jp-eval-mm
v0.4.1
What's Changed
- Add useful features for make_leaderboard.py by @speed1313 in #152
- Add MNIST task by @speed1313 in #153
- change model names and split qwen2 and qwen2.5 by @Silviase in #155
- Mypy typing and some formatting by @Silviase in #157
- Add z-score calculation configuration to make_leaderboard.py by @Silviase in #159
- Fix dependency by @speed1313 in #162
- Add Heron-NVILA by @riron1206 in #163
- Add CC_OCR Japanese subset by @Silviase in #168
New Contributors
- @riron1206 made their first contribution in #163
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed
- Refactoring Task and Scorer class by @speed1313 in #150
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- Generalize aggregate() output type and Remove unnecessary methods by @speed1313 in #131
- Improve JDocQA's preparation time and Fix JMMMU scoring and Add phi4 and Refactoring by @speed1313 in #141
- Add visualization script by @speed1313 in #143
- Fix Heron-bench scoring and Add Asagi model by @speed1313 in #146
Full Changelog: v0.2.2...v0.3.0
v0.2.2
What's Changed
- [WIP] Add gemma3 and Qwen2.5 VL and sarashina and Refactoring by @speed1313 in #123
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Fix vilaja group dependency by @speed1313 in #107
- Add JIC-VQA evaluation dataset by @PeifeiZhu in #116
- add mecha-ja by @Silviase in #122
New Contributors
- @PeifeiZhu made their first contribution in #116
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- Refactoring and Fix OpenAI API bug by @speed1313 in #89
- Use uv by @speed1313 in #96
- Add mmmu and llava-itw tasks by @speed1313 in #97
- Add GitHub pages by @speed1313 in #100
- Remove unnecessary files by @Silviase in #102
- Add acknowledge section by @speed1313 in #103
Full Changelog: v0.1.2...v0.2.0
v0.1.2
v0.1.1
Full Changelog: v0.1.0...v0.1.1
Fix dependencies to publish the package.
v0.1.0
What's Changed
- Add Generation Config by @speed1313 in #77
- Add custom metrics features and Refactoring by @speed1313 in #78
- add record benchmark result table by @Silviase in #80
- Add JDocQA Task by @speed1313 in #79
Full Changelog: v0.0.7...v0.1.0
v0.0.7
Full Changelog: v0.0.6...v0.0.7