[Update] Academic bench llm judge update #1876

Merged
MaiziXiao merged 16 commits into open-compass:main from Zhudongsheng75:academicBench_llmJudge_update
Feb 24, 2025

Zhudongsheng75 (Collaborator) commented Feb 17, 2025

  1. For the academic leaderboard, bbh, aime, and math now use an LLM judge to evaluate answers;
  2. Evaluation configs for the Deepseek-R1-Distill models have been added;
  3. IFEval has added regularization so that it is applicable to R1-like models;
  4. A new running template for the academic leaderboard 2502 has been added.
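
To make the first item concrete, here is a minimal sketch of how an LLM judge can replace exact-match scoring. It is not the code from this PR: the prompt wording, the A/B verdict convention, and the names `JUDGE_TEMPLATE`, `parse_verdict`, `judge_accuracy`, and `toy_judge` are all assumptions made for illustration, and `toy_judge` stands in for a real judge model so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical prompt; the wording used by the PR's configs is not shown here.
JUDGE_TEMPLATE = (
    'Question: {question}\n'
    'Reference answer: {reference}\n'
    'Candidate answer: {prediction}\n'
    'Reply with A if the candidate matches the reference, otherwise B.'
)


def parse_verdict(judge_reply: str) -> bool:
    """Treat a reply whose first token is 'A' as correct (assumed convention)."""
    return judge_reply.strip().upper().split()[:1] == ['A']


def judge_accuracy(records: List[Dict[str, str]],
                   judge: Callable[[str], str]) -> float:
    """Ask the judge about each record and return accuracy in percent."""
    if not records:
        return 0.0
    correct = 0
    for record in records:
        prompt = JUDGE_TEMPLATE.format(**record)
        if parse_verdict(judge(prompt)):
            correct += 1
    return 100.0 * correct / len(records)


def toy_judge(prompt: str) -> str:
    """Stand-in for a judge model: compares the two answers, numerically if possible."""
    fields = dict(line.split(': ', 1) for line in prompt.splitlines()[:3])
    reference = fields['Reference answer']
    candidate = fields['Candidate answer']
    try:
        same = float(reference) == float(candidate)
    except ValueError:
        same = reference == candidate
    return 'A' if same else 'B'


if __name__ == '__main__':
    samples = [
        {'question': '1/2 + 1/4 = ?', 'reference': '0.75', 'prediction': '.750'},
        {'question': '2 + 2 = ?', 'reference': '4', 'prediction': '5'},
    ]
    print(judge_accuracy(samples, toy_judge))
```

The first sample shows why a judge can help on math-style benchmarks: `.750` and `0.75` differ as strings, yet a judge can recognise them as the same answer. In a real setup the `judge` callable would wrap a call to the judge model.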

#######################################################################
# PART 5 Utils Configuaration #
#######################################################################
work_dir = './outputs/oc_academic_202412'

Update work_dir

@MaiziXiao (Contributor)

Fix lint plz

@MaiziXiao (Contributor) left a comment

LGTM

@MaiziXiao (Contributor)

Fix Lint plz

@MaiziXiao MaiziXiao merged commit 465e93e into open-compass:main Feb 24, 2025
8 checks passed
stephen-nju pushed a commit to stephen-nju/opencompass that referenced this pull request May 14, 2025
* BigCodeBench update

* update LCBench

* update LCBench 2

* update code

* academicBench update

* academic bench ifeval&math update

* generic_llmjudge_aime_academic_postprocess delete

* aime delete

* postprocessors update

* ifeval delete

* update work_dir

* linting

* linting double-quote-string-fixer

* r1-distill out_len update

* fix lint

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
zyc140345 pushed a commit to zyc140345/opencompass that referenced this pull request Oct 23, 2025
iamkaia pushed a commit to iamkaia/opencompass that referenced this pull request Feb 4, 2026
3 participants