Feature: Benchmarks with EleutherAI lm-evaluation-harness #49
Conversation
…e in templates.py
…e, when it does not find code blocks
This PR depends on/references PR #47.
Would it make sense to move it into the examples folder instead of scripts? I think it is similar to the symbolic regression benchmark example already in that folder: https://github.com/codelion/openevolve/tree/main/examples/symbolic_regression
Done!
Note: I would like to contribute more benchmark results, but I don't really have the compute or a (big) budget. Happy for hints if anyone has any.
An adapter for EleutherAI's lm-evaluation-harness that makes it possible to run its benchmarks, in particular non-coding tasks with LLM feedback.
This is an early implementation; there are likely still some issues with the prompts, see README.md.
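
For context, here is a minimal sketch (not this PR's actual adapter code) of how a benchmark run can be triggered programmatically through the harness's `simple_evaluate` entry point; the model and task names below are illustrative placeholders:

```python
# Minimal sketch of driving EleutherAI's lm-evaluation-harness from Python.
# Assumes lm-eval >= 0.4 is installed ("pip install lm-eval"); the model and
# task names are placeholders, not the ones this PR uses.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # HuggingFace backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai"],                      # any harness task name
    num_fewshot=0,
)

# Per-task metrics (accuracy, perplexity, ...) are under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

An adapter would wrap a call like this and map the returned per-task metrics into the evaluator's score format.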