EleutherAI / lm-evaluation-harness Public

Pull requests: EleutherAI/lm-evaluation-harness

162 Open 1,700 Closed

Pull requests list

Multiple Bangla Benchmark datasets added

#3454 opened Dec 9, 2025 by Ismail-Hossain-1

Add Uncheatable Eval

#3442 opened Dec 2, 2025 by ziqing-huang

Refactor TaskManager

#3432 opened Nov 26, 2025 by baberabb

Adding pisa task

#3412 opened Nov 17, 2025 by HallerPatrick

Fix gsm8k_platinum description

#3411 opened Nov 17, 2025 by fxmarty-amd

Fix wrong gpqa_diamond_generative_n_shot answer template

#3407 opened Nov 15, 2025 by fxmarty-amd

Fix MBPP task configuration and code extraction logic

#3387 opened Nov 6, 2025 by tanios13

feat: refine Chain-of-Thought removal logic

#3386 opened Nov 6, 2025 by Co-Cl2

[feat] Add Countdown Task

#3384 opened Nov 4, 2025 by StephenXie

Math 500

#3381 opened Nov 1, 2025 by seldereyy

Fix: Prevent infinite loop when max_seq_lengths < 4096 in prepare_niah.py

#3372 opened Oct 28, 2025 by vnayakde

Add support for configurable chrF metric parameters in task YAML, fix…

#3363 opened Oct 23, 2025 by augustlakia

Add gsm_symbolic and gsm_symbolic_cot tasks

#3354 opened Oct 19, 2025 by MengAiDev

[AIME24 | AIME25] Enable Multiple Generation Repeats with Pass@k and Majority@k Metrics

#3351 opened Oct 17, 2025 by ihebchaa

Added ULQA benchmark

#3340 opened Oct 13, 2025 by keramjan

feat: Add support for accelerate-wrapped models in simple_evaluate()

#3313 opened Sep 26, 2025 by DhruvaKashyap

Support empty response for Completions and ChatCompletions API

#3309 opened Sep 22, 2025 by tboerstad

Adding New Task SLR-Bench : Scalable Logical Reasoning Benchmark

#3305 opened Sep 20, 2025 by Ahmad21Omar

Support torchrun vllm DP

#3304 opened Sep 19, 2025 by luccafong

Gemini evaluation support

#3300 opened Sep 15, 2025 by IsraelAbebe

Adding SPaRC to lm eval harness

#3262 opened Aug 25, 2025 by lkaesberg

Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)

#3256 opened Aug 21, 2025 by Mariani-code

fix gsm8k normalization

#3254 opened Aug 20, 2025 by huaanrui

Main

#3250 opened Aug 20, 2025 by seongtaehong

Adding 3LM to lm eval harness

#3241 opened Aug 14, 2025 by GeorgeSherif
