Adding `pisa` task #3412

HallerPatrick · 2025-11-17T13:28:33Z

Hello!

This PR adds our benchmark, PisaBench, to the eval harness. It is a multilingual, multimodal benchmark based on the PISA study. More information is provided in the task README.

Each instance contains a question, a complementary image, and a set of multiple-choice answers. Evaluation is done through substring matching or by using gpt-4o-mini as a judge for each generated answer.
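For readers unfamiliar with substring-based scoring, here is a minimal sketch of the idea. It is not the code from this PR: the function names (`normalize`, `substring_match`, `accuracy`) and the normalization steps are assumptions made for illustration only.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def substring_match(generation: str, gold_answer: str) -> bool:
    """Return True if the gold answer text occurs anywhere in the generation."""
    return normalize(gold_answer) in normalize(generation)


def accuracy(generations: list[str], gold_answers: list[str]) -> float:
    """Fraction of generations that contain their gold answer (hypothetical metric)."""
    pairs = list(zip(generations, gold_answers))
    if not pairs:
        return 0.0
    return sum(substring_match(g, a) for g, a in pairs) / len(pairs)


if __name__ == "__main__":
    outputs = ["The answer is  B) 42 metres.", "I am not sure."]
    golds = ["B) 42 metres", "C) 7 litres"]
    print(accuracy(outputs, golds))
```

A check like this can produce false positives when the gold answer is very short, which is one plausible reason to offer an LLM judge as an alternative.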

This task should be runnable with the model types hf-multimodal and vllm-vlm.

Link to the HF orga containing the dataset and a leaderboard for current models
Link to the paper

CLAassistant · 2025-11-17T13:28:40Z

All committers have signed the CLA.

HallerPatrick · 2025-12-02T10:18:19Z

Sorry for the delay. I didn't see that the CLA check didn't go through. I cannot interpret the error of the Tasks Modified check. Am I missing something?

baberabb · 2025-12-02T17:28:40Z

> Sorry for the delay. I didn't see that the CLA check didn't go through. I cannot interpret the error of the Tasks Modified check. Am I missing something?

Could I ask you to remove the extension from lm_eval/tasks/pisa/_template.yaml? The check currently expects all .yaml files to be proper configs, but this one is only used as an import in the other configs.
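To illustrate why dropping the extension helps, here is a small sketch. It assumes a discovery step that treats every `*.yaml` file in a task directory as a full config; this is an assumption for illustration and not the harness's actual code. The file name `pisa_en.yaml` and the helper `find_config_candidates` are hypothetical.

```python
import tempfile
from pathlib import Path


def find_config_candidates(task_dir: Path) -> list[str]:
    """Return the names of files that a '*.yaml' scan would treat as task configs."""
    return sorted(p.name for p in task_dir.glob("*.yaml"))


with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp)
    # Hypothetical full task config.
    (task_dir / "pisa_en.yaml").write_text("task: pisa_en\n")
    # Shared template that is only included by other configs.
    (task_dir / "_template.yaml").write_text("# shared settings\n")

    before = find_config_candidates(task_dir)

    # Renaming the template so it no longer ends in '.yaml' hides it from the scan.
    (task_dir / "_template.yaml").rename(task_dir / "_template_yaml")
    after = find_config_candidates(task_dir)

print(before)
print(after)
```

After the rename, only the full config remains visible to the scan, while other configs can still include the template by its new name.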

HallerPatrick requested a review from baberabb as a code owner November 17, 2025 13:28

HallerPatrick added 3 commits December 2, 2025 10:43

Adding task based on the PisaBench benchmark

c7db581

Update tasks README with pisa task

71c6c7c

Fixing formatting for pre-commit hook

735b312

HallerPatrick force-pushed the pisa branch from 771c602 to 735b312 December 2, 2025 09:47

Renamed template.yaml to template_yaml, adjusted includes in tasks

bc9095f
