Open
Description
To run an eval on a very large dataset that can't fit in memory or be cached on disk, we have to use the streaming option of the Hugging Face `datasets` loader (`load_dataset(..., streaming=True)`). Currently, Inspect AI's HF dataset wrapper calls `dataset.save_to_disk` by default, which isn't supported on the streaming (iterable) dataset. Additionally, `MemoryDataset` assumes a fully loaded dataset for slicing, shuffling, etc.
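For context, a streaming `datasets.IterableDataset` has no random access, so slicing and shuffling have to be done lazily (it offers `take`, `skip`, and a buffer-based `shuffle(seed=..., buffer_size=...)` for this reason). Below is a minimal sketch of how a streaming-friendly dataset could support slicing and shuffling without materializing everything. `StreamingDataset` is a hypothetical name, not part of Inspect AI's API, and the stream is assumed to be re-creatable from a zero-argument factory.

```python
import itertools
import random
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class StreamingDataset(Generic[T]):
    """Hypothetical lazy dataset: samples are pulled from a stream on demand.

    `source` is a zero-argument factory so the stream can be restarted
    (e.g. once per epoch) without holding it in memory.
    """

    def __init__(self, source: Callable[[], Iterable[T]]) -> None:
        self._source = source

    def __iter__(self) -> Iterator[T]:
        return iter(self._source())

    def slice(self, start: int, stop: int) -> "StreamingDataset[T]":
        # islice skips and truncates lazily instead of indexing a list.
        src = self._source
        return StreamingDataset(lambda: itertools.islice(src(), start, stop))

    def shuffle(self, buffer_size: int, seed: int) -> "StreamingDataset[T]":
        # Approximate shuffle: at most `buffer_size` samples are held in memory.
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        src = self._source

        def shuffled() -> Iterator[T]:
            rng = random.Random(seed)  # fresh RNG so every pass is reproducible
            buffer: List[T] = []
            for item in src():
                if len(buffer) < buffer_size:
                    buffer.append(item)
                    continue
                # Emit a random buffered sample and put the new one in its slot.
                i = rng.randrange(buffer_size)
                yield buffer[i]
                buffer[i] = item
            # Drain what is left once the stream is exhausted.
            rng.shuffle(buffer)
            yield from buffer

        return StreamingDataset(shuffled)


if __name__ == "__main__":
    ds = StreamingDataset(lambda: range(1_000_000))
    print(list(ds.slice(0, 5)))  # [0, 1, 2, 3, 4]
    print(list(ds.shuffle(buffer_size=100, seed=0).slice(0, 3)))
```

The trade-off is that the shuffle is only as random as the buffer is large, which is the same compromise the streaming `IterableDataset.shuffle` makes.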
I would love to be able to run evals on large datasets, so I wanted to submit this as a feature request.
Much <3 to the maintainers of this repo.