Describe the bug
The datasets library downloads and generates all splits even though only a single split is requested. The dataset in question is jordiae/exebench, which uses a loading script. I am not 100% sure this is a bug: perhaps with loading scripts, datasets must actually process every split? But I thought loading scripts were designed to avoid this.
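To make the question concrete, here is a toy model of the two behaviors I have in mind. It is only a sketch: `ToyDownloadManager`, `eager_split_generators`, `lazy_split_generators`, the file names, and the `split_a`/`split_b` split names are all hypothetical stand-ins, not the real datasets API or the actual exebench loading script. It assumes a script that fetches every split's file up front, which may or may not be what happens here.

```python
# Toy model only: nothing here is the real datasets API or the exebench script.
# All class, function, file, and split names (except test_synth) are made up.


class ToyDownloadManager:
    """Records every file it is asked to fetch instead of downloading it."""

    def __init__(self):
        self.downloaded = []

    def download(self, url):
        self.downloaded.append(url)
        return f"/cache/{url}"


# Hypothetical mapping from split name to the file that backs it.
SPLIT_URLS = {
    "split_a": "split_a.tar.gz",
    "split_b": "split_b.tar.gz",
    "test_synth": "test_synth.tar.gz",
}


def eager_split_generators(dl_manager):
    # Eager pattern: every split's file is fetched before any split is selected.
    return {name: dl_manager.download(url) for name, url in SPLIT_URLS.items()}


def lazy_split_generators(dl_manager, requested):
    # Lazy pattern: only the requested split's file is fetched.
    return {requested: dl_manager.download(SPLIT_URLS[requested])}


eager = ToyDownloadManager()
eager_paths = eager_split_generators(eager)
selected = eager_paths["test_synth"]  # selection happens after all downloads

lazy = ToyDownloadManager()
lazy_paths = lazy_split_generators(lazy, "test_synth")

print(len(eager.downloaded), "files fetched eagerly")
print(len(lazy.downloaded), "file fetched lazily")
```

What I am seeing looks like the eager pattern, while I expected something closer to the lazy one.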
Steps to reproduce the bug
See this notebook
Or:
from datasets import load_dataset
dataset = load_dataset('jordiae/exebench', split='test_synth', trust_remote_code=True)
Expected behavior
I expected only the test_synth split to be downloaded and processed.
Environment info
- datasets version: 3.5.1
- Platform: Linux-6.1.123+-x86_64-with-glibc2.35
- Python version: 3.11.12
- huggingface_hub version: 0.30.2
- PyArrow version: 18.1.0
- Pandas version: 2.2.2
- fsspec version: 2025.3.0