Describe the bug
The revision semantics of datasets.load_dataset are inconsistent when the dataset is not found on the Hugging Face Hub at the requested revision. In that case load_dataset falls back to the latest cached version of the dataset and silently ignores the revision argument, as long as any cached version of the dataset already exists in the local HF cache.
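To make the difference concrete, here is a toy model of the two behaviors. Everything in it (RevisionNotFound, resolve_observed, resolve_expected, and representing the Hub and the cache as plain collections) is a hypothetical simplification for illustration, not the actual datasets implementation. It only models the case described above, where the Hub answers but the revision does not exist.

```python
# Toy model of revision resolution in load_dataset.
# All names are hypothetical; this is NOT how the datasets library is implemented.


class RevisionNotFound(Exception):
    """Hypothetical stand-in for datasets.exceptions.DatasetNotFoundError."""


def resolve_observed(requested, hub_revisions, cached_revisions):
    """Behavior reported in this issue."""
    if requested in hub_revisions:
        return requested
    # Not found on the Hub: fall back to the latest cached version,
    # silently ignoring `requested`.
    if cached_revisions:
        return cached_revisions[-1]
    raise RevisionNotFound(requested)


def resolve_expected(requested, hub_revisions, cached_revisions):
    """Behavior this issue asks for: an unknown revision is an error.

    The cache is deliberately ignored here, because the Hub has answered.
    """
    if requested in hub_revisions:
        return requested
    raise RevisionNotFound(f"Revision '{requested}' doesn't exist on the Hub")


hub = {"main"}
cache = ["main"]  # populated by the first load_dataset call
print(resolve_observed("invalid_revision", hub, cache))  # main
try:
    resolve_expected("invalid_revision", hub, cache)
except RevisionNotFound as err:
    print("raised:", err)
```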
Steps to reproduce the bug
import datasets

# First call: downloads the 'main' revision into the local HF cache
datasets.load_dataset(
"sentientfutures/ahb",
"dimensions",
split="train",
revision="main"
)
# Second call: expected to raise, since 'invalid_revision' does not exist,
# but it silently loads the cached version instead
datasets.load_dataset(
"sentientfutures/ahb",
"dimensions",
split="train",
revision="invalid_revision"
)

Expected behavior
On the second call to datasets.load_dataset in the reproduction above, I would expect an error like:
raise DatasetNotFoundError(
datasets.exceptions.DatasetNotFoundError: Revision 'invalid_revision' doesn't exist for dataset 'sentientfutures/ahb' on the Hub.

Environment info
- datasets version: 4.4.1
- Platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.37
- Python version: 3.12.12
- huggingface_hub version: 0.36.0
- PyArrow version: 22.0.0
- Pandas version: 2.2.3
- fsspec version: 2025.9.0
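A possible workaround in the meantime is to check the revision against the Hub before calling load_dataset. This sketch assumes huggingface_hub's HfApi.dataset_info, which is documented to raise RevisionNotFoundError (or RepositoryNotFoundError) for a missing revision or repository. The helper load_dataset_strict and its hub_api/loader parameters are hypothetical; the latter two exist only so the helper can be exercised without network access.

```python
# Hypothetical helper, not part of the datasets API: validate the revision
# on the Hub first, then load. Requires network access to the Hub.


def load_dataset_strict(path, name=None, *, revision="main",
                        hub_api=None, loader=None, **kwargs):
    if hub_api is None:
        from huggingface_hub import HfApi
        hub_api = HfApi()
    # With the real HfApi, this raises RevisionNotFoundError or
    # RepositoryNotFoundError if the revision or repo does not exist.
    hub_api.dataset_info(path, revision=revision)
    if loader is None:
        from datasets import load_dataset as loader
    return loader(path, name, revision=revision, **kwargs)


# Example (needs network access):
# ds = load_dataset_strict("sentientfutures/ahb", "dimensions",
#                          split="train", revision="invalid_revision")
```

Note that this trades the cache fallback for strictness: the call fails whenever the Hub cannot be queried, even if a valid cached copy exists.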