Description
Short description
tfds.load('imagenet2012_multilabel') fails during the download phase. The request for the helper file https://storage.googleapis.com/brain-car-datasets/imagenet-mistakes/human_accuracy_v3.0.0.json returns HTTP 403, which raises tensorflow_datasets.core.download.downloader.DownloadError.
I have manually downloaded the ImageNet validation set.
Environment information
Operating System: Ubuntu 22.04
Python version: 3.9.21
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 4.9.3
tensorflow/tf-nightly version: tensorflow 2.19.0
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
import tensorflow_datasets as tfds
# Fails with HTTP 403 on a helper JSON file
ds = tfds.load('imagenet2012_multilabel')
Link to logs
ds = tfds.load('imagenet2012_multilabel')
2025-08-25 23:01:29.655977: W external/local_xla/xla/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Could not resolve hostname', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset 191.13 MiB (download: 191.13 MiB, generated: 2.50 GiB, total: 2.69 GiB) to /home/.../tensorflow_datasets/imagenet2012_multilabel/3.0.0...
Dl Size...: 0 MiB [00:00, ? MiB/s] | 0/1 [00:00<?, ? url/s]
Dl Completed...: 0%| | 0/1 [00:00<?, ? url/s]
Traceback (most recent call last):
  File "", line 1, in
  File ".../tensorflow_datasets/core/load.py", line 639, in load
    _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
  File ".../tensorflow_datasets/core/load.py", line 498, in _download_and_prepare_builder
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File ".../tensorflow_datasets/core/dataset_builder.py", line 1547, in _download_and_prepare
    split_generators = self._split_generators(
  File ".../imagenet2012_multilabel_dataset_builder.py", line 115, in _split_generators
    _get_multi_labels_and_problematic_images(dl_manager)
  File ".../imagenet2012_multilabel_dataset_builder.py", line 48, in _get_multi_labels_and_problematic_images
    with tf.io.gfile.GFile(dl_manager.download(_MULTI_LABELS_URL), 'r') as f:
  File ".../download_manager.py", line 601, in download
    return _map_promise(self._download, url_or_urls)
  File ".../downloader.py", line 230, in _sync_download
    with _open_url(url, verify=verify) as (response, iter_content):
  File ".../downloader.py", line 330, in _assert_status
    raise DownloadError(
tensorflow_datasets.core.download.downloader.DownloadError: Failed to get url https://storage.googleapis.com/brain-car-datasets/imagenet-mistakes/human_accuracy_v3.0.0.json. HTTP code: 403
Expected behavior
Dataset downloads and prepares without access-denied errors.
Additional context
It looks like the auxiliary JSON file used to build the multilabel annotations is no longer publicly readable.
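To check this independently of TFDS, the file can be requested directly and the status code inspected. The snippet below is a minimal sketch using only the Python standard library. `probe_status` is a hypothetical helper written for this report, not part of tensorflow_datasets, and the `opener` parameter exists only so the function can be exercised without network access. It assumes the server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request

# URL copied from the DownloadError message in the logs above.
HELPER_URL = (
    "https://storage.googleapis.com/brain-car-datasets/"
    "imagenet-mistakes/human_accuracy_v3.0.0.json"
)


def probe_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`.

    Hypothetical helper for this report; not a TFDS API.
    """
    # HEAD avoids downloading the body; only the status is needed.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses;
        # the status code is available on the exception.
        return err.code


# Example call (requires network access):
# print(HELPER_URL, "->", probe_status(HELPER_URL))
```

If the call returns 403 outside of TFDS as well, the failure is on the hosting side (bucket or object permissions) rather than in the download manager.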